1 Introduction
2 What is a bit string?
3 API
3.1 Pattern-matching bit strings
bit-string-case
3.2 Assembling bit strings from pieces
bit-string
3.3 Bit string utilities
bit-string?
bit-string-length
bit-string-empty?
bit-string-append
bit-string-split-at
bit-string-split-at-or-false
sub-bit-string
bit-string-ref
bit-string->bytes
bit-string->bytes/ align
bit-string-byte-count
bit-string-pack!
bit-string-pack
copy-bits!
bit-string->integer
integer->bit-string
3.4 Debugging utilities
bit-slice?
bit-slice-binary
bit-slice-low-bit
bit-slice-high-bit
splice?
splice-left
splice-right
Version: 5.1.1

racket-bitsyntax

Tony Garnock-Jones <tonygarnockjones@gmail.com>

    1 Introduction

    2 What is a bit string?

    3 API

      3.1 Pattern-matching bit strings

      3.2 Assembling bit strings from pieces

      3.3 Bit string utilities

      3.4 Debugging utilities

If you find that this library lacks some feature you need, or you have a suggestion for improving it, please don’t hesitate to get in touch with me!

1 Introduction

This library adds three features to Racket:

It is heavily inspired by Erlang’s binaries, bitstrings, and binary pattern-matching. The Erlang documentation provides a good introduction to these features:

2 What is a bit string?

A bit string is either

The routines in this library are written, except where specified, to handle any of these three representations for bit strings.

If you need to flatten a bit string into a contiguous sequence of whole bytes, use bit-string->bytes or bit-string->bytes/align.

3 API

All the functionality below can be accessed with a single require:

 (require (planet tonyg/bitsyntax:2:0))

3.1 Pattern-matching bit strings

(bit-string-case value-expr clause ...)
 
clause = 
([segment-pattern ...]
 (when guard-expr)
 body-expr ...)
  | 
([segment-pattern ...]
 body-expr ...)
  | 
(else
 body-expr ...)
     
segment-pattern = comparison-pattern
  | binding-pattern
  | discard-pattern
     
comparison-pattern = (= expr : option ...)
  | (= expr)
     
binding-pattern = (id : option ...)
  | (id)
  | id
     
discard-pattern = (: option ...)
     
option = type-option
  | signedness-option
  | endianness-option
  | width-option
     
type-option = integer
  | float
  | binary
     
signedness-option = unsigned
  | signed
     
endianness-option = little-endian
  | big-endian
  | native-endian
     
width-option = bits n
  | bytes n
  | default
The value-expr is evaluated first. It must evaluate to a bit string—any object for which bit-string? would return #t.

Each clause is then tried in turn. The first succeeding clause determines the result of the whole expression. A clause matches successfully if all its segment-patterns match some portion of the input, there is no unused input left over at the end, and the guard-expr (if there is one) evaluates to a true value. If a clause succeeds, then (begin body-expr ...) is evaluated, and its result becomes the result of the whole expression.

If none of the clauses succeed, and there is an else clause, its body-exprs are evaluated and returned. If there’s no else clause and none of the others succeed, an error is signalled.

Each segment-pattern matches zero or more bits of the input bit string. The given type, signedness, endianness and width are used to extract a value from the bit string, at which point it is either compared to some other value using equal? (if a comparison-pattern was used in the segment-pattern), bound to a pattern variable (if a binding-pattern was used), or discarded (if a discard-pattern was used) before matching continues with the next segment-pattern.

The supported segment types are

Each type has a default signedness, endianness, and width in bits, as described above. These can all be overridden individually:

For example:

(bit-string-case some-input-value
  ([(= 0 : bytes 2)] 'a)
  ([(f : bits 10) (: binary)]
   (when (and (< f 123) (>= f 100)))
   'between-100-and-123)
  ([(f : bits 10) (: bits 6)]
   f)
  ([(f : bits 10) (: bits 6) (rest : binary)]
   (list f rest)))

This expression analyses some-input-value, which must be a (bit-string?). It may contain:

The following code block parses a Pascal-style byte string (one length byte, followed by the right number of data bytes) and decodes it using a UTF-8 codec:

(bit-string-case input-bit-string
  ([len (body : binary bytes len)]
   (bytes->string/utf-8 (bit-string-pack body))))

Notice how the len value, which came from the input bit string itself, is used to decide how much of the remaining input to consume.

3.2 Assembling bit strings from pieces

(bit-string spec ...)
 
spec = [segment-expr : option ...]
  | segment-expr
     
option = type-option
  | endianness-option
  | width-option
     
type-option = integer
  | float
  | binary
     
endianness-option = little-endian
  | big-endian
  | native-endian
     
width-option = bits n
  | bytes n
  | default
This form assembles and encodes a collection of integer, floating-point numbers, and/or sub-bit-strings into a single bit string. Each of the zero or more specs supplies zero or more bits of the resulting bit string.

Each spec can specify an integer or floating-point number to encode, or a bit string to copy into the output. If a type is not specified, integer is assumed. If an endianness is (relevant but) not specified, big-endian is assumed. If a width is not given, integers are encoded as 8-bit quantities, floats are encoded as 64-bit quantities, and binary objects are copied into the output in their entirety.

If a width is specified, integers will be truncated or sign-extended to fit, and binaries will be truncated. If a binary is shorter than a specified width, an error is signalled. Floating-point encoding can only be done using 32- or 64-bit widths.

For example:

(define (string->pascal/utf-8 str)
  (let ((bs (string->bytes/utf-8 str)))
    (bit-string (bytes-length bs) [bs : binary])))

This subroutine encodes its string argument using a UTF-8 codec, and then assembles it into a Pascal-style string with a prefix length byte. If the encoded string is longer than 255 bytes, note that the length byte will be truncated and so the encoding will be incorrect. A better encoder would ensure that bs was not longer than 255 bytes before encoding it as a Pascal string.

Note that if you wish to leave all the options at their defaults (that is, [... : integer bits 8]), you can use the second form of spec given above.

3.3 Bit string utilities

(bit-string? x)  boolean?
  x : any?
Returns #t if its argument is either a bytes?, a bit-slice? or a splice?. Returns #f otherwise.

(bit-string-length x)  integer?
  x : bit-string?
Returns the length of its argument, in bits.

(bit-string-empty? x)  boolean?
  x : bit-string?
Returns #t if its argument’s bit-string-length is zero.

(bit-string-append a b)  bit-string?
  a : bit-string?
  b : bit-string?
Appends its arguments, producing a new bit string. Uses splice internally when it can’t arrange to return a bit string previously constructed. (The practical upshot of this is that you might need to use bit-string->bytes to "flatten" appended bit-strings from time to time.)

(bit-string-split-at x offset)  
bit-string? bit-string?
  x : bit-string?
  offset : integer?
Produces two values: the bit-string containing bits [0..offset) of x, and the bit-string containing bits [offset..(bit-string-length x)) of x. If offset is negative or greater-or-equal-to the number of bits in x, an error is signalled.

(bit-string-split-at-or-false x offset)  
(or/c bit-string? #f)
(or/c bit-string? #f)
  x : bit-string?
  offset : integer?
Like (bit-string-split-at x offset), but if offset is out of range returns (values #f #f) instead of signalling an error. This procedure is used in the implementation of bit-string-case.

(sub-bit-string x low-bit high-bit)  bit-string?
  x : bit-string?
  low-bit : integer?
  high-bit : integer?
If (<= 0 low-bit high-bit (sub1 (bit-string-length x))), returns the bit-string containing bits [low-bit..high-bit) of x. Otherwise, signals an error.

(bit-string-ref x offset)  (or/c 0 1)
  x : bit-string?
  offset : integer?
Extracts bit number offset from x. Signals an error if offset is negative or greater-than-or-equal-to the length of x.

(bit-string->bytes x)  bytes?
  x : bit-string?
Flattens any splices or bit-slices in x, producing a single contiguous byte vector with x’s contents. If (positive? (remainder (bit-string-length x) 8)), pads out the remaining bits with zeros on the right.

(bit-string->bytes/align x align-right?)  bytes?
  x : bit-string?
  align-right? : boolean?
As bit-string->bytes, but offers the choice of padding on the right (if align-right? is #f) or on the left (if align-right? is #t) when padding is required. (Note that to align the bits in x on the right is to pad with zeros on the left, and vice versa.)

(bit-string-byte-count x)  integer?
  x : bit-string?
Returns the smallest number of whole bytes that could contain all the bits in x.

(bit-string-pack! x buf offset)  void?
  x : bit-string?
  buf : bytes?
  offset : integer?
Copies the entirety of x into buf, overwriting bits starting with the offsetth. It is an error for buf not to have enough room or for offset to be out-of-bounds.

(bit-string-pack x)  bit-string?
  x : bit-string?
Returns a bit string equivalent to x (i.e. with exactly the same bits in the same order) but with any intermediate splices or bit-slices flattened away. The result will either be a bytes? of the correct length, or a bit-slice referring to a section of a byte vector of length (bit-string-byte-count x).

(copy-bits! target    
  target-offset    
  source    
  source-offset    
  count)  void?
  target : bit-string?
  target-offset : integer?
  source : bit-string?
  source-offset : integer?
  count : integer?
Overwrites bits [target-offset..(+ target-offset count)) of target with bits [source-offset..(+ source-offset count)) of source. Undefined behaviour results when (eq? target source).

(bit-string->integer x big-endian? signed?)  integer?
  x : bit-string?
  big-endian? : boolean?
  signed? : boolean?
Interprets the bits in x as an integer, using either a big- or little-endian byte-ordering convention (per big-endian?), and either unsigned or two’s-complement signed arithmetic (per signed?) to produce the result.

(integer->bit-string n width big-endian?)  bit-string?
  n : integer?
  width : integer?
  big-endian? : boolean?
Encodes n as a bit string of length width bits, truncating or sign-extending as required, and using a big- or little-endian byte-ordering convention as per big-endian?.

3.4 Debugging utilities

These procedures may be useful for debugging, but should not be relied upon otherwise.

(bit-slice? x)  boolean?
  x : any?
Returns #t if and only if x is a bit-slice.

(bit-slice-binary x)  bytes?
  x : bit-slice?
Extracts the underlying byte vector from a bit-slice.

(bit-slice-low-bit x)  integer?
  x : bit-slice?
(bit-slice-high-bit x)  integer?
  x : bit-slice?
Extract the low (inclusive) and high (exclusive) bit indexes, respectively, from a bit-slice. The bit-slice itself represents bits [low..high) of the underlying byte vector.

(splice? x)  boolean?
  x : any?
Returns #t if and only if x is a splice of two bit-strings.

(splice-left x)  bit-string?
  x : splice?
(splice-right x)  bit-string?
  x : splice?
Extract the left and right bit-strings, respectively, that are spliced together by the given splice x.