If you find that this library lacks some feature you need, or you have a suggestion for improving it, please don’t hesitate to get in touch with me!
This library adds three features to Racket:
library support for bit strings, a generalization of byte vectors;
syntactic support for extracting integers, floats and sub-bit-strings from bit strings; and
syntactic support for constructing bit strings from integers, floats and other bit strings.
It is heavily inspired by Erlang’s binaries, bitstrings, and binary pattern-matching. The Erlang documentation provides a good introduction to these features:
Version 3.0 of this library uses :: instead of : to separate expressions from encoding specifications in the bit-string-case and bit-string macros. This avoids a collision with Typed Racket, which uses : for its own purposes.
A bit string is either
a byte vector, as returned by bytes and friends;
a bit-resolution slice of a byte vector, as returned by sub-bit-string; or
a splicing-together of two bit strings, as returned by bit-string-append.
Except where otherwise noted, the routines in this library handle all three of these representations.
All the functionality below can be accessed with a single require:
|(require (planet tonyg/bitsyntax:2:1))|
|(bit-string-case value-expr clause ...)|
The value-expr is evaluated to produce the input bit string, and each clause is then tried in turn. The first clause that succeeds determines the result of the whole expression. A clause succeeds if all its segment-patterns match some portion of the input, no unused input is left over at the end, and the guard-expr (if there is one) evaluates to a true value. If a clause succeeds, (begin body-expr ...) is evaluated, and its result becomes the result of the whole expression.
Each segment-pattern matches zero or more bits of the input bit string. The given type, signedness, endianness and width are used to extract a value from the bit string, at which point it is either compared to some other value using equal? (if a comparison-pattern was used in the segment-pattern), bound to a pattern variable (if a binding-pattern was used), or discarded (if a discard-pattern was used) before matching continues with the next segment-pattern.
The supported segment types are
integer – this is the default. A signed or unsigned, big- or little-endian integer of the given width in bits is read out of the bit string. Unless otherwise specified, integers default to big-endian, unsigned, and eight bits wide. Any width, not just multiples of eight, is supported.
float – A 32- or 64-bit float in either big- or little-endian byte order is read out of the bit string using floating-point-bytes->real. Unless otherwise specified, floats default to big-endian and 64 bits wide. Widths other than 32 or 64 bits are unsupported.
binary – A sub-bit-string is read out of the bit string. The bit string can be an arbitrary number of bits long, not just a multiple of eight. Unless otherwise specified, the entire rest of the input will be consumed and returned.
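To make the bit ordering of the integer segment type concrete, here is a minimal sketch in plain racket/base (it does not use this library) that reads an unsigned, big-endian integer of any width, starting at any bit offset, out of a byte string. The helper read-bits is hypothetical; it illustrates the idea and is not necessarily how this library decodes integers.

```racket
#lang racket/base
;; read-bits is a hypothetical helper, not part of bitsyntax.
;; Bit 0 of the input is the most significant bit of byte 0, so the
;; bits are consumed in big-endian order and accumulated MSB first.
(define (read-bits bs bit-offset width)
  (for/fold ([acc 0]) ([i (in-range bit-offset (+ bit-offset width))])
    (let* ([byte (bytes-ref bs (quotient i 8))]
           ;; Shift the wanted bit down to position 0 and mask it off.
           [bit (bitwise-and 1 (arithmetic-shift byte (- (remainder i 8) 7)))])
      (+ (* acc 2) bit))))
```

For example, (read-bits (bytes 25 0) 0 10) reads the ten-bit value 100, a width that is not a multiple of eight.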
Each type has a default signedness, endianness, and width in bits, as described above. These can all be overridden individually:
unsigned and signed specify that integers should be decoded in an unsigned or signed manner, respectively.
big-endian, little-endian and native-endian specify the endianness to use in decoding integers or floats. Specifying native-endian causes Racket to use whatever is the native endianness of the platform the program is currently running on (discovered using system-big-endian?).
default causes the decoder to use whatever the default width is for the type specified.
bytes n causes the decoder to try to consume n bytes of input for this segment-pattern.
bits n causes the decoder to try to consume n bits of input for this segment-pattern.
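The endianness and signedness options have the same meaning as the corresponding arguments of racket/base's own byte-string conversion procedures. The following sketch uses those procedures (not this library) to show how one pair of bytes decodes differently under each option, and how a default 64-bit big-endian float is read with floating-point-bytes->real.

```racket
#lang racket/base
;; Plain racket/base illustration; bitsyntax is not required here.
(define two-bytes (bytes #x01 #x02))

;; big-endian: most significant byte first
(define be (integer-bytes->integer two-bytes #f #t))   ; 258
;; little-endian: least significant byte first
(define le (integer-bytes->integer two-bytes #f #f))   ; 513
;; native-endian: whichever order system-big-endian? reports here
(define ne (integer-bytes->integer two-bytes #f (system-big-endian?)))

;; unsigned vs signed reading of the same 16 bits
(define u (integer-bytes->integer (bytes #xFF #xFE) #f #t)) ; 65534
(define s (integer-bytes->integer (bytes #xFF #xFE) #t #t)) ; -2

;; a 64-bit big-endian IEEE 754 double
(define x (floating-point-bytes->real (bytes #x3F #xF0 0 0 0 0 0 0) #t)) ; 1.0
```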
(bit-string-case some-input-value
  ([(= 0 :: bytes 2)]
   'a)
  ([(f :: bits 10) (:: bits 6)]
   (when (and (< f 123) (>= f 100)))
   'between-100-and-123)
  ([(f :: bits 10) (:: bits 6)]
   f)
  ([(f :: bits 10) (:: bits 6) (rest :: binary)]
   (list f rest)))
This expression analyses some-input-value, which must satisfy bit-string?. It may contain:
16 zero bits, in which case the result is 'a; or
a ten-bit big-endian unsigned integer followed by 6 bits which are ignored, where the integer is between 100 (inclusive) and 123 (exclusive), in which case the result is 'between-100-and-123; or
the same as the previous clause, but without the guard; if this succeeds, the result is the ten-bit integer itself; or
the same as the previous clause, but with an arbitrary number of bits following the six discarded bits. The result here is a list containing the ten-bit integer and the trailing bit string.
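The four cases above can be mirrored in plain racket/base for the special case of a byte-string input, which helps show the "no leftover input" rule at work. This is a sketch, not a use of this library: classify is a hypothetical name, and returning #f when the input is too short for any case is a choice made for this sketch only.

```racket
#lang racket/base
;; classify is hypothetical; it assumes a byte-string input, so every
;; length is a whole number of bytes.
(define (classify bs)
  (define n (bytes-length bs))
  (if (< n 2)
      #f ; too short for any case (this sketch's choice)
      (let* (;; The first 16 bits as one big-endian unsigned number.
             [first16 (+ (* 256 (bytes-ref bs 0)) (bytes-ref bs 1))]
             ;; f is the top 10 of those 16 bits; the low 6 are discarded.
             [f (arithmetic-shift first16 -6)])
        (cond
          [(and (= n 2) (= first16 0)) 'a]                          ; 16 zero bits
          [(and (= n 2) (>= f 100) (< f 123)) 'between-100-and-123] ; guarded case
          [(= n 2) f]                                               ; no guard
          [else (list f (subbytes bs 2))]))))                       ; trailing input
```

Note that (classify (bytes 0 0 0)) yields (list 0 (bytes 0)) rather than 'a, because the first case leaves no input over.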
The following code block parses a Pascal-style byte string (one length byte, followed by the right number of data bytes) and decodes it using a UTF-8 codec:
(bit-string-case input-bit-string
  ([len (body :: binary bytes len)]
   (bytes->string/utf-8 (bit-string-pack body))))
Notice how the len value, which came from the input bit string itself, is used to decide how much of the remaining input to consume.
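For comparison, here is a sketch of the same Pascal-string decoding in plain racket/base, assuming the input is an ordinary byte string. It performs by hand the two checks the pattern above expresses declaratively: the length byte must be followed by exactly that many data bytes. The name pascal->string/utf-8 is hypothetical, and returning #f for malformed input is an assumption of this sketch.

```racket
#lang racket/base
;; pascal->string/utf-8 is hypothetical; it does not use bitsyntax.
(define (pascal->string/utf-8 bs)
  (and (>= (bytes-length bs) 1)
       (let ([len (bytes-ref bs 0)])          ; the length prefix
         ;; Exactly len data bytes must follow: no more, no fewer.
         (and (= (bytes-length bs) (+ 1 len))
              (bytes->string/utf-8 (subbytes bs 1))))))
```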
|(bit-string spec ...)|
Each spec can specify an integer or floating-point number to encode, or a bit string to copy into the output. If a type is not specified, integer is assumed. Where endianness is relevant but not specified, big-endian is assumed. If a width is not given, integers are encoded as 8-bit quantities, floats are encoded as 64-bit quantities, and binary objects are copied into the output in their entirety.
If a width is specified, integers will be truncated or sign-extended to fit, and binaries will be truncated. If a binary is shorter than a specified width, an error is signalled. Floating-point encoding can only be done using 32- or 64-bit widths.
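The truncation and sign-extension rule for integers amounts to keeping the low width bits of the two's-complement representation. The sketch below shows one way to get that behaviour in plain racket/base; fit-to-width is a hypothetical helper, and this is not necessarily how bit-string itself is implemented.

```racket
#lang racket/base
;; fit-to-width is hypothetical. Racket's bitwise-and treats negative
;; integers as infinite two's complement, so masking both truncates
;; large values and sign-extends negative ones to the full width.
(define (fit-to-width n width)
  (bitwise-and n (sub1 (arithmetic-shift 1 width))))

;; For a 16-bit field, -2 becomes #xFFFE, i.e. the bytes 255 254.
(define encoded (integer->integer-bytes (fit-to-width -2 16) 2 #f #t))
```

For instance, 300 in an 8-bit field keeps only its low byte, 44, while -1 in an 8-bit field becomes 255.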
(define (string->pascal/utf-8 str)
  (let ((bs (string->bytes/utf-8 str)))
    (bit-string (bytes-length bs)
                [bs :: binary])))
This subroutine encodes its string argument using a UTF-8 codec and then assembles the result into a Pascal-style string with a prefix length byte. Note that if the encoded string is longer than 255 bytes, the length byte will be truncated, so the encoding will be incorrect. A better encoder would check that bs is no longer than 255 bytes before encoding it as a Pascal string.
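A minimal sketch of such a checked encoder, written in plain racket/base rather than with bit-string, might look like this. The name string->pascal/utf-8/checked is hypothetical; it signals an error instead of silently producing a wrong length byte.

```racket
#lang racket/base
;; string->pascal/utf-8/checked is hypothetical; it does not use bitsyntax.
(define (string->pascal/utf-8/checked str)
  (define bs (string->bytes/utf-8 str))
  (define len (bytes-length bs))
  ;; One length byte can only describe 0..255 data bytes.
  (unless (<= len 255)
    (raise-arguments-error 'string->pascal/utf-8/checked
                           "encoded string longer than 255 bytes"
                           "length" len))
  (bytes-append (bytes len) bs))
```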
Note that if you wish to leave all the options at their defaults (that is, [... :: integer bits 8]), you can give the bare expression as the spec, as string->pascal/utf-8 does with (bytes-length bs).
|x : bit-string?|
|offset : integer?|
|(sub-bit-string x low-bit high-bit) → bit-string?|
|x : bit-string?|
|low-bit : integer?|
|high-bit : integer?|
|target : bit-string?|
|target-offset : integer?|
|source : bit-string?|
|source-offset : integer?|
|count : integer?|
|(bit-string->integer x big-endian? signed?) → integer?|
|x : bit-string?|
|big-endian? : boolean?|
|signed? : boolean?|
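As a rough model of what bit-string->integer computes, the sketch below handles only the byte-string case in plain racket/base: it reads the bytes as one big- or little-endian number and, when signed? is true, interprets it as two's complement. The helper bytes->integer* is hypothetical; the real procedure accepts any bit-string?, including ones whose length is not a multiple of eight.

```racket
#lang racket/base
;; bytes->integer* is hypothetical and limited to whole bytes.
(define (bytes->integer* bs big-endian? signed?)
  ;; Put the most significant byte first, then accumulate base 256.
  (define ordered (if big-endian? (bytes->list bs) (reverse (bytes->list bs))))
  (define u (for/fold ([acc 0]) ([b (in-list ordered)])
              (+ (* acc 256) b)))
  (define width (* 8 (bytes-length bs)))
  ;; If the top bit is set, subtract 2^width to get the negative value.
  (if (and signed? (> width 0) (bitwise-bit-set? u (sub1 width)))
      (- u (arithmetic-shift 1 width))
      u))
```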
|(integer->bit-string n width big-endian?) → bit-string?|
|n : integer?|
|width : integer?|
|big-endian? : boolean?|
These procedures may be useful for debugging, but should not be relied upon otherwise.