racket-bitsyntax
If you find that this library lacks some feature you need, or you have a suggestion for improving it, please don’t hesitate to get in touch with me!
1 Introduction
This library adds three features to Racket:
- library support for bit strings, a generalization of byte vectors; 
- syntactic support for extracting integers, floats and sub-bit-strings from bit strings; and 
- syntactic support for constructing bit strings from integers, floats and other bit strings. 
It is heavily inspired by Erlang’s binaries, bitstrings, and binary pattern-matching. The Erlang documentation provides a good introduction to these features:
- Bit syntax expressions in the Erlang Reference Manual 
- Bit syntax in the Programming Examples Manual 
2 Changes
Version 3.0 of this library uses :: instead of : to separate expressions from encoding specifications in the bit-string-case and bit-string macros. The reason for this is to avoid a collision with Typed Racket, which uses : for its own purposes.
3 What is a bit string?
A bit string is either
- a byte vector, as returned by bytes and friends; 
- a bit-resolution slice of a byte vector, as returned by sub-bit-string; or 
- a splicing-together of two bit strings, as returned by bit-string-append. 
The routines in this library are written, except where specified, to handle any of these three representations for bit strings.
If you need to flatten a bit string into a contiguous sequence of whole bytes, use bit-string->bytes or bit-string->bytes/align.
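The three representations can be seen side by side in a short sketch. It assumes the library is installed as the `bitsyntax` package (with the PLaneT distribution, substitute the require form given in section 4); the names `whole`, `slice` and `joined` are illustrative only.

```racket
#lang racket/base
(require bitsyntax) ; assumption: package-style require

;; 1. A plain byte vector is already a bit string (24 bits here).
(define whole (bytes 1 2 3))

;; 2. A bit-resolution slice: 8 bits starting at bit offset 4.
(define slice (sub-bit-string whole 4 12))

;; 3. A splice of two bit strings (8 + 24 = 32 bits).
(define joined (bit-string-append slice whole))

;; Flatten the splice into a contiguous byte vector.
(define flat (bit-string->bytes joined))
```

All three values satisfy `bit-string?`, so they can be passed interchangeably to the routines described below.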
4 API
All the functionality below can be accessed with a single require:
| (require (planet tonyg/bitsyntax:2:1)) | 
4.1 Pattern-matching bit strings
| (bit-string-case value-expr clause ...) | 
| 
 | 
After value-expr has been evaluated, each clause is tried in turn. The first succeeding clause determines the result of the whole expression. A clause matches successfully if all its segment-patterns match some portion of the input, there is no unused input left over at the end, and the guard-expr (if there is one) evaluates to a true value. If a clause succeeds, then (begin body-expr ...) is evaluated, and its result becomes the result of the whole expression.
If none of the clauses succeed, and there is an else clause, its body-exprs are evaluated and the result of the last one is returned. If there’s no else clause and none of the others succeed, an error is signalled.
Each segment-pattern matches zero or more bits of the input bit string. The given type, signedness, endianness and width are used to extract a value from the bit string, at which point it is either compared to some other value using equal? (if a comparison-pattern was used in the segment-pattern), bound to a pattern variable (if a binding-pattern was used), or discarded (if a discard-pattern was used) before matching continues with the next segment-pattern.
The supported segment types are
- integer – this is the default. A signed or unsigned, big- or little-endian integer of the given width in bits is read out of the bit string. Unless otherwise specified, integers default to big-endian, unsigned, and eight bits wide. Any width, not just multiples of eight, is supported. 
- float – A 32- or 64-bit float in either big- or little-endian byte order is read out of the bit string using floating-point-bytes->real. Unless otherwise specified, floats default to big-endian and 64 bits wide. Widths other than 32 or 64 bits are unsupported. 
- binary – A sub-bit-string is read out of the bit string. The bit string can be an arbitrary number of bits long, not just a multiple of eight. Unless otherwise specified, the entire rest of the input will be consumed and returned. 
Each type has a default signedness, endianness, and width in bits, as described above. These can all be overridden individually:
- unsigned and signed specify that integers should be decoded in an unsigned or signed manner, respectively. 
- big-endian, little-endian and native-endian specify the endianness to use in decoding integers or floats. Specifying native-endian causes Racket to use whatever is the native endianness of the platform the program is currently running on (discovered using system-big-endian?). 
- default causes the decoder to use whatever the default width is for the type specified. 
- bytes n causes the decoder to try to consume n bytes of input for this segment-pattern. 
- bits n causes the decoder to try to consume n bits of input for this segment-pattern. 
For example:
(bit-string-case some-input-value
  ([(= 0 :: bytes 2)] 'a)
  ([(f :: bits 10) (:: binary)]
   (when (and (< f 123) (>= f 100)))
   'between-100-and-123)
  ([(f :: bits 10) (:: bits 6)] f)
  ([(f :: bits 10) (:: bits 6) (rest :: binary)]
   (list f rest)))
This expression analyses some-input-value, which must satisfy bit-string?. It may contain:
- 16 zero bits, in which case the result is 'a; or 
- a ten-bit big-endian unsigned integer followed by any number of bits, which are ignored, where the integer is between 100 (inclusive) and 123 (exclusive), in which case the result is 'between-100-and-123; or 
- a ten-bit integer followed by exactly six bits which are ignored, with no guard; if this succeeds, the result is the ten-bit integer itself; or 
- the same as the previous clause, but with an arbitrary number of bits following the six discarded bits. The result here is a list containing the ten-bit integer and the trailing bit string. 
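To see the type, signedness, endianness and width options working together on concrete input, here is a minimal sketch. The packet layout and the names `parse-header` and `read-signed-byte` are hypothetical, and the require form assumes the library is installed as the `bitsyntax` package.

```racket
#lang racket/base
(require bitsyntax) ; assumption: package-style require

;; Hypothetical layout: 4-bit version, 4-bit flags,
;; 16-bit little-endian length, then the rest as payload.
(define (parse-header packet)
  (bit-string-case packet
    ([(version :: integer bits 4)
      (flags :: integer bits 4)
      (len :: integer little-endian bytes 2)
      (payload :: binary)]
     (list version flags len (bit-string->bytes payload)))
    (else #f))) ; too short to hold a header

;; Overriding the default signedness for a single byte.
(define (read-signed-byte input)
  (bit-string-case input
    ([(n :: integer signed bits 8)] n)))

(define header (parse-header (bytes #x12 #x03 #x00 #xAA #xBB)))
(define minus-one (read-signed-byte (bytes #xFF)))
```

The else clause turns a failed match into `#f` instead of an error, which is convenient when the input may be truncated.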
The following code block parses a Pascal-style byte string (one length byte, followed by the right number of data bytes) and decodes it using a UTF-8 codec:
(bit-string-case input-bit-string
  ([len (body :: binary bytes len)]
   (bytes->string/utf-8 (bit-string-pack body))))
Notice how the len value, which came from the input bit string itself, is used to decide how much of the remaining input to consume.
4.2 Assembling bit strings from pieces
| (bit-string spec ...) | 
| 
 | 
Each spec can specify an integer or floating-point number to encode, or a bit string to copy into the output. If a type is not specified, integer is assumed. If an endianness is relevant but not specified, big-endian is assumed. If a width is not given, integers are encoded as 8-bit quantities, floats are encoded as 64-bit quantities, and binary objects are copied into the output in their entirety.
If a width is specified, integers will be truncated or sign-extended to fit, and binaries will be truncated. If a binary is shorter than a specified width, an error is signalled. Floating-point encoding can only be done using 32- or 64-bit widths.
For example:
(define (string->pascal/utf-8 str)
  (let ((bs (string->bytes/utf-8 str)))
    (bit-string (bytes-length bs) [bs :: binary])))
This subroutine encodes its string argument as UTF-8, and then assembles it into a Pascal-style string with a one-byte length prefix. If the encoded string is longer than 255 bytes, the length will be truncated to fit in the single length byte, and so the encoding will be incorrect. A better encoder would check that bs is no longer than 255 bytes before encoding it as a Pascal string.
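One way to add the length check suggested above is sketched here, together with a matching decoder so the pair can be round-tripped. The names `string->pascal/utf-8/checked` and `pascal->string/utf-8` are hypothetical, and the require form assumes the library is installed as the `bitsyntax` package.

```racket
#lang racket/base
(require bitsyntax) ; assumption: package-style require

;; Encoder that refuses input whose UTF-8 encoding exceeds 255 bytes.
(define (string->pascal/utf-8/checked str)
  (let ((bs (string->bytes/utf-8 str)))
    (when (> (bytes-length bs) 255)
      (error 'string->pascal/utf-8/checked
             "encoded string is ~a bytes; at most 255 are allowed"
             (bytes-length bs)))
    (bit-string (bytes-length bs) [bs :: binary])))

;; Decoder: one length byte, then exactly that many data bytes.
(define (pascal->string/utf-8 input)
  (bit-string-case input
    ([len (body :: binary bytes len)]
     (bytes->string/utf-8 (bit-string->bytes body)))))

(define round-tripped
  (pascal->string/utf-8 (string->pascal/utf-8/checked "hello")))
```

The decoder uses bit-string->bytes rather than bit-string-pack only because its documented result is bytes?, which is what bytes->string/utf-8 expects.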
Note that if you wish to leave all the options at their defaults (that is, [... :: integer bits 8]), you can use the second form of spec given above.
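The sketch below assembles a small packet using explicit widths and endianness, and then shows a bare expression next to its fully spelled-out equivalent. The layout and the names `packet` and `defaults` are illustrative, and the require form assumes the library is installed as the `bitsyntax` package.

```racket
#lang racket/base
(require bitsyntax) ; assumption: package-style require

;; Two 4-bit fields, a 16-bit little-endian integer, then raw bytes.
(define packet
  (bit-string [1 :: integer bits 4]
              [2 :: integer bits 4]
              [3 :: integer little-endian bytes 2]
              [(bytes #xAA #xBB) :: binary]))

;; A bare expression and an explicit 8-bit integer spec are
;; encoded the same way: one byte each.
(define defaults (bit-string 65 [66 :: integer bits 8]))

;; Flatten both results for inspection.
(define packet-bytes (bit-string->bytes packet))
(define default-bytes (bit-string->bytes defaults))
```

The result of bit-string may be a splice rather than a byte vector, so flatten it with bit-string->bytes before handing it to code that expects bytes?.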
4.3 Bit string utilities
| (bit-string? x) → boolean? | 
| x : any/c | 
| (bit-string-length x) → integer? | 
| x : bit-string? | 
| (bit-string-empty? x) → boolean? | 
| x : bit-string? | 
| (bit-string-append a ...) → bit-string? | 
| a : bit-string? | 
| 
 | 
| x : bit-string? | 
| offset : integer? | 
| 
 | 
| x : bit-string? | 
| offset : integer? | 
| (sub-bit-string x low-bit high-bit) → bit-string? | 
| x : bit-string? | 
| low-bit : integer? | 
| high-bit : integer? | 
| (bit-string-ref x offset) → (or/c 0 1) | 
| x : bit-string? | 
| offset : integer? | 
| (bit-string->bytes x) → bytes? | 
| x : bit-string? | 
| (bit-string->bytes/align x align-right?) → bytes? | 
| x : bit-string? | 
| align-right? : boolean? | 
| (bit-string-byte-count x) → integer? | 
| x : bit-string? | 
| (bit-string-pack! x buf offset) → void? | 
| x : bit-string? | 
| buf : bytes? | 
| offset : integer? | 
| (bit-string-pack x) → bit-string? | 
| x : bit-string? | 
| 
 | 
| target : bit-string? | 
| target-offset : integer? | 
| source : bit-string? | 
| source-offset : integer? | 
| count : integer? | 
| (bit-string->integer x big-endian? signed?) → integer? | 
| x : bit-string? | 
| big-endian? : boolean? | 
| signed? : boolean? | 
| (integer->bit-string n width big-endian?) → bit-string? | 
| n : integer? | 
| width : integer? | 
| big-endian? : boolean? | 
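A few of the utilities above can be exercised as follows. This is a sketch based only on the signatures listed in this section; the variable names are illustrative, and the require form assumes the library is installed as the `bitsyntax` package.

```racket
#lang racket/base
(require bitsyntax) ; assumption: package-style require

;; A 12-bit slice taken from a two-byte vector.
(define twelve (sub-bit-string (bytes #xAB #xCD) 0 12))

(define len (bit-string-length twelve))         ; length in bits
(define nbytes (bit-string-byte-count twelve))  ; whole bytes needed to hold them
(define padded (bit-string->bytes/align twelve #f)) ; flattened into whole bytes

;; Integer conversions: big-endian, unsigned.
(define n (bit-string->integer (bytes 1 0) #t #f))
(define encoded (bit-string->bytes (integer->bit-string 256 16 #t)))
```

Here `n` and `encoded` are inverses of each other: the two bytes 1 and 0 decode to 256, and 256 encodes back to the same two bytes.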
4.4 Debugging utilities
These procedures may be useful for debugging, but should not be relied upon otherwise.
| (bit-slice? x) → boolean? | 
| x : any/c | 
| (bit-slice-binary x) → bytes? | 
| x : bit-slice? | 
| (bit-slice-low-bit x) → integer? | 
| x : bit-slice? | 
| (bit-slice-high-bit x) → integer? | 
| x : bit-slice? | 
| (splice-left x) → bit-string? | 
| x : splice? | 
| (splice-right x) → bit-string? | 
| x : splice? |