racket-bitsyntax

racket-bitsyntax

1 Introduction

2 Changes

3 What is a bit string?

4 API

4.1 Pattern-matching bit strings

4.2 Assembling bit strings from pieces

4.3 Bit string utilities

4.4 Debugging utilities

Version: 5.1.1

Tony Garnock-Jones <tonygarnockjones@gmail.com>

1 Introduction

2 Changes

3 What is a bit string?

4 API

4.1 Pattern-matching bit strings

4.2 Assembling bit strings from pieces

4.3 Bit string utilities

4.4 Debugging utilities

If you find that this library lacks some feature you need, or you have a suggestion for improving it, please don’t hesitate to get in touch with me!

This library adds three features to Racket:

library support for bit strings, a generalization of byte vectors;
syntactic support for extracting integers, floats and sub-bit-strings from bit strings; and
syntactic support for constructing bit strings from integers, floats and other bit strings.

It is heavily inspired by Erlang’s binaries, bitstrings, and binary pattern-matching. The Erlang documentation provides a good introduction to these features:

Bit syntax expressions in the Erlang Reference Manual
Bit syntax in the Programming Examples Manual

Version 3.0 of this library uses :: instead of : to separate expressions from encoding specifications in the bit-string-case and bit-string macros. The reason for this is to avoid a collision with Typed Racket, which uses : for its own purposes.

A bit string is either

a byte vector, as returned by bytes and friends;
a bit-resolution slice of a byte vector, as returned by sub-bit-string; or
a splicing-together of two bit strings, as returned by bit-string-append.

The routines in this library are written, except where specified, to handle any of these three representations for bit strings.

If you need to flatten a bit string into a contiguous sequence of whole bytes, use bit-string->bytes or bit-string->bytes/align.

All the functionality below can be accessed with a single require:

(require (planet tonyg/bitsyntax:2:1))

(bit-string-case value-expr clause ...)

clause

([segment-pattern ...]

(when guard-expr)

body-expr ...)

([segment-pattern ...]

body-expr ...)

(else

body-expr ...)

segment-pattern

comparison-pattern

binding-pattern

discard-pattern

comparison-pattern

(= expr :: option ...)

(= expr)

binding-pattern

(id :: option ...)

(id)

discard-pattern

(:: option ...)

option

type-option

signedness-option

endianness-option

width-option

type-option

integer

float

binary

signedness-option

unsigned

signed

endianness-option

little-endian

big-endian

native-endian

width-option

bits n

bytes n

default

The value-expr is evaluated first. It must evaluate to a bit string—any object for which bit-string? would return #t.

Each clause is then tried in turn. The first succeeding clause determines the result of the whole expression. A clause matches successfully if all its segment-patterns match some portion of the input, there is no unused input left over at the end, and the guard-expr (if there is one) evaluates to a true value. If a clause succeeds, then (begin body-expr ...) is evaluated, and its result becomes the result of the whole expression.

If none of the clauses succeed, and there is an else clause, its body-exprs are evaluated and returned. If there’s no else clause and none of the others succeed, an error is signalled.

Each segment-pattern matches zero or more bits of the input bit string. The given type, signedness, endianness and width are used to extract a value from the bit string, at which point it is either compared to some other value using equal? (if a comparison-pattern was used in the segment-pattern), bound to a pattern variable (if a binding-pattern was used), or discarded (if a discard-pattern was used) before matching continues with the next segment-pattern.

The supported segment types are

integer – this is the default. A signed or unsigned, big- or little-endian integer of the given width in bits is read out of the bit string. Unless otherwise specified, integers default to big-endian, unsigned, and eight bits wide. Any width, not just multiples of eight, is supported.
float – A 32- or 64-bit float in either big- or little-endian byte order is read out of the bit string using floating-point-bytes->real. Unless otherwise specified, floats default to big-endian and 64 bits wide. Widths other than 32 or 64 bits are unsupported.
binary – A sub-bit-string is read out of the bit string. The bit string can be an arbitrary number of bits long, not just a multiple of eight. Unless otherwise specified, the entire rest of the input will be consumed and returned.

Each type has a default signedness, endianness, and width in bits, as described above. These can all be overridden individually:

unsigned and signed specify that integers should be decoded in an unsigned or signed manner, respectively.
big-endian, little-endian and native-endian specify the endianness to use in decoding integers or floats. Specifying native-endian causes Racket to use whatever is the native endianness of the platform the program is currently running on (discovered using system-big-endian?).
default causes the decoder to use whatever the default width is for the type specified.
bytes n causes the decoder to try to consume n bytes of input for this segment-pattern.
bits n causes the decoder to try to consume n bits of input for this segment-pattern.

For example:

(bit-string-case some-input-value
  ([(= 0 :: bytes 2)] 'a)
  ([(f :: bits 10) (:: binary)]
   (when (and (< f 123) (>= f 100)))
   'between-100-and-123)
  ([(f :: bits 10) (:: bits 6)]
   f)
  ([(f :: bits 10) (:: bits 6) (rest :: binary)]
   (list f rest)))

This expression analyses some-input-value, which must be a (bit-string?). It may contain:

16 zero bits, in which case the result is 'a; or
a ten-bit big-endian unsigned integer followed by 6 bits which are ignored, where the integer is between 100 (inclusive) and 123 (exclusive), in which case the result is 'between-100-and-123; or
the same as the previous clause, but without the guard; if this succeeds, the result is the ten-bit integer itself; or
the same as the previous clause, but with an arbitrary number of bits following the six discarded bits. The result here is a list containing the ten-bit integer and the trailing bit string.

The following code block parses a Pascal-style byte string (one length byte, followed by the right number of data bytes) and decodes it using a UTF-8 codec:

(bit-string-case input-bit-string
([len (body :: binary bytes len)]
(bytes->string/utf-8 (bit-string-pack body))))

Notice how the len value, which came from the input bit string itself, is used to decide how much of the remaining input to consume.

(bit-string spec ...)

spec	=	[segment-expr :: option ...]
	\|	segment-expr

option	=	type-option
	\|	endianness-option
	\|	width-option

type-option	=	integer
	\|	float
	\|	binary

endianness-option	=	little-endian
	\|	big-endian
	\|	native-endian

width-option	=	bits n
	\|	bytes n
	\|	default

This form assembles and encodes a collection of integer, floating-point numbers, and/or sub-bit-strings into a single bit string. Each of the zero or more specs supplies zero or more bits of the resulting bit string.

Each spec can specify an integer or floating-point number to encode, or a bit string to copy into the output. If a type is not specified, integer is assumed. If an endianness is (relevant but) not specified, big-endian is assumed. If a width is not given, integers are encoded as 8-bit quantities, floats are encoded as 64-bit quantities, and binary objects are copied into the output in their entirety.

If a width is specified, integers will be truncated or sign-extended to fit, and binaries will be truncated. If a binary is shorter than a specified width, an error is signalled. Floating-point encoding can only be done using 32- or 64-bit widths.

For example:

(define (string->pascal/utf-8 str)
(let ((bs (string->bytes/utf-8 str)))
(bit-string (bytes-length bs) [bs :: binary])))

This subroutine encodes its string argument using a UTF-8 codec, and then assembles it into a Pascal-style string with a prefix length byte. If the encoded string is longer than 255 bytes, note that the length byte will be truncated and so the encoding will be incorrect. A better encoder would ensure that bs was not longer than 255 bytes before encoding it as a Pascal string.

Note that if you wish to leave all the options at their defaults (that is, [... :: integer bits 8]), you can use the second form of spec given above.

(bit-string? x) → boolean?

x : any?

Returns #t if its argument is either a bytes?, a bit-slice? or a splice?. Returns #f otherwise.

(bit-string-length x) → integer?

x : bit-string?

Returns the length of its argument, in bits.

(bit-string-empty? x) → boolean?

x : bit-string?

Returns #t if its argument’s bit-string-length is zero.

(bit-string-append a ...) → bit-string?

a : bit-string?

Appends its arguments in order, producing a new bit string. Uses splice internally when it can’t arrange to return a bit string previously constructed. (The practical upshot of this is that you might need to use bit-string->bytes to "flatten" appended bit-strings from time to time.)

(bit-string-split-at x offset)

→

bit-string?

x : bit-string?

offset : integer?

Produces two values: the bit-string containing bits [0..offset) of x, and the bit-string containing bits [offset..(bit-string-length x)) of x. If offset is negative or greater-or-equal-to the number of bits in x, an error is signalled.

(bit-string-split-at-or-false x offset)

→

(or/c bit-string? #f)

x : bit-string?

offset : integer?

Like (bit-string-split-at x offset), but if offset is out of range returns (values #f #f) instead of signalling an error. This procedure is used in the implementation of bit-string-case.

(sub-bit-string x low-bit high-bit) → bit-string?

x : bit-string?

low-bit : integer?

high-bit : integer?

If (<= 0 low-bit high-bit (sub1 (bit-string-length x))), returns the bit-string containing bits [low-bit..high-bit) of x. Otherwise, signals an error.

(bit-string-ref x offset) → (or/c 0 1)

x : bit-string?

offset : integer?

Extracts bit number offset from x. Signals an error if offset is negative or greater-than-or-equal-to the length of x.

(bit-string->bytes x) → bytes?

x : bit-string?

Flattens any splices or bit-slices in x, producing a single contiguous byte vector with x’s contents. If (positive? (remainder (bit-string-length x) 8)), pads out the remaining bits with zeros on the right.

(bit-string->bytes/align x align-right?) → bytes?

x : bit-string?

align-right? : boolean?

As bit-string->bytes, but offers the choice of padding on the right (if align-right? is #f) or on the left (if align-right? is #t) when padding is required. (Note that to align the bits in x on the right is to pad with zeros on the left, and vice versa.)

(bit-string-byte-count x) → integer?

x : bit-string?

Returns the smallest number of whole bytes that could contain all the bits in x.

(bit-string-pack! x buf offset) → void?

x : bit-string?

buf : bytes?

offset : integer?

Copies the entirety of x into buf, overwriting bits starting with the offsetth. It is an error for buf not to have enough room or for offset to be out-of-bounds.

(bit-string-pack x) → bit-string?

x : bit-string?

Returns a bit string equivalent to x (i.e. with exactly the same bits in the same order) but with any intermediate splices or bit-slices flattened away. The result will either be a bytes? of the correct length, or a bit-slice referring to a section of a byte vector of length (bit-string-byte-count x).

target : bit-string?

target-offset : integer?

source : bit-string?

source-offset : integer?

count : integer?

Overwrites bits [target-offset..(+ target-offset count)) of target with bits [source-offset..(+ source-offset count)) of source. Undefined behaviour results when (eq? target source).

(bit-string->integer x big-endian? signed?) → integer?

x : bit-string?

big-endian? : boolean?

signed? : boolean?

Interprets the bits in x as an integer, using either a big- or little-endian byte-ordering convention (per big-endian?), and either unsigned or two’s-complement signed arithmetic (per signed?) to produce the result.

(integer->bit-string n width big-endian?) → bit-string?

n : integer?

width : integer?

big-endian? : boolean?

Encodes n as a bit string of length width bits, truncating or sign-extending as required, and using a big- or little-endian byte-ordering convention as per big-endian?.

(copy-bits!

target-offset

source-offset

These procedures may be useful for debugging, but should not be relied upon otherwise.

(bit-slice? x) → boolean?

x : any?

Returns #t if and only if x is a bit-slice.

(bit-slice-binary x) → bytes?

x : bit-slice?

Extracts the underlying byte vector from a bit-slice.

(bit-slice-low-bit x) → integer?

x : bit-slice?

(bit-slice-high-bit x) → integer?

x : bit-slice?

Extract the low (inclusive) and high (exclusive) bit indexes, respectively, from a bit-slice. The bit-slice itself represents bits [low..high) of the underlying byte vector.

(splice? x) → boolean?

x : any?

Returns #t if and only if x is a splice of two bit-strings.

(splice-left x) → bit-string?

x : splice?

(splice-right x) → bit-string?

x : splice?

Extract the left and right bit-strings, respectively, that are spliced together by the given splice x.