Packed Binary

Version: 4.2.1

by Doug Williams

m.douglas.williams at gmail.com

This library performs conversions between PLT Scheme values and C structs represented as PLT Scheme byte strings. It also provides read and write routines to perform these conversions directly to/from binary files. It uses format strings (see Format Strings) as compact descriptions of the layout of the C structs and the intended conversion to/from PLT Scheme values. This is useful for handling binary data read from files, network connections, and other sources.

Everything in this library is exported by a single module:

(require (planet williams/packed-binary/packed-binary))

1 Interface

The library defines the following functions:

(packed-format-string? x) → boolean?

x : any/c

Determines whether a given value is a format string.

(calculate-size format) → (and/c integer? exact? (>=/c 0))

format : packed-format-string/c

Returns the size of the byte string corresponding to the given format.

(pack format v ...) → bytes?

format : packed-format-string/c

v : any/c

Returns a byte string containing the vs packed according to the given format. The vs must match the values required by the format exactly.
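As a rough illustration of the three functions described so far, the sketch below checks a format string, asks for its packed size, and packs three integers. It assumes the little-endian, standard-size layout selected by the "<" prefix (see Format Strings). The byte values in the comments follow from that description and have not been verified by running the code.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; Two shorts and a long, little endian, standard sizes (2 + 2 + 4 bytes).
(define fmt "<hhl")

(packed-format-string? fmt)   ; a format string
(packed-format-string? 42)    ; not a string at all, so not a format string
(calculate-size fmt)          ; 8 under standard sizes

;; The values must match the format exactly: three integers here.
(define packed (pack fmt 1 2 3))
(bytes->list packed)          ; expected (1 0 2 0 3 0 0 0)
```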

(pack-into format buffer offset v ...) → bytes?

format : packed-format-string/c

buffer : bytes?

offset : (and/c integer? exact? (>=/c 0))

v : any/c

Packs the vs according to the given format into the (mutable) byte string buffer starting at offset. Note that offset is not an optional argument.
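A minimal sketch of pack-into, assuming a little-endian unsigned short ("<H") and a buffer created with make-bytes, which returns a mutable byte string. The buffer size and the value #x1234 are chosen only for illustration.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; make-bytes returns a mutable byte string, which pack-into requires.
(define buffer (make-bytes 8 0))

;; Write one unsigned short (little endian) at byte offset 2.
;; #x1234 is stored low byte first: #x34 (52) then #x12 (18).
(pack-into "<H" buffer 2 #x1234)

(bytes->list buffer)   ; expected (0 0 52 18 0 0 0 0)
```

Only the two bytes at offsets 2 and 3 should change; the rest of the buffer keeps its previous contents.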

(unpack format bytes) → (listof any/c)

format : packed-format-string/c

bytes : bytes?

Unpacks the bytes according to the given format. The result is a list, even if it contains exactly one item. The bytes must contain exactly the amount of data required by the format.

(unpack-from format buffer [offset]) → (listof any/c)

format : packed-format-string/c

buffer : bytes?

offset : (and/c integer? exact? (>=/c 0)) = 0

Unpacks the byte string buffer starting at offset according to the given format. The result is a list, even if it contains exactly one item. The buffer must contain at least the amount of data required by the format.
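The difference between unpack and unpack-from can be sketched as follows, again assuming the little-endian standard layout. The record bytes are made up for the example.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; Eight bytes holding two shorts and a long in little-endian order.
(define record (bytes 1 0 2 0 3 0 0 0))

;; unpack needs exactly (calculate-size "<hhl") = 8 bytes.
(unpack "<hhl" record)            ; expected (1 2 3)

;; unpack-from may read from the middle of a larger buffer.
(unpack-from "<h" record 2)       ; expected (2), a one-element list
(unpack-from "<h" record)         ; offset defaults to 0, expected (1)
```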

(write-packed format port v ...) → any

format : packed-format-string/c

port : output-port?

v : any/c

Packs the vs according to the given format and writes them to the specified output port. The vs must match the values required by the format exactly.

(read-packed format port) → (listof any/c)

format : packed-format-string?

port : input-port?

Reads (calculate-size format) bytes from the input port and unpacks them according to the given format. The result is a list, even if it contains exactly one item.
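A round trip through write-packed and read-packed might look like the sketch below. In-memory byte-string ports stand in for a binary file so that the example is self-contained. The record format "<hd" (a short followed by a double) is an arbitrary choice.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; An in-memory output port stands in for a binary file.
(define out (open-output-bytes))
(write-packed "<hd" out 7 1.5)
(write-packed "<hd" out 8 2.5)

;; Each record is 2 + 8 = 10 bytes with standard sizes and no padding.
(define data (get-output-bytes out))
(bytes-length data)               ; expected 20

;; Read the records back one at a time.
(define in (open-input-bytes data))
(define first-record (read-packed "<hd" in))    ; expected (7 1.5)
(define second-record (read-packed "<hd" in))   ; expected (8 2.5)
(list first-record second-record)
```

With a real file, ports obtained from open-output-file and open-input-file would take the place of the byte-string ports.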

2 Format Strings

A format string is a compact description of the layout of a C struct and the intended conversion to/from PLT Scheme values. The conversion between C and PLT Scheme values should be obvious given their types. The following table defines each of the format characters:

Character   C Type                PLT Scheme
x           pad byte              no value
c           char                  char
b           signed char           integer
B           unsigned char         integer
h           short                 integer
H           unsigned short        integer
i           int                   integer
I           unsigned int          integer
l           long                  integer
L           unsigned long         integer
q           long long             integer
Q           unsigned long long    integer
f           float                 real
d           double                real
s           char[]                string

A format character may be preceded by an integral repeat count. For example, the format string "4h" means exactly the same as "hhhh".
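The equivalence of a repeat count and the spelled-out form can be checked with a short sketch. The "<" prefix is used only to make the layout independent of the host machine.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; "4h" should be interchangeable with "hhhh".
(define with-count (pack "<4h" 1 2 3 4))
(define spelled-out (pack "<hhhh" 1 2 3 4))

(equal? with-count spelled-out)                          ; expected #t
(= (calculate-size "<4h") (calculate-size "<hhhh"))      ; expected #t
```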

Whitespace characters between formats are ignored. However, there must not be any whitespace between a count and its format.

For the "s" format character, the count is interpreted as the size of the string, not a repeat count as for the other format characters. For example, "10s" means a 10-byte string value while "10c" means 10 character values. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting string always has exactly the specified number of bytes. As a special case, "0s" means a single, empty string (while "0c" means 0 characters).
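The padding and truncation rules for "s" can be sketched as below. It is an assumption, based on the table above, that "s" accepts a PLT Scheme string. The exact form of the unpacked value is not asserted here.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; ASSUMPTION: "s" accepts a PLT Scheme string, as the table above indicates.
(calculate-size "10s")                  ; 10: one 10-byte string

;; A short string is padded with null bytes to fill the field.
(define padded (pack "10s" "hello"))
(bytes-length padded)                   ; expected 10

;; A long string is truncated to the field size.
(define truncated (pack "3s" "hello"))
(bytes-length truncated)                ; expected 3

;; Unpacking yields one value of exactly 10 bytes, padding included.
(unpack "10s" padded)
```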

By default, C numbers are represented in the machine’s native format and byte order and properly aligned by skipping pad bytes if necessary.

Alternatively, the first character of the format string can be used to indicate the byte order, size, and alignment of the packed data according to the following table:

Character   Byte Order              Size and Alignment
@           native                  native
=           native                  standard
<           little endian           standard
>           big endian              standard
!           network (big endian)    standard

If the first character is not one of these, "@" is assumed.

Native byte order is either big endian or little endian, depending on the host processor. For example, Motorola and Sun processors are big endian, while Intel and DEC processors are little endian.

Standard size and alignment are as follows:

no alignment is required for any type (so you have to use pad bytes)
short is 2 bytes
int and long are 4 bytes
long long is 8 bytes
float is a 32-bit IEEE floating point number
double is a 64-bit IEEE floating point number
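The standard sizes listed above can be confirmed with calculate-size. This sketch uses the "=" prefix so that sizes are standard while the byte order stays native. The expected values are taken from the list and have not been verified by running the code.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; Sizes under "=" (native byte order, standard size and alignment).
(for ([fmt (list "=h" "=i" "=l" "=q" "=f" "=d")])
  (printf "~a -> ~a bytes\n" fmt (calculate-size fmt)))
;; Expected from the list above: 2, 4, 4, 8, 4, 8.

;; No alignment is applied, so a short followed by a long takes 2 + 4 bytes.
(calculate-size "=hl")     ; expected 6

;; Two explicit pad bytes put the long on a 4-byte boundary.
(calculate-size "=hxxl")   ; expected 8
```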

Note the difference between "@" and "=": both use native byte order – but the size and alignment of the latter are standardized.

The form "!" is available for those who can’t remember whether network byte order is big endian or little endian – it is big endian.
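The effect of the byte-order prefixes can be seen by packing the same unsigned short under each one. The value 258 (#x0102) is chosen so that its two bytes are distinct.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; 258 is #x0102, so its two bytes are easy to tell apart.
(bytes->list (pack "<H" 258))   ; little endian, expected (2 1)
(bytes->list (pack ">H" 258))   ; big endian, expected (1 2)
(bytes->list (pack "!H" 258))   ; network order, same as ">"

;; "=" uses whichever order the host machine has, so this matches
;; one of the two results above depending on the processor.
(bytes->list (pack "=H" 258))
```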

There is no way to indicate non-native byte order (force byte swapping). Use the appropriate choice of "<" or ">".

Hint: to align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero. For example, the format "llh0l" specifies two pad bytes at the end, assuming longs are aligned on 4-byte boundaries. This only works when native size and alignment are in effect – standard size and alignment do not enforce any alignment.
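The trailing-alignment hint can be explored with calculate-size. Native sizes depend on the platform, so this sketch prints the sizes and gives no fixed expected output. The last comparison assumes that a zero repeat count is accepted in standard mode and adds no bytes there.

```scheme
#lang scheme
(require (planet williams/packed-binary/packed-binary))

;; Native mode (no prefix, so "@" is assumed): results depend on the platform.
(define unpadded (calculate-size "llh"))
(define padded (calculate-size "llh0l"))

;; With 4-byte longs the text above predicts a difference of 2 pad bytes.
(printf "llh: ~a bytes, llh0l: ~a bytes, trailing pad: ~a\n"
        unpadded padded (- padded unpadded))

;; In standard mode there is no alignment, so "0l" adds nothing.
(= (calculate-size "=llh") (calculate-size "=llh0l"))   ; expected #t
```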

The current implementation may not properly handle native alignment in all cases. For the current implementation, the native alignment is assumed to be the same as the size. This may result in excess pad bytes, particularly for 8-byte objects.