Packed Binary
by Doug Williams
m.douglas.williams at gmail.com
This library performs conversions between PLT Scheme values and C structs represented as PLT Scheme byte strings. It also provides read and write routines to perform these conversions directly to/from binary files. It uses format strings (see Format Strings) as compact descriptions of the layout of the C structs and the intended conversion to/from PLT Scheme values. This can be used in handling binary data stored in files or from network connections, among other sources.
Everything in this library is exported by a single module:
(require (planet williams/packed-binary/packed-binary))
1 Interface
The library defines the following functions:
(packed-format-string? x) → boolean?
x : any/c
(pack format v ) → bytes?
format : packed-format-string/c
v : any/c
(pack-into format buffer offset v ) → bytes?
format : packed-format-string/c
buffer : bytes?
offset : (and/c integer? exact? (>=/c 0))
v : any/c
(unpack format bytes) → (listof any/c)
format : packed-format-string/c
bytes : bytes?
(unpack-from format buffer [offset]) → (listof any/c)
format : packed-format-string/c
buffer : bytes?
offset : (and/c integer? exact? (>=/c 0)) = 0
(write-packed format port v ) → any
format : packed-format-string/c
port : output-port?
v : any/c
(read-packed format port) → (listof any/c)
format : packed-format-string?
port : input-port?
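Python's standard `struct` module offers similarly named operations (`pack`, `pack_into`, `unpack`, `unpack_from`) driven by the same style of format string. The sketch below uses it as a stand-in to show how such functions typically fit together. It is Python, not this library's API, and whether this library behaves identically in every detail is an assumption.

```python
import io
import struct

# Little endian, standard sizes: short (2) + short (2) + long (4) = 8 bytes.
fmt = "<hhl"

# pack: values -> a new byte string.
packed = struct.pack(fmt, 1, 2, 3)

# unpack: byte string -> the values it encodes.
values = struct.unpack(fmt, packed)

# pack_into / unpack_from: operate at an offset inside an existing buffer.
buffer = bytearray(12)
struct.pack_into(fmt, buffer, 4, 1, 2, 3)
values_at_offset = struct.unpack_from(fmt, buffer, 4)

# Writing to and reading from a port-like object (an in-memory stream here).
stream = io.BytesIO()
stream.write(struct.pack(fmt, 1, 2, 3))
stream.seek(0)
values_from_stream = struct.unpack(fmt, stream.read(struct.calcsize(fmt)))
```

The offset variants are useful when one buffer holds several records back to back, since they avoid copying each record out before decoding it.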
2 Format Strings
A format string is a compact description of the layout of a C struct and the intended conversion to/from PLT Scheme values. The conversion between C and PLT Scheme values should be obvious given their types. The following table defines each of the format characters:
Character | C Type | PLT Scheme |
x | pad byte | no value |
c | char | char |
b | signed char | integer |
B | unsigned char | integer |
h | short | integer |
H | unsigned short | integer |
i | int | integer |
I | unsigned int | integer |
l | long | integer |
L | unsigned long | integer |
q | long long | integer |
Q | unsigned long long | integer |
f | float | real |
d | double | real |
s | char[] | string |
A format character may be preceded by an integral repeat count. For example, the format string "4h" means exactly the same as "hhhh".
Whitespace characters between formats are ignored. However, there must not be any whitespace between a count and its format.
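The repeat-count and whitespace rules can be tried out with Python's standard `struct` module, which follows the same conventions. This is an illustrative sketch in Python rather than this library's API. The "<" prefix is used only to make the byte layout deterministic.

```python
import struct

values = (1, 2, 3, 4)

# A repeat count is shorthand for writing the format character several times.
with_count = struct.pack("<4h", *values)
spelled_out = struct.pack("<hhhh", *values)

# Whitespace between formats is ignored.
with_spaces = struct.pack("<2h 2h", *values)

same_layout = with_count == spelled_out == with_spaces

# Whitespace between a count and its format character is rejected.
try:
    struct.pack("<4 h", *values)
    space_after_count_accepted = True
except struct.error:
    space_after_count_accepted = False
```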
For the "s" format character, the count is interpreted as the size of the string, not as a repeat count as it is for the other format characters. For example, "10s" means a single 10-byte string value, while "10c" means 10 character values. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting string always has exactly the specified number of bytes. As a special case, "0s" means a single, empty string (while "0c" means 0 characters).
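The difference between a size and a repeat count can be seen with Python's standard `struct` module, whose "s" and "c" formats follow the same rules. This is a Python sketch, not this library's API. Note that Python represents these values as byte strings, whereas this library maps them to PLT Scheme strings and chars.

```python
import struct

# "10s" is one 10-byte value; short input is padded with null bytes.
padded = struct.pack("10s", b"hi")

# Input longer than the declared size is truncated.
truncated = struct.pack("4s", b"truncated")

# Unpacking always yields exactly the declared number of bytes,
# including any padding.
(unpacked,) = struct.unpack("10s", padded)

# "10c" is ten separate one-character values.
chars = struct.unpack("10c", b"abcdefghij")

# Special case: "0s" is a single empty string, "0c" is no values at all.
empty_string = struct.unpack("0s", b"")
no_chars = struct.unpack("0c", b"")
```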
By default, C numbers are represented in the machine’s native format and byte order and properly aligned by skipping pad bytes if necessary.
Alternatively, the first character of the format string can be used to indicate the byte order, size, and alignment of the packed data according to the following table:
Character | Byte Order | Size and Alignment |
@ | native | native |
= | native | standard |
< | little endian | standard |
> | big endian | standard |
! | network (big endian) | standard |
If the first character is not one of these, "@" is assumed.
Native byte order is either big endian or little endian, depending on the host system. For example, Motorola and Sun processors are big endian, while Intel and DEC processors are little endian.
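The effect of the byte-order prefixes is easiest to see on a single 4-byte value. The sketch below uses Python's standard `struct` module, which uses the same prefix characters. It illustrates the convention and is not this library's API.

```python
import struct
import sys

value = 0x01020304

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # network order is big endian
native = struct.pack("=I", value)   # whatever the host uses, standard size

# The native result matches one of the two explicit orders.
expected_native = little if sys.byteorder == "little" else big
```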
Standard size and alignment are as follows:
no alignment is required for any type (so you have to use pad bytes)
short is 2 bytes
int and long are 4 bytes
long long is 8 bytes
float is a 32-bit IEEE floating point number
double is a 64-bit IEEE floating point number
Note the difference between "@" and "=": both use native byte order – but the size and alignment of the latter is standardized.
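To make the "@" versus "=" distinction concrete, the following sketch queries sizes with `calcsize` from Python's standard `struct` module, which uses the same standard sizes. It is Python, not this library's API. The native sizes depend on the platform, so they are computed but not assumed.

```python
import struct

codes = "hilqfd"

# Standard sizes are fixed regardless of platform.
standard_sizes = {code: struct.calcsize("=" + code) for code in codes}

# Native sizes depend on the C compiler and platform (a long may be 4 or 8).
native_sizes = {code: struct.calcsize("@" + code) for code in codes}

# Standard mode inserts no padding: a byte followed by a long is 1 + 4 bytes.
standard_unaligned = struct.calcsize("=bl")

# Any alignment in standard mode has to be requested with explicit pad bytes.
standard_with_pad = struct.calcsize("=bxxxl")

# Native mode may insert pad bytes before the long to align it.
native_aligned = struct.calcsize("@bl")
```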
The form "!" is available for those who can’t remember whether network byte order is big endian or little endian – it is big endian.
There is no way to request non-native byte order as such (that is, to force byte swapping); use the appropriate choice of "<" or ">" instead.
Hint: to align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero. For example, the format "llh0l" specifies two pad bytes at the end, assuming longs are aligned on 4-byte boundaries. This only works when native size and alignment are in effect; standard size and alignment does not enforce any alignment.
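The zero-count trick can be checked with `calcsize` from Python's standard `struct` module, which supports the same idiom. This is a Python sketch, not this library's API. The amount of trailing padding depends on the platform's size and alignment for long, so the code compares sizes rather than assuming a fixed number.

```python
import struct

# Native mode: a trailing "0l" rounds the total size up to long alignment.
native_plain = struct.calcsize("@llh")
native_padded = struct.calcsize("@llh0l")
trailing_pad_bytes = native_padded - native_plain

# Standard mode enforces no alignment, so "0l" adds nothing.
standard_plain = struct.calcsize("=llh")
standard_padded = struct.calcsize("=llh0l")
```

With 4-byte longs aligned on 4-byte boundaries, `trailing_pad_bytes` would be 2, matching the example in the text. Platforms with 8-byte longs will report a different value.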
The current implementation may not properly handle native alignment in all cases. For the current implementation, the native alignment is assumed to be the same as the size. This may result in excess pad bytes, particularly for 8-byte objects.