#lang scribble/doc

@(require scribble/manual
          scribble/struct
          scribble/eval
          scribble/basic
          (for-label racket
                     (planet williams/packed-binary/packed-binary)))

@title[#:tag "packed-binary"]{Packed Binary}

M. Douglas Williams

@tt{m.douglas.williams at gmail.com}

This library performs conversions between Racket values and C structs represented as byte strings.  It also provides read and write routines to perform these conversions directly to/from binary files.  It uses @italic{format strings} (see @secref{format strings}) as compact descriptions of the layout of the C structs and the intended conversion to/from Racket values.  This can be used in handling binary data stored in files or from network connections, among other sources.

Everything in this library is exported by a single module:

@defmodule[(planet williams/packed-binary/packed-binary)]

@table-of-contents[]

@section[#:tag "interface"]{Interface}

The library defines the following functions:

@defproc[(packed-format-string? (x any/c)) boolean?]{
Determines whether a given value is a @italic{format string}.}

@defproc[(calculate-size (format packed-format-string/c))
         (and/c integer? exact? (>=/c 0))]{
Returns the size of the byte string corresponding to the given @scheme[format].}

@defproc[(pack (format packed-format-string/c) (v any/c) ...)
         bytes?]{
Returns a byte string containing the @scheme[v]s packed according to the given @scheme[format].  The @scheme[v]s must match the values required by the @scheme[format] exactly.}
           
@defproc[(pack-into (format packed-format-string/c) (buffer bytes?) (offset (and/c integer? exact? (>=/c 0))) (v any/c) ...)
         bytes?]{
Packs the @scheme[v]s according to the given @scheme[format] into the (mutable) byte string @scheme[buffer] starting at @scheme[offset].  Note that @scheme[offset] is not an optional argument.}
      
@defproc[(unpack (format packed-format-string/c) (bytes bytes?))
         (listof any/c)]{
Unpacks the @scheme[bytes] according to the given @scheme[format].  The result is a list, even if it contains exactly one item.  The @scheme[bytes] must contain exactly the amount of data required by the @scheme[format].}
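
The following sketch illustrates how @scheme[calculate-size], @scheme[pack], and @scheme[unpack] fit together.  The format @scheme["<hl"] and the values are chosen only for illustration; the expected results are derived from the standard sizes described in @secref{format strings} rather than taken from a recorded session.

@schemeblock[
(code:comment @#,t{little endian, standard sizes: a 2-byte short followed by a 4-byte long})
(calculate-size "<hl")
(code:comment @#,t{pack two integers, then unpack the resulting byte string})
(define data (pack "<hl" 1 2))
(unpack "<hl" data)
]

Under those sizes, @scheme[calculate-size] should report 6 bytes, and @scheme[unpack] should return the two-element list @scheme[(list 1 2)].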
                        
@defproc[(unpack-from (format packed-format-string/c) (buffer bytes?) (offset (and/c integer? exact? (>=/c 0)) 0))
         (listof any/c)]{
Unpacks the byte string @scheme[buffer] starting at @scheme[offset] according to the given @scheme[format].  The result is a list, even if it contains exactly one item.  The @scheme[buffer] must contain at least the amount of data required by the @scheme[format], counting from @scheme[offset].}
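
As a rough illustration of packing into a preallocated buffer, the sketch below writes one value at a nonzero offset and reads it back.  The buffer size, offset, and value are arbitrary choices for this example.

@schemeblock[
(code:comment @#,t{an 8-byte mutable buffer, initially all zeros})
(define buffer (make-bytes 8 0))
(code:comment @#,t{store 513 as a little-endian unsigned short at bytes 2 and 3})
(pack-into "<H" buffer 2 513)
(code:comment @#,t{read the same two bytes back, which should give a list containing 513})
(unpack-from "<H" buffer 2)
]

Assuming @scheme[pack-into] modifies only the bytes covered by the format, a buffer like this can hold several independently packed records.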
                        
@defproc[(write-packed (format packed-format-string/c) (port output-port?) (v any/c) ...)
         any]{
Packs the @scheme[v]s according to the given @scheme[format] and writes the resulting bytes to the specified output @scheme[port].  The @scheme[v]s must match the values required by the @scheme[format] exactly.}
             
@defproc[(read-packed (format packed-format-string?) (port input-port?))
         (listof any/c)]{
Reads @scheme[(calculate-size format)] bytes from the input @scheme[port] and unpacks them according to the given @scheme[format].  The result is a list, even if it contains exactly one item.}
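
The port-based routines can be tried without touching the file system by using Racket's byte-string ports.  This is only a sketch; the names @scheme[out-port] and @scheme[in-port], the format, and the values are illustrative.

@schemeblock[
(code:comment @#,t{write a short and a long to an in-memory output port})
(define out-port (open-output-bytes))
(write-packed "<hl" out-port 1 2)
(code:comment @#,t{reopen the accumulated bytes as an input port and read them back})
(define in-port (open-input-bytes (get-output-bytes out-port)))
(read-packed "<hl" in-port)
]

The final expression should produce @scheme[(list 1 2)].  For real files, ports from @scheme[open-output-file] and @scheme[open-input-file] would take the place of the byte-string ports.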

@section[#:tag "format strings"]{Format Strings}

A @deftech{format string} is a compact description of the layout of a C struct and the intended conversion to/from Racket values.  Each format character denotes one C type and the kind of Racket value it is converted to or from.  The following table defines each of the format characters:

@(make-table
  'boxed
  (list
   (list (make-flow (list (make-paragraph (list @bold{Character}))))
         (make-flow (list (make-paragraph (list @bold{C Type}))))
         (make-flow (list (make-paragraph (list @bold{Racket})))))
   (list (make-flow (list (make-paragraph (list @tt{x}))))
         (make-flow (list @t{pad byte}))
         (make-flow (list @t{no value})))
   (list (make-flow (list (make-paragraph (list @tt{c}))))
         (make-flow (list (make-paragraph (list @tt{char}))))
         (make-flow (list @t{char})))
   (list (make-flow (list (make-paragraph (list @tt{b}))))
         (make-flow (list (make-paragraph (list @tt{signed char}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{B}))))
         (make-flow (list (make-paragraph (list @tt{unsigned char}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{h}))))
         (make-flow (list (make-paragraph (list @tt{short}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{H}))))
         (make-flow (list (make-paragraph (list @tt{unsigned short}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{i}))))
         (make-flow (list (make-paragraph (list @tt{int}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{I}))))
         (make-flow (list (make-paragraph (list @tt{unsigned int}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{l}))))
         (make-flow (list (make-paragraph (list @tt{long}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{L}))))
         (make-flow (list (make-paragraph (list @tt{unsigned long}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{q}))))
         (make-flow (list (make-paragraph (list @tt{long long}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{Q}))))
         (make-flow (list (make-paragraph (list @tt{unsigned long long}))))
         (make-flow (list @t{integer})))
   (list (make-flow (list (make-paragraph (list @tt{f}))))
         (make-flow (list (make-paragraph (list @tt{float}))))
         (make-flow (list @t{real})))
   (list (make-flow (list (make-paragraph (list @tt{d}))))
         (make-flow (list (make-paragraph (list @tt{double}))))
         (make-flow (list @t{real})))
   (list (make-flow (list (make-paragraph (list @tt{s}))))
         (make-flow (list (make-paragraph (list @tt{char[]}))))
         (make-flow (list @t{string})))
   ))

A format character may be preceded by an integral repeat count.  For example, the format string @scheme["4h"] means exactly the same as @scheme["hhhh"].
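
As a quick check, the two spellings should describe the same layout, which can be confirmed with @scheme[calculate-size].  The @scheme["<"] prefix, described below, is added here only to make the sizes platform independent.

@schemeblock[
(code:comment @#,t{four 2-byte shorts either way, so both calls should return 8})
(calculate-size "<4h")
(calculate-size "<hhhh")
(code:comment @#,t{a repeat count consumes one value per repetition})
(pack "<4h" 1 2 3 4)
]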

Whitespace characters between formats are ignored.  However, there must not be any whitespace between a count and its format.

For the @scheme["s"] format character, the count is interpreted as the size of the string, not as a repeat count as it is for the other format characters.  For example, @scheme["10s"] means a 10-byte string value while @scheme["10c"] means 10 character values.  For packing, the string is truncated or padded with null bytes as appropriate to make it fit.  For unpacking, the resulting string always has exactly the specified number of bytes.  As a special case, @scheme["0s"] means a single, empty string (while @scheme["0c"] means 0 characters).
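
The difference between a size and a repeat count can be sketched as follows.  This assumes, following the table above, that @scheme["s"] takes a Racket string and @scheme["c"] takes Racket characters; the sample text is arbitrary.

@schemeblock[
(code:comment @#,t{one value: a 5-character string padded with null bytes to 10 bytes})
(define packed (pack "<10s" "hello"))
(bytes-length packed)
(code:comment @#,t{unpacking should give a one-element list whose string is exactly 10 bytes long})
(unpack "<10s" packed)
(code:comment @#,t{by contrast, a count on c is a repeat count and requires three separate values})
(pack "<3c" #\a #\b #\c)
]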

By default, C numbers are represented in the machine's native format and byte order and properly aligned by skipping pad bytes if necessary.

Alternatively, the first character of the format string can be used to indicate the byte order, size, and alignment of the packed data according to the following table:

@(make-table
  'boxed
  (list
   (list (make-flow (list (make-paragraph (list @bold{Character}))))
         (make-flow (list (make-paragraph (list @bold{Byte Order}))))
         (make-flow (list (make-paragraph (list @bold{Size and Alignment})))))
   (list (make-flow (list (make-paragraph (list @tt|{@}|))))
         (make-flow (list @t{native}))
         (make-flow (list @t{native})))
   (list (make-flow (list (make-paragraph (list @tt{=}))))
         (make-flow (list @t{native}))
         (make-flow (list @t{standard})))
   (list (make-flow (list (make-paragraph (list @tt{<}))))
         (make-flow (list @t{little endian}))
         (make-flow (list @t{standard})))
   (list (make-flow (list (make-paragraph (list @tt{>}))))
         (make-flow (list @t{big endian}))
         (make-flow (list @t{standard})))
   (list (make-flow (list (make-paragraph (list @tt{!}))))
         (make-flow (list @t{network (big endian)}))
         (make-flow (list @t{standard})))
   ))

If the first character is not one of these, @scheme["@"] is assumed.

Native byte order is either big endian or little endian, depending on the host machine. For example, Motorola and Sun processors are big endian, while Intel and DEC processors are little endian.

Standard size and alignment are as follows:
@itemize{
  @item{no alignment is applied to any type (so any padding must be written explicitly with @tt{x} pad bytes)}
  @item{@tt{short} is 2 bytes}
  @item{@tt{int} and @tt{long} are 4 bytes}
  @item{@tt{long long} is 8 bytes}
  @item{@tt{float} is a 32-bit IEEE floating point number}
  @item{@tt{double} is a 64-bit IEEE floating point number}
}

Note the difference between @scheme["@"] and @scheme["="]: both use native byte order---but the size and alignment of the latter is standardized.
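
A small sketch of the difference, using a format whose native layout may need padding:

@schemeblock[
(code:comment @#,t{standard: 1-byte signed char plus 4-byte long with no padding, so 5 bytes})
(calculate-size "=bl")
(code:comment @#,t{native: the result depends on the platform's size and alignment for long})
(calculate-size "@bl")
]

On a platform where @tt{long} is 4 bytes and aligned on a 4-byte boundary, the native size would be 8 (three pad bytes after the @tt{signed char}); no particular value is guaranteed.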

The form @scheme["!"] is available for those who can't remember whether network byte order is big endian or little endian---it is big endian.
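
The effect of the byte-order character can be seen by packing the same value each way.  The byte values in the comments follow from the definitions of little and big endian rather than from a recorded session.

@schemeblock[
(code:comment @#,t{least significant byte first: bytes 1, 0})
(pack "<h" 1)
(code:comment @#,t{most significant byte first: bytes 0, 1})
(pack ">h" 1)
(code:comment @#,t{network order is big endian, so this should match the previous result})
(pack "!h" 1)
]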

There is no way to request non-native byte order as such (that is, to force byte swapping).  Instead, specify the desired order explicitly with @scheme["<"] or @scheme[">"].

Hint: to align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero. For example, the format @scheme["llh0l"] specifies two pad bytes at the end, assuming longs are aligned on 4-byte boundaries.  This only works when native size and alignment are in effect, because standard size and alignment do not enforce any alignment.
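
A sketch of the hint, comparing sizes with and without the trailing zero-count code.  The concrete numbers assume 4-byte longs aligned on 4-byte boundaries and will differ on other platforms.

@schemeblock[
(code:comment @#,t{two longs and a short: 10 bytes under the stated assumption})
(calculate-size "@llh")
(code:comment @#,t{the trailing 0l rounds the size up to a multiple of 4, giving 12})
(calculate-size "@llh0l")
(code:comment @#,t{with standard alignment the 0l has no effect, so this stays at 10})
(calculate-size "=llh0l")
]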

The current implementation may not properly handle native alignment in all cases. For the current implementation, the native alignment is assumed to be the same as the size. This may result in excess pad bytes, particularly for 8-byte objects.