#lang scribble/doc
@(require scribble/manual
          scribble/struct
          scribble/eval
          (planet williams/uuid/uuid)
          (for-label scheme
                     scheme/date
                     (planet williams/uuid/uuid)))

@title[#:tag "top"]{Universally Unique Identifiers (UUIDs)}

by Doug Williams

@tt{m.douglas.williams at gmail.com}

This library provides Universally Unique Identifiers (UUIDs) as defined in RFC 4122. A UUID is a 128-bit value that is externally encoded as a string in 8-4-4-4-12 format. This library provides functions for constructing time-based (type 1), name-based using MD5 hashing (type 3), (pseudo-)random (type 4), and name-based using SHA-1 hashing (type 5) UUIDs. A copy of RFC 4122 is included with this library.

An example of a time-based (type 1) UUID is:
f81d4fae-7dec-11d0-a765-00a0c91e6bf6
which was generated on Monday, February 3, 1997 at 5:43:12 PM GMT on a machine with an IEEE 802 MAC address of 00-a0-c9-1e-6b-f6.

The UUID library is available from the PLaneT repository.

@defmodule[(planet williams/uuid/uuid)]

@table-of-contents[]

@section[#:tag "interface"]{Interface}

The UUID library provides the following functions:

@defproc[(uuid? (x any/c)) boolean?]{
Returns @scheme[#t] if @scheme[x] is a UUID.}

@defproc[(uuid-RFC-4122? (uuid uuid?)) boolean?]{
Returns @scheme[#t] if @scheme[uuid] is a UUID of the variant defined by RFC 4122. Note that the functions in this library only create and manipulate this variant of UUID.}

@defproc[(uuid-version (uuid uuid-RFC-4122?)) exact-nonnegative-integer?]{
Returns the version (i.e., type, or more accurately, sub-type) of this UUID.
The current known versions are:

@(make-table 'boxed
  (list
   (list (make-flow (list (make-paragraph (list @bold{Version}))))
         (make-flow (list (make-paragraph (list @bold{Description})))))
   (list (make-flow (list @t{1}))
         (make-flow (list @t{The time-based version specified in RFC 4122.})))
   (list (make-flow (list @t{2}))
         (make-flow (list @t{DCE Security version, with embedded POSIX UIDs.})))
   (list (make-flow (list @t{3}))
         (make-flow (list @t{The name-based version specified in RFC 4122 that uses MD5 hashing.})))
   (list (make-flow (list @t{4}))
         (make-flow (list @t{The randomly or pseudo-randomly generated version specified in RFC 4122.})))
   (list (make-flow (list @t{5}))
         (make-flow (list @t{The name-based version specified in RFC 4122 that uses SHA-1 hashing.})))))

The version is more accurately a sub-type, but the term is retained for compatibility.}

The following routines compare UUIDs by treating each as equivalent to an unsigned 128-bit integer.

@defproc[(uuid=? (uuid-1 uuid?) (uuid-2 uuid?)) boolean?]{
Returns @scheme[#t] if @scheme[uuid-1] = @scheme[uuid-2].}

@defproc[(uuid<? (uuid-1 uuid?) (uuid-2 uuid?)) boolean?]{
Returns @scheme[#t] if @scheme[uuid-1] < @scheme[uuid-2].}

@defproc[(uuid>? (uuid-1 uuid?) (uuid-2 uuid?)) boolean?]{
Returns @scheme[#t] if @scheme[uuid-1] > @scheme[uuid-2].}

@defproc[(hex-string? (x any/c)) boolean?]{
Returns @scheme[#t] if @scheme[x] is a hexadecimal string.}

@defproc[(hex-string->uuid (hex-string hex-string?)) uuid?]{
Returns the UUID represented by @scheme[hex-string]. An error is raised if @scheme[hex-string] is not exactly 32 characters in length.}

@defproc[(uuid-string? (x any/c)) boolean?]{
Returns @scheme[#t] if @scheme[x] is a string representing a UUID. The string representation of a UUID may be a 32-character hexadecimal string, a 36-character string in 8-4-4-4-12 format, or a 45-character UUID URN ("urn:uuid:" prepended to an 8-4-4-4-12 formatted UUID).}

@defproc[(string->uuid (string string?))
         (or/c uuid?
false/c)]{
Returns the UUID represented by @scheme[string] or @scheme[#f] if the string is not a UUID.}

Examples:

@scheme[(string->uuid "f81d4fae7dec11d0a76500a0c91e6bf6")]
@linebreak[]
@schemefont{#}

@scheme[(string->uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6")]
@linebreak[]
@schemefont{#}

@defproc[(uuid->hex-string (uuid uuid?)) string?]{
Returns @scheme[uuid] as a 32-character hexadecimal string.}

Example:

@scheme[(define UUID (string->uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))]
@linebreak[]
@scheme[(uuid->hex-string UUID)]
@linebreak[]
@schemefont{"f81d4fae7dec11d0a76500a0c91e6bf6"}

@defproc[(uuid->string (uuid uuid?)) string?]{
Returns @scheme[uuid] as a 36-character string in 8-4-4-4-12 format.}

Example:

@scheme[(define UUID (string->uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))]
@linebreak[]
@scheme[(uuid->string UUID)]
@linebreak[]
@schemefont{"f81d4fae-7dec-11d0-a765-00a0c91e6bf6"}

@defproc[(uuid->urn-string (uuid uuid?)) string?]{
Returns @scheme[uuid] as a Uniform Resource Name (URN), which is "urn:uuid:" prepended to the 8-4-4-4-12 formatted value of @scheme[uuid].}

Example:

@scheme[(define UUID (string->uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))]
@linebreak[]
@scheme[(uuid->urn-string UUID)]
@linebreak[]
@schemefont{"urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"}

@defthing[nil-uuid uuid?]{
The nil UUID is a special form of UUID that is specified to have all 128 bits set to zero.}

@scheme[nil-uuid]
@linebreak[]
@schemefont{#}

The following identifiers are bound to predefined UUIDs that represent specific name spaces that are used to generate name-based UUIDs.
@defthing[namespace-DNS uuid?]{
Used to generate name-based UUIDs from Domain Name System (DNS) names.}

@scheme[namespace-DNS]
@linebreak[]
@schemefont{#}

@defthing[namespace-URL uuid?]{
Used to generate name-based UUIDs from Uniform Resource Locators (URLs).}

@scheme[namespace-URL]
@linebreak[]
@schemefont{#}

@defthing[namespace-OID uuid?]{
Used to generate name-based UUIDs from ISO Object IDs (OIDs).}

@scheme[namespace-OID]
@linebreak[]
@schemefont{#}

@defthing[namespace-X500 uuid?]{
Used to generate name-based UUIDs from X.500 Distinguished Names (DNs).}

@scheme[namespace-X500]
@linebreak[]
@schemefont{#}

@subsection{Time-Based (Type 1) UUIDs}

A time-based (type 1) UUID combines three fields to generate a unique identifier: the current time as the number of 100-nanosecond intervals since 00:00:00.00 UTC, 15 October 1582 (60 bits), a clock sequence number to help avoid duplicates (14 bits), and the IEEE 802 MAC address (48 bits). Note that the current time field will not roll over until around A.D. 3400.

@defproc[(make-uuid-1) uuid?]{
Returns a time-based (type 1) UUID as specified in RFC 4122. The current time has a resolution of milliseconds, that is, the least-significant four decimal digits are always zero. This implementation does not maintain any state information on the UUID generation and always uses a random clock sequence number. This is fine for low-volume UUID generation (e.g., tens per millisecond). The primary MAC address of the host computer is used. If this cannot be determined, a random broadcast MAC address is used (47 random bits plus the broadcast bit set), which cannot clash with the MAC address of any real hardware device.}

Example:

@scheme[(make-uuid-1)]
@linebreak[]
@schemefont{#}

@defproc[(uuid-1->date (uuid uuid?)) date?]{
Returns the date and time that a time-based (type 1) UUID was created.}

Note that the following examples were run with the time zone set to MST (GMT-7).
Examples:

@scheme[(define UUID (string->uuid "d2177dd0-eaa2-11de-a572-001b779c76e3"))]
@linebreak[]
@scheme[(date->string (uuid-1->date UUID) #t)]
@linebreak[]
@schemefont{"Wednesday, December 16th, 2009 5:26:29pm"}

@scheme[(define UUID (string->uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))]
@linebreak[]
@scheme[(date->string (uuid-1->date UUID) #t)]
@linebreak[]
@schemefont{"Monday, February 3rd, 1997 10:43:12am"}

@subsection{Name-Based (Types 3 and 5) UUIDs}

The version 3 or 5 UUID is meant for generating UUIDs from @italic{names} that are drawn from, and unique within, some @italic{name space}. The concept of name and name space should be broadly construed, and not limited to textual names. For example, some name spaces are the domain name space, URLs, ISO Object IDs (OIDs), X.500 Distinguished Names (DNs), and reserved words in a programming language.

Name-based UUIDs may be generated using either MD5 hashing, for type 3 UUIDs, or SHA-1 hashing, for type 5 UUIDs. If backward compatibility is not an issue, SHA-1 is preferred.

Note that there is an apparent error in the RFC 4122 specification. (See @url{http://www.rfc-editor.org/errata_search.php?rfc=4122}.) Specifically, the reference implementation swaps the eight octets 0..3, 4..5, and 6..7 twice, for the name space UUID and for the MD5 output, as foreseen for little endian input, but the values are already big endian; that is, only one swap is needed. Most implementations (e.g., the Unix uuid command and Python library) use the corrected implementation, but some others do not. We have added a Boolean-valued @scheme[#:legacy] keyword to specify which result to compute: @scheme[#f] for the corrected version or @scheme[#t] for the original (i.e., 'buggy') version. The default is the corrected version.

@defproc[(make-uuid-3 (namespace-uuid uuid?) (name string?) (#:legacy legacy? boolean? #f)) uuid?]{
Returns a name-based (type 3) UUID as specified in RFC 4122 using MD5 hashing. If @scheme[legacy?]
is @scheme[#t], then the output value matches that of the original ('buggy') RFC 4122 reference implementation. This should only be used when compatibility with a known buggy implementation is required.}

Examples:

@scheme[(make-uuid-3 namespace-DNS "www.widgets.com")]
@linebreak[]
@schemefont{#}

@scheme[(make-uuid-3 namespace-DNS "www.widgets.com" #:legacy #t)]
@linebreak[]
@schemefont{#}

@defproc[(make-uuid-5 (namespace-uuid uuid?) (name string?) (#:legacy legacy? boolean? #f)) uuid?]{
Returns a name-based (type 5) UUID as specified in RFC 4122 using SHA-1 hashing. If @scheme[legacy?] is @scheme[#t], then the output value matches that of the original ('buggy') RFC 4122 reference implementation. This should only be used when compatibility with a known buggy implementation is required.}

Examples:

@scheme[(make-uuid-5 namespace-DNS "www.widgets.com")]
@linebreak[]
@schemefont{#}

@scheme[(make-uuid-5 namespace-DNS "www.widgets.com" #:legacy #t)]
@linebreak[]
@schemefont{#}

@subsection{Pseudo-Random (Type 4) UUIDs}

A type 4 UUID is created using pseudo-random numbers. The resulting 128-bit UUID contains 122 random bits plus 2 bits specifying the variant (RFC 4122) and 4 bits specifying the version (4).

@defproc[(make-uuid-4) uuid?]{
Returns a pseudo-random (type 4) UUID.}

Example:

@scheme[(make-uuid-4)]
@linebreak[]
@schemefont{#}

@section{Example}

The following example demonstrates various functions of the UUID library.
@#reader scribble/comment-reader
(schememod
scheme
(require scheme/date)
(require "uuid.ss")

;;; Time-Based UUIDs
(define U1 (make-uuid-1))
(printf "(make-uuid-1)~n~a~n" U1)
(printf "Created ~a~n~n" (date->string (uuid-1->date U1) #t))

;;; Name-Based UUID Using MD5 Hashing
(printf "(make-uuid-3 namespace-DNS \"www.widgets.com\")~n~a~n"
        (make-uuid-3 namespace-DNS "www.widgets.com"))
(printf "(make-uuid-3 namespace-DNS \"www.widgets.com\" #:legacy #t)~n~a~n~n"
        (make-uuid-3 namespace-DNS "www.widgets.com" #:legacy #t))

;;; Name-Based UUID Using SHA-1 Hashing
(printf "(make-uuid-5 namespace-DNS \"www.widgets.com\")~n~a~n"
        (make-uuid-5 namespace-DNS "www.widgets.com"))
(printf "(make-uuid-5 namespace-DNS \"www.widgets.com\" #:legacy #t)~n~a~n~n"
        (make-uuid-5 namespace-DNS "www.widgets.com" #:legacy #t))

;;; (Pseudo-)Random UUID
(define U4 (make-uuid-4))
(printf "(make-uuid-4)~n~a~n~n" U4)
(printf "U4 = ~a~n~n" U4)
(printf "(uuid->string U4)~n~s~n~n" (uuid->string U4))
(printf "(uuid->urn-string U4)~n~s~n~n" (uuid->urn-string U4))

;;; Comparisons
(printf "namespace-DNS = ~a~n" namespace-DNS)
(printf "(uuid=? U4 U4) = ~a~n" (uuid=? U4 U4))
(printf "(uuid=? U4 namespace-DNS) = ~a~n" (uuid=? U4 namespace-DNS))
(printf "(uuid<? U4 namespace-DNS) = ~a~n" (uuid<? U4 namespace-DNS))
(printf "(uuid>? U4 namespace-DNS) = ~a~n" (uuid>? U4 namespace-DNS))
)

Produces the following output.

@verbatim{
(make-uuid-1)
#
Created Wednesday, December 16th, 2009 8:58:51pm

(make-uuid-3 namespace-DNS "www.widgets.com")
#
(make-uuid-3 namespace-DNS "www.widgets.com" #:legacy #t)
#

(make-uuid-5 namespace-DNS "www.widgets.com")
#
(make-uuid-5 namespace-DNS "www.widgets.com" #:legacy #t)
#

(make-uuid-4)
#

U4 = #

(uuid->string U4)
"ab595962-0a37-4520-8bef-afc559955201"

(uuid->urn-string U4)
"urn:uuid:ab595962-0a37-4520-8bef-afc559955201"

namespace-DNS = #
(uuid=? U4 U4) = #t
(uuid=? U4 namespace-DNS) = #f
(uuid<? U4 namespace-DNS) = #f
(uuid>? U4 namespace-DNS) = #t
}

@section{Issues and Comments}

The biggest issue is that of the 'buggy' reference implementation in RFC 4122 with regard to the generation of name-based UUIDs. It seems that the implementation in the Unix @tt{uuid} command and the Python library (among others) is the correct implementation and we use this as the default behavior. However, it also seems that there are implementations 'in the wild' that match the original RFC 4122 reference implementation. Therefore, we also provide this behavior using the @scheme[#:legacy] keyword.

The current time is measured at millisecond accuracy, which means we lose a significant amount of the available address space: the four least-significant decimal digits of the timestamp are always zero. The advantage is a simple, portable implementation. This is fine for low-volume UUID generation.

The current implementation does not maintain any state information for UUID generation. This means that we generate a new random clock sequence for every new time-based UUID, which increases the probability of collisions. Again, this is fine for low-volume UUID generation.

At some point, @schemefont{(make-uuid-1)} needs to allow the optional specification of the node to use. Currently, this is the primary MAC address for the machine on which the code is run, which could be considered a security issue in some cases.
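
The trade-offs discussed in this section (millisecond resolution, a fresh random clock sequence per UUID, and a node that need not be the real MAC address) can be made concrete with a small sketch. It is an illustration under stated assumptions and is not this library's code: every name in it (uuid-epoch-offset, hex, timestamp-from-milliseconds, random-node, format-uuid-1, sketch-uuid-1) is hypothetical, and the only external fact used is the epoch offset constant given in RFC 4122.

```racket
#lang scheme/base
;; Hypothetical sketch, not this library's implementation: assemble a
;; type 1 UUID string from a millisecond clock, a random clock sequence,
;; and a random multicast node (so no real MAC address is exposed).

;; Number of 100 ns intervals between the UUID epoch (1582-10-15) and
;; the Unix epoch (1970-01-01), as given in RFC 4122.
(define uuid-epoch-offset #x01B21DD213814000)

;; Render n in lowercase hexadecimal, left-padded with zeros to width digits.
(define (hex n width)
  (let ([s (number->string n 16)])
    (string-append (make-string (max 0 (- width (string-length s))) #\0) s)))

;; 60-bit timestamp at millisecond resolution. One millisecond is 10,000
;; intervals of 100 ns, so the four low decimal digits of the time since
;; the Unix epoch are always zero.
(define (timestamp-from-milliseconds ms)
  (+ (* ms 10000) uuid-epoch-offset))

;; 48-bit node: 47 random bits plus the multicast bit, which is the
;; least-significant bit of the first octet (bit 40 of the node value).
(define (random-node)
  (bitwise-ior (arithmetic-shift 1 40)
               (arithmetic-shift (random #x1000000) 24)
               (random #x1000000)))

;; Lay the three fields out in 8-4-4-4-12 format.
(define (format-uuid-1 timestamp clock-seq node)
  (let ([time-low (bitwise-and timestamp #xFFFFFFFF)]
        [time-mid (bitwise-and (arithmetic-shift timestamp -32) #xFFFF)]
        [time-hi  (bitwise-and (arithmetic-shift timestamp -48) #x0FFF)])
    (string-append
     (hex time-low 8) "-"
     (hex time-mid 4) "-"
     ;; the top four bits carry the version (1)
     (hex (bitwise-ior #x1000 time-hi) 4) "-"
     ;; the top two bits carry the RFC 4122 variant (binary 10)
     (hex (bitwise-ior #x8000 (bitwise-and clock-seq #x3FFF)) 4) "-"
     (hex node 12))))

;; Stateless generation: a new 14-bit clock sequence and a new random
;; node are drawn on every call.
(define (sketch-uuid-1)
  (format-uuid-1
   (timestamp-from-milliseconds
    (inexact->exact (floor (current-inexact-milliseconds))))
   (random #x4000)
   (random-node)))
```

As a check of the field layout, (format-uuid-1 #x1d07decf81d4fae #x2765 #x00a0c91e6bf6) reproduces the example UUID from the introduction, f81d4fae-7dec-11d0-a765-00a0c91e6bf6. Drawing a random node on every call is only one possible design; accepting the node as an optional argument would be closer to the change proposed above.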