uri: Web Uniform Resource Identifiers (URI and URL) in Racket

Version: 0.2

Neil Van Dyke

License: LGPL 3

Web: http://www.neilvandyke.org/racket-uri/

1 Introduction

WARNING: This package is being actively developed. A future version is expected to introduce some major non-backward-compatible changes.

uri is a Racket code library for parsing, representing, and transforming Web Uniform Resource Identifiers (URI), which include Uniform Resource Locators (URL) and Uniform Resource Names (URN). It supports absolute and relative URIs and URI references. RFC2396 is the principal reference used for this implementation. Earlier versions were informed by other RFCs, including RFC2396 and RFC2732.

Goals of this package are correctness, efficiency, and power.

2 Escaping and Unescaping

Several procedures are provided to support escaping and unescaping of URI component strings, as described in [RFC2396 sec. 2.4]. Variants that additionally treat + as an encoding of a space character, as is used in some HTTP encodings of HTML forms, are also provided.

These procedures come in multiple variants that differ in the mutability of the strings they yield, following this naming convention:

foo-i
Always yields an immutable string (or a new string, if the Scheme implementation does not support immutable strings).
foo/new-mutable
Always yields a new, mutable string.
foo/shared-ok
If the output is equal to the input, might yield the input string rather than yielding a copy of it.

Many applications will not call these procedures directly, since most of this library’s interface automatically escapes and unescapes strings as appropriate.
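The three variants of the naming convention can be illustrated with a minimal sketch in plain racket/base. The names transform, foo-i, foo/new-mutable, and foo/shared-ok are hypothetical stand-ins, and the space-to-underscore transformation is only a placeholder for a real escaping step; none of this is the library's own code.

```racket
#lang racket/base

;; Placeholder transformation (an assumption for illustration):
;; replaces each space with an underscore and always allocates a new string.
(define (transform str)
  (list->string
   (for/list ([c (in-string str)])
     (if (char=? c #\space) #\_ c))))

;; "-i" style: the result is always immutable.
(define (foo-i str)
  (string->immutable-string (transform str)))

;; "/new-mutable" style: the result is always a fresh, mutable string.
(define (foo/new-mutable str)
  (string-copy (transform str)))

;; "/shared-ok" style: if nothing changed, the input itself may be returned.
(define (foo/shared-ok str)
  (let ([out (transform str)])
    (if (string=? out str) str out)))
```

With these definitions, (foo/shared-ok s) is eq? to s whenever s contains no spaces, which is the saving that the shared-ok variants permit.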

(uri-escape str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-escape/new-mutable str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-escape/shared-ok str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

Yields a URI-escaped encoding of string str. If start and end are given, then they designate the substring of str to use. All characters are escaped, except alphanumerics, minus, underscore, period, and tilde. For example:

(uri-escape "a = b/c + d") ==> "a%20%3D%20b%2Fc%20%2B%20d"
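The escaping rule described above can be sketched in a few lines of plain racket/base. The name sketch-uri-escape is hypothetical, the start and end arguments are omitted, and encoding non-ASCII characters through UTF-8 is an assumption made here rather than something the documentation states.

```racket
#lang racket/base

;; Bytes left unescaped: ASCII alphanumerics, minus, underscore, period, tilde.
(define (unreserved-byte? b)
  (or (<= 48 b 57)                        ; 0-9
      (<= 65 b 90)                        ; A-Z
      (<= 97 b 122)                       ; a-z
      (and (memv b '(45 95 46 126)) #t))) ; - _ . ~

(define (hex-digit n)
  (string-ref "0123456789ABCDEF" n))

;; Hypothetical sketch, not the library's implementation.
;; Assumption: characters are encoded as UTF-8 before escaping.
(define (sketch-uri-escape str)
  (define out (open-output-string))
  (for ([b (in-bytes (string->bytes/utf-8 str))])
    (if (unreserved-byte? b)
        (write-char (integer->char b) out)
        (begin
          (write-char #\% out)
          (write-char (hex-digit (quotient b 16)) out)
          (write-char (hex-digit (remainder b 16)) out))))
  (get-output-string out))

(sketch-uri-escape "a = b/c + d") ; "a%20%3D%20b%2Fc%20%2B%20d"
```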

(uri-plusescape-i str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-plusescape/new-mutable str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-plusescape/shared-ok str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

Like uri-escape, except encodes space characters as "+" instead of "%20". This should generally only be used to mimic the encoding some Web browsers do of HTML form values. For example:

(uri-plusescape "a = b/c + d") ==> "a+%3D+b%2Fc+%2B+d"

(uri-unescape-i str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-unescape/new-mutable str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-unescape/shared-ok str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

Yields a URI-unescaped string from the encoding in string str. If start and end are given, then they designate the substring of str to use. For example:

(uri-unescape "a%20b+c%20d") ==> "a b+c d"

(uri-unplusescape-i str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-unplusescape/new-mutable str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(uri-unplusescape/shared-ok str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

Like uri-unescape, but also decodes the plus (+) character as a space character. For example:

(uri-unplusescape "a%20b+c%20d") ==> "a b c d"
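The difference between the two unescaping families can be sketched with one decoder and a flag. The name sketch-uri-unescape and the #:plus? keyword are hypothetical; decoding the resulting bytes as UTF-8 and passing malformed % sequences through unchanged are assumptions of this sketch, not documented behavior of the library.

```racket
#lang racket/base

;; Value of a hexadecimal digit character, or #f if c is not one.
(define (hex-value c)
  (cond [(char<=? #\0 c #\9) (- (char->integer c) 48)]
        [(char<=? #\a c #\f) (- (char->integer c) 87)]
        [(char<=? #\A c #\F) (- (char->integer c) 55)]
        [else #f]))

;; Hypothetical sketch: decodes %XX sequences, and "+" as a space
;; only when #:plus? is true (the "unplusescape" behavior).
(define (sketch-uri-unescape str #:plus? [plus? #f])
  (define n (string-length str))
  (define out (open-output-bytes))
  (let loop ([i 0])
    (when (< i n)
      (let* ([c (string-ref str i)]
             [hi (and (char=? c #\%) (< (+ i 2) n)
                      (hex-value (string-ref str (+ i 1))))]
             [lo (and hi (hex-value (string-ref str (+ i 2))))])
        (cond
          [lo (write-byte (+ (* 16 hi) lo) out)
              (loop (+ i 3))]
          [(and plus? (char=? c #\+))
           (write-byte 32 out)
           (loop (+ i 1))]
          [else
           ;; Literal character (including a malformed "%"): copy as-is.
           (write-bytes (string->bytes/utf-8 (string c)) out)
           (loop (+ i 1))]))))
  ;; Assumption: escaped bytes form UTF-8; bad sequences become "?".
  (bytes->string/utf-8 (get-output-bytes out) #\?))

(sketch-uri-unescape "a%20b+c%20d")            ; "a b+c d"
(sketch-uri-unescape "a%20b+c%20d" #:plus? #t) ; "a b c d"
```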

(char->uri-escaped-string chr) → any/c
chr : any/c

(char->uri-escaped-string-i chr) → any/c
chr : any/c

Yields a URI-escaped string of character chr. For example:

(char->uri-escaped-string #\/) ==> "%2F"

3 URI API

This section describes the “URI string” API, while the next section describes the “URI object” (uri) API. All procedures in this section yield URIs using immutable strings, and accept URIs as strings (immutable or mutable) or as the opaque objects described in the next section.

3.1 Predicate

(uri? v) → any/c
v : any/c

!!!

3.2 Converting Strings to URI Objects

(string->uri str) → any/c
str : any/c

(string/base->uri str base-uri) → any/c
str : any/c
base-uri : any/c

(string/base-uri->uri str base-uri) → any/c
str : any/c
base-uri : any/c

!!!

(substring->uri str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(substring/base-uri-or-string->uri str
start
end
base-uri-or-string) → any/c
  str : any/c
  start : any/c
  end : any/c
  base-uri-or-string : any/c

(substring/base-uri->uri str
start
end
base-uri) → any/c
  str : any/c
  start : any/c
  end : any/c
  base-uri : any/c

!!!

(uri-or-string->uri uri-or-string) → any/c
  uri-or-string : any/c

!!! convenience

3.3 Writing URIs to Ports and Converting URIs to Strings

(display-uri uri port) → any/c
uri : any/c
port : any/c

(display-uri/nofragment uri port) → any/c
uri : any/c
port : any/c

Displays uri to output port port. For example:

(display-uri "http://s/foo#bar" (current-output-port))
-| http://s/foo#bar
(display-uri/nofragment "http://s/foo#bar" (current-output-port))
-| http://s/foo

(uri->string uri) → any/c
uri : any/c

Yields the full string representation of URI uri. This is unnecessary when URIs are handled only as strings, but a library that calls this procedure on its URI arguments can accept the uri object representation as well. For example:

(define my-uri (string->uri "http://www/"))
my-uri ==> <uri:"http://www/">
(uri->string my-uri) ==> "http://www/"

3.4 URI Schemes

URI schemes are currently represented as lowercase Racket symbols and associated data.

ftp-uri-scheme : any/c

gopher-uri-scheme : any/c

http-uri-scheme : any/c

https-uri-scheme : any/c

imap-uri-scheme : any/c

ipp-uri-scheme : any/c

news-uri-scheme : any/c

nfs-uri-scheme : any/c

telnet-uri-scheme : any/c

Some common URI scheme symbols, as a convenience for Racket code that must be portable to Racket implementations with case-insensitive readers. For example, in some Racket implementations:

'ftp ==> FTP
ftp-uri-scheme ==> ftp

(uri-scheme uri) → any/c
uri : any/c

Yields the URI scheme of uri, or #f if none can be determined. For example:

(uri-scheme "Http://www") ==> http
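The scheme extraction shown above can be approximated on plain strings with a regular expression. The name sketch-uri-scheme is hypothetical, and the pattern follows the generic scheme syntax (a letter followed by letters, digits, +, ., or -); the library's actual parser may differ.

```racket
#lang racket/base

;; Hypothetical sketch: yields the scheme as a lowercase symbol,
;; or #f when the string does not start with "scheme:".
(define (sketch-uri-scheme str)
  (let ([m (regexp-match #rx"^([A-Za-z][A-Za-z0-9+.-]*):" str)])
    (and m (string->symbol (string-downcase (cadr m))))))

(sketch-uri-scheme "Http://www") ; 'http
(sketch-uri-scheme "//www/")     ; #f
```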

(register-uri-scheme-default-portnum sym
portnum) → any/c
sym : any/c
portnum : any/c

Registers integer portnum as the default port number for the server authority component of URI scheme sym.

(define x-foo-uri-scheme (string->symbol "x-foo"))
(register-uri-scheme-default-portnum x-foo-uri-scheme 7)
(register-uri-scheme-default-portnum x-foo-uri-scheme 666)
error--> cannot change uri scheme default portnum: x-foo 7 666
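A registry with this "register once" behavior can be sketched with a mutable hash table. The names default-portnums and sketch-register-default-portnum! are hypothetical, and silently accepting a repeated registration of the same port number is an assumption of this sketch; the documentation only shows that a different number is rejected.

```racket
#lang racket/base

;; Hypothetical registry: scheme symbol -> default port number.
(define default-portnums (make-hasheq))

(define (sketch-register-default-portnum! sym portnum)
  (let ([old (hash-ref default-portnums sym #f)])
    (cond
      [(not old) (hash-set! default-portnums sym portnum)]
      ;; Assumption: re-registering the same number is harmless.
      [(= old portnum) (void)]
      [else
       (error 'sketch-register-default-portnum!
              "cannot change uri scheme default portnum: ~a ~a ~a"
              sym old portnum)])))

(sketch-register-default-portnum! 'x-foo 7)
```

A second call such as (sketch-register-default-portnum! 'x-foo 666) then raises an error, mirroring the example above.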

(register-uri-scheme-hierarchical sym) → any/c
sym : any/c

Registers URI scheme sym as having a “hierarchical” form as described in [RFC2396 sec. 3].

3.5 URI Reference Fragment Identifiers

(uri-fragment uri) → any/c
uri : any/c

(uri-fragment/escaped uri) → any/c
uri : any/c

Yields the fragment identifier component of URI (or URI reference) uri as a string, or #f if there is no fragment. uri-fragment yields the fragment in unescaped form, and uri-fragment/escaped yields the escaped form, in the unusual case that it is desired. For example:

(uri-fragment "foo#a%20b") ==> "a b"
(uri-fragment/escaped "foo#a%20b") ==> "a%20b"

(uri-without-fragment uri) → any/c
uri : any/c

Yields uri without the fragment component. For example:

(uri-without-fragment "http://w/#bar") ==> "http://w/"

(uri-with-fragment uri fragment) → any/c
uri : any/c
fragment : any/c

(uri-with-fragment/escaped uri fragment) → any/c
uri : any/c
fragment : any/c

Yields a URI that is like uri except with the fragment fragment (or no fragment if fragment is #f). For example:

(uri-with-fragment "http://w/" "foo") ==> "http://w/#foo"
(uri-with-fragment "http://w/#foo" "bar") ==> "http://w/#bar"
(uri-with-fragment "http://w/#bar" #f) ==> "http://w/"

The uri-with-fragment/escaped variant can be used when the desired fragment string is already in uri-escaped form:

(uri-with-fragment "foo" "a b") ==> "foo#a%20b"
(uri-with-fragment/escaped "foo" "a%20b") ==> "foo#a%20b"
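On the string representation, the fragment operations amount to splitting at the first # character. The following sketch covers only the escaped variants; the names split-fragment, sketch-uri-without-fragment, and sketch-uri-with-fragment/escaped are hypothetical, and no escaping or unescaping is performed.

```racket
#lang racket/base

;; Hypothetical helper: yields two values, the part before the first "#"
;; and the (still escaped) fragment, or #f if there is no "#".
(define (split-fragment str)
  (let ([m (regexp-match-positions #rx"#" str)])
    (if m
        (values (substring str 0 (caar m))
                (substring str (cdar m)))
        (values str #f))))

(define (sketch-uri-without-fragment str)
  (let-values ([(base frag) (split-fragment str)])
    base))

;; frag is assumed to be already escaped, or #f for "no fragment".
(define (sketch-uri-with-fragment/escaped str frag)
  (let ([base (sketch-uri-without-fragment str)])
    (if frag
        (string-append base "#" frag)
        base)))

(sketch-uri-with-fragment/escaped "http://w/#foo" "bar") ; "http://w/#bar"
(sketch-uri-with-fragment/escaped "http://w/#bar" #f)    ; "http://w/"
```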

3.6 Hierarchical URIs

This and some of the following subsections concern “hierarchical” generic URI syntax as described in RFC2396 sec. 3.

(uri-hierarchical? uri) → any/c
uri : any/c

Yields a Boolean value for whether or not the URI scheme of URI uri is known to have a “hierarchical” generic URI layout. For example:

(uri-hierarchical? "http://www/") ==> #t
(uri-hierarchical? "mailto://www/") ==> #f
(uri-hierarchical? "//www/") ==> #f

3.7 Server-Based Naming Authorities

Several procedures extract the server authority values from URIs [RFC2396 sec. 3.2.2].

(uri-server-userinfo+host+portnum uri) → any/c
uri : any/c

Yields three values for the server authority of URI uri: the userinfo as a string (or #f), the host as a string (or #f), and the effective port number as an integer (or #f). The effective port number of a server authority defaults to the default of the URI scheme unless overridden. For example (note the effective port number is 21, the default for the ftp scheme):

(uri-server-userinfo+host+portnum "ftp://anon@ftp.foo.bar/")
==> "anon" "ftp.foo.bar" 21
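The "effective port number" rule can be sketched with a regular expression and a small table of defaults. Everything named here (default-ports, authority-rx, sketch-userinfo+host+portnum) is hypothetical, the table holds only a few well-known defaults, and bracketed IPv6 hosts are not handled.

```racket
#lang racket/base

;; Assumed default ports for a few schemes (illustrative table only).
(define default-ports #hasheq((ftp . 21) (http . 80) (https . 443)))

;; scheme "://" [userinfo "@"] host [":" port]
(define authority-rx
  #rx"^([A-Za-z][A-Za-z0-9+.-]*)://(([^@/?#]*)@)?([^:/?#]*)(:([0-9]*))?")

;; Hypothetical sketch: yields userinfo, host, and effective port number,
;; each of which may be #f.
(define (sketch-userinfo+host+portnum str)
  (let ([m (regexp-match authority-rx str)])
    (if (not m)
        (values #f #f #f)
        (let ([scheme (string->symbol (string-downcase (list-ref m 1)))]
              [userinfo (list-ref m 3)]
              [host (list-ref m 4)]
              [port (list-ref m 6)])
          (values userinfo
                  (and (positive? (string-length host)) host)
                  ;; An explicit port overrides the scheme default.
                  (if (and port (positive? (string-length port)))
                      (string->number port)
                      (hash-ref default-ports scheme #f)))))))

(sketch-userinfo+host+portnum "ftp://anon@ftp.foo.bar/")
;; yields three values: "anon" "ftp.foo.bar" 21
```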

(uri-server-userinfo uri) → any/c
uri : any/c

(uri-server-host uri) → any/c
uri : any/c

(uri-server-portnum uri) → any/c
uri : any/c

Each yields the respective part of the server authority of uri. See the discussion of uri-server-userinfo+host+portnum.

3.8 Hierarchical Paths

A parsed hierarchical path [RFC2396 sec. 3] is represented in uri as a tuple of a list of path segments and an upcount. The list of path segments does not contain any “.” or “..” relative components, as those are removed during parsing. The upcount is either #f, meaning an absolute path, or an integer 0 or greater, meaning a relative path of that many levels “up.” A path segment without any parameters is represented as either a string or, if empty, #f. For example:

(uri-path-upcount+segments "/a/b/")       ==> #f ("a" "b" #f)
(uri-path-upcount+segments "/a/b/c")      ==> #f ("a" "b" "c")
(uri-path-upcount+segments "/a/../../../b/c") ==> 2  ("b" "c")

and:

(uri-path-upcount+segments "/.")  ==> #f ()
(uri-path-upcount+segments "/")   ==> #f (#f)
(uri-path-upcount+segments ".")   ==> 0  (#f)
(uri-path-upcount+segments "")    ==> 0  ()
(uri-path-upcount+segments "./")  ==> 0  (#f)
(uri-path-upcount+segments "..")  ==> 1  ()
(uri-path-upcount+segments "/..") ==> 1  ()
(uri-path-upcount+segments "../") ==> 1  (#f)
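The upcount-and-segments representation can be approximated with a simplified parser over a bare path string. The name sketch-path-upcount+segments is hypothetical, parameters (;p) are not split out, and the sketch does not reproduce every row of the tables above: in particular it yields 0 and () for ".", where the table shows 0 and (#f).

```racket
#lang racket/base

;; Hypothetical sketch: yields two values, the upcount (#f for an absolute
;; path, otherwise an integer) and the list of segments, where an empty
;; segment is represented as #f.
(define (sketch-path-upcount+segments path)
  (define absolute?
    (and (positive? (string-length path))
         (char=? (string-ref path 0) #\/)))
  (define pieces
    (cond [(string=? path "") '()]
          ;; Drop the empty piece produced by the leading slash.
          [absolute? (cdr (regexp-split #rx"/" path))]
          [else (regexp-split #rx"/" path)]))
  ;; rev accumulates segments in reverse order.
  (let loop ([pieces pieces] [upcount (if absolute? #f 0)] [rev '()])
    (cond
      [(null? pieces) (values upcount (reverse rev))]
      [(string=? (car pieces) ".")
       (loop (cdr pieces) upcount rev)]
      [(string=? (car pieces) "..")
       (if (pair? rev)
           (loop (cdr pieces) upcount (cdr rev))
           ;; Nothing left to pop: count one more level "up".
           (loop (cdr pieces) (add1 (or upcount 0)) rev))]
      [(string=? (car pieces) "")
       (loop (cdr pieces) upcount (cons #f rev))]
      [else
       (loop (cdr pieces) upcount (cons (car pieces) rev))])))

(sketch-path-upcount+segments "/a/../../../b/c") ; yields 2 and ("b" "c")
```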

A path segment with parameters is represented as a list, with the first element a string or #f for the path name, and the remaining elements strings for the parameters. For example:

(uri-path-segments "../../a/b;p1/c/d;p2;p3/;p4")
==> ("a" ("b" "p1") "c" ("d" "p2" "p3") (#f "p4"))

In the current version of uri, parsed paths are actually represented in reverse, which simplifies path resolution and permits list tails to be shared among potentially large numbers of long paths. For example (uripath is a concept of the “object URI” API):

(let ((base (string->uripath "/a/b/c/index.html")))
  (map (lambda (n)
         (resolved-uripath (string->uripath n) base))
       '("x.html" "y/y.html" "../z/z.html")))
==>

(("x.html" . #0=("c" . #1=("b" "a")))
("y.html" "y" . #0#)
("z.html" "z" . #1#))
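The tail sharing shown above follows directly from keeping segments in reverse order: resolving a relative path only drops elements from the front of the base list and conses new ones on. The sketch below works on bare reversed lists rather than uripath objects; resolve-rev and base-rev are hypothetical names.

```racket
#lang racket/base

;; "/a/b/c/index.html" as a reversed segment list.
(define base-rev '("index.html" "c" "b" "a"))

;; Hypothetical sketch: drops the base's last segment plus upcount
;; directories, then pushes the relative segments in order.
(define (resolve-rev rel-segments upcount base-rev)
  (let loop ([segs rel-segments]
             [acc (list-tail base-rev (add1 upcount))])
    (if (null? segs)
        acc
        (loop (cdr segs) (cons (car segs) acc)))))

(define x (resolve-rev '("x.html") 0 base-rev))     ; x.html
(define y (resolve-rev '("y" "y.html") 0 base-rev)) ; y/y.html
(define z (resolve-rev '("z" "z.html") 1 base-rev)) ; ../z/z.html

;; No copying took place: the results share tails of base-rev.
(eq? (cdr x) (cddr y))  ; #t, both are ("c" "b" "a")
(eq? (cddr x) (cddr z)) ; #t, both are ("b" "a")
```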

(uri-path-upcount+segments uri) → any/c
uri : any/c

(uri-path-upcount+segments/reverse uri) → any/c
uri : any/c

Yields the path upcount and the segments of uri as two values. The segments list should be considered immutable, as it might be shared elsewhere. uri-path-upcount+segments/reverse yields the segments list in reverse order, and is the more efficient of the two procedures.

(uri-path-upcount+segments/reverse "../a/../../b/./c")
==> 2 ("c" "b")
(uri-path-upcount+segments "../a/../../b/./c")
==> 2 ("b" "c")

(uri-path-upcount uri) → any/c
uri : any/c

(uri-path-segments uri) → any/c
uri : any/c

(uri-path-segments/reverse uri) → any/c
uri : any/c

See the documentation for uri-path-upcount+segments.

(uri-path-upcount "../a/../../b/./c") ==> 2
(uri-path-segments "../a/../../b/./c") ==> ("b" "c")
(uri-path-segments/reverse "../a/../../b/./c") ==> ("c" "b")

(urisegment-name urisegment) → any/c
urisegment : any/c

(urisegment-params urisegment) → any/c
urisegment : any/c

(urisegment-name+params urisegment) → any/c
urisegment : any/c

(urisegment-has-params? urisegment) → any/c
urisegment : any/c

Yield the components of a parsed URI segment. The values should be considered immutable. For example:

(urisegment-name+params "foo")              ==> "foo" ()
(urisegment-name+params #f)                 ==> #f    ()
(urisegment-name+params '("foo" "p1" "p2")) ==> "foo" ("p1" "p2")
(urisegment-name+params '(#f    "p1" "p2")) ==> #f    ("p1" "p2")
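Given the segment representation described above (a string or #f for a segment without parameters, a list for a segment with parameters), the accessors reduce to a pair check. The sketch- names below are hypothetical and are written only from that description, not from the library source.

```racket
#lang racket/base

;; A segment is a string, #f, or a list (name-or-#f param ...).
(define (sketch-urisegment-has-params? seg)
  (pair? seg))

(define (sketch-urisegment-name seg)
  (if (pair? seg) (car seg) seg))

(define (sketch-urisegment-params seg)
  (if (pair? seg) (cdr seg) '()))

;; Yields the name and the parameter list as two values.
(define (sketch-urisegment-name+params seg)
  (values (sketch-urisegment-name seg)
          (sketch-urisegment-params seg)))

(sketch-urisegment-name+params '("foo" "p1" "p2")) ; yields "foo" and ("p1" "p2")
```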

3.9 Attribute-Value Queries

This library provides support for parsing the URI query component [RFC2396 sec. 3.4] as attribute-value lists, in the manner of http URI scheme queries. Parsed queries are represented as association lists, in which the car of each pair is the attribute name as a string, and the cdr is either the attribute value as a string or #t if no value is given. All strings are uri-unescaped. For example:

(uri-query "?q=fiendish+scheme&case&x=&y=1%2B2")
==>
(("q" . "fiendish scheme") ("case" . #t) ("x" . "") ("y" . "1+2"))

(uri-query uri) → any/c
uri : any/c

Yields the parsed attribute-value query of uri, or #f if no query. For example:

(uri-query "?x=42&y=1%2B2") ==> (("x" . "42") ("y" . "1+2"))

(uri-query-value uri attr) → any/c
uri : any/c
attr : any/c

Yields the value of attribute attr in uri’s query, or #f if uri has no query component or no attr attribute. If the attribute appears multiple times in the query, the value of the first occurrence is used. For example:

(uri-query-value "?x=42&y=1%2B2" "y") ==> "1+2"

(uriquery-value uriquery attr) → any/c
uriquery : any/c
attr : any/c

Yields the value of attribute attr in uriquery, or #f if there is no such attribute. If the attribute appears multiple times in the query, the value of the first occurrence is used.
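The association-list representation and the first-occurrence lookup can be sketched as follows. The names sketch-uri-query and sketch-query-value are hypothetical, and decoding is delegated to form-urlencoded-decode from Racket's standard net/uri-codec library (which also decodes + as a space) rather than to this package's own unescaping procedures.

```racket
#lang racket/base
(require net/uri-codec) ; standard Racket library, not part of this package

;; Hypothetical sketch: yields an association list for the query of a
;; URI string, or #f if the string has no "?".
(define (sketch-uri-query str)
  (let ([m (regexp-match #rx"[?]([^#]*)" str)])
    (and m
         (for/list ([piece (in-list (regexp-split #rx"&" (cadr m)))])
           (let ([kv (regexp-match #rx"^([^=]*)=(.*)$" piece)])
             (if kv
                 (cons (form-urlencoded-decode (cadr kv))
                       (form-urlencoded-decode (caddr kv)))
                 ;; Attribute given without "=": the value is #t.
                 (cons (form-urlencoded-decode piece) #t)))))))

;; assoc returns the first matching pair, so the first occurrence wins.
(define (sketch-query-value query attr)
  (let ([p (and query (assoc attr query))])
    (and p (cdr p))))

(sketch-query-value (sketch-uri-query "?x=42&y=1%2B2") "y") ; "1+2"
```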

3.10 Resolving Relative URI

This subsection concerns resolving relative URIs.

(absolute-uri? uri) → any/c
uri : any/c

Yields a Boolean value for whether or not URI uri is known by the library’s criteria to be absolute.

(resolved-uri uri base-uri) → any/c
uri : any/c
base-uri : any/c

Yields a URI string that is URI uri possibly resolved with respect to URI base-uri, but not necessarily absolute. As an extension to [RFC2396] rules for resolution, base-uri may be a relative URI.

(resolved-uri "x.html" "http://w/a/b/c.html")
==> "http://w/a/b/x.html"
(resolved-uri "//www:80/" "http:")
==> "http://www/"
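For comparison, the common case of resolving a relative reference against an absolute base can be reproduced with Racket's standard net/url library. This is not this package's implementation; sketch-resolved-uri is a hypothetical wrapper, and it does not attempt the extensions described above, such as a relative base-uri.

```racket
#lang racket/base
(require net/url) ; standard Racket library, not part of this package

;; Hypothetical wrapper: resolves uri-str against base-str and
;; yields the result as a string.
(define (sketch-resolved-uri uri-str base-str)
  (url->string
   (combine-url/relative (string->url base-str) uri-str)))

(sketch-resolved-uri "x.html" "http://w/a/b/c.html") ; "http://w/a/b/x.html"
```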

(absolute-uri uri) → any/c
uri : any/c

Yields a URI that may be a variation on uri that has been forced to absolute (by, e.g., dropping relative path components, or supplying a missing path). The result might not be an absolute URI, however, due to limitations of the library or insufficient information in the URI. For example:

(absolute-uri "http://w/../a") ==> "http://w/a"
(absolute-uri "http://w") ==> "http://w/"

4 URI Schemes

(uri-scheme uri) → any/c
uri : any/c

!!!

(uri-with-scheme uri urischeme) → any/c
uri : any/c
urischeme : any/c

!!!

(string->urischeme str) → any/c
str : any/c

(symbol->urischeme sym) → any/c
sym : any/c

!!!

(urischeme->string) → any/c

!!!

(urischeme-hierarchical? urischeme) → any/c
urischeme : any/c

!!!

(urischeme-default-portnum urischeme) → any/c
urischeme : any/c

!!!

5 Hierarchical URIs

(uri-uriserver uri) → any/c
uri : any/c

!!!

(uri-uriserver+path+query uri) → any/c
uri : any/c

!!!

(uri-uriserver uri ==> uriserver) → any/c
  uri : any/c
  ==> : any/c
  uriserver : any/c

!!!

(uri-uriserver+uripath+uriquery uri) → any/c
uri : any/c

!!!

(uri-userinfo+host+portnum uri) → any/c
uri : any/c

!!!

(uri-portnum uri) → any/c
uri : any/c

!!!

(make-uriserver userinfo host portnum) → any/c
  userinfo : any/c
  host : any/c
  portnum : any/c

(make-uriserver/default-portnum userinfo
host
portnum
default-portnum) → any/c
  userinfo : any/c
  host : any/c
  portnum : any/c
  default-portnum : any/c

!!!

(make-or-reuse-uriserver userinfo
host
portnum
base-uriserver) → any/c
  userinfo : any/c
  host : any/c
  portnum : any/c
  base-uriserver : any/c

(make-or-reuse-uriserver/default-portnum userinfo
host
portnum
base-uriserver
default-portnum)
→ any/c
  userinfo : any/c
  host : any/c
  portnum : any/c
  base-uriserver : any/c
  default-portnum : any/c

!!!

(string->uriserver str) → any/c
str : any/c

(string/base->uriserver str base-uriserver) → any/c
str : any/c
base-uriserver : any/c

(string/default-portnum->uriserver str
default-portnum) → any/c
str : any/c
default-portnum : any/c

(string/base/default-portnum->uriserver str
base-uriserver
default-portnum)
→ any/c
  str : any/c
  base-uriserver : any/c
  default-portnum : any/c

(substring->uriserver str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(substring/base->uriserver str
start
end
base-uriserver) → any/c
  str : any/c
  start : any/c
  end : any/c
  base-uriserver : any/c

(substring/default-portnum->uriserver str
start
end
default-portnum) → any/c
  str : any/c
  start : any/c
  end : any/c
  default-portnum : any/c

(substring/base/default-portnum->uriserver str
start
end
base-uriserver
default-portnum)
→ any/c
  str : any/c
  start : any/c
  end : any/c
  base-uriserver : any/c
  default-portnum : any/c

!!!

(uriserver-userinfo uriserver) → any/c
uriserver : any/c

(uriserver-host uriserver) → any/c
uriserver : any/c

(uriserver-portnum uriserver) → any/c
uriserver : any/c

(uriserver-userinfo+host+portnum uriserver) → any/c
uriserver : any/c

!!!

(write-uriserver uriserver port) → any/c
uriserver : any/c
port : any/c

!!!

(uriserver-with-default-portnum uriserver
default-portnum) → any/c
uriserver : any/c
default-portnum : any/c

!!!

(resolved-uriserver uriserver
base-uriserver) → any/c
uriserver : any/c
base-uriserver : any/c

(resolved-uriserver/default-portnum uriserver
base-uriserver
default-portnum) → any/c
  uriserver : any/c
  base-uriserver : any/c
  default-portnum : any/c

!!!

5.1 Hierarchical Paths

(uri-path uri) → any/c
uri : any/c

(uri-path/noparams uri) → any/c
uri : any/c

(uri-uripath uri) → any/c
uri : any/c

(uri-uripath/noparams uri) → any/c
uri : any/c

!!!

(make-uripath upcount segments) → any/c
upcount : any/c
segments : any/c

(make-uripath/reverse upcount segments) → any/c
upcount : any/c
segments : any/c

(make-uripath/reverse/shared-ok upcount
segments) → any/c
upcount : any/c
segments : any/c

!!!

(uripath-with-upcount uripath upcount) → any/c
uripath : any/c
upcount : any/c

!!!

(string->uripath str) → any/c
str : any/c

(string/base->uripath str base-uripath) → any/c
str : any/c
base-uripath : any/c

(substring->uripath str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

(substring/base->uripath str
start
end
base-uripath) → any/c
  str : any/c
  start : any/c
  end : any/c
  base-uripath : any/c

!!!

Note: Contrary to [RFC2396], we don’t require base to be absolute.

(uripath-upcount uripath) → any/c
uripath : any/c

(uripath-segments uripath) → any/c
uripath : any/c

(uripath-segments/reverse uripath) → any/c
uripath : any/c

(uripath-upcount+segments uripath) → any/c
uripath : any/c

(uripath-upcount+segments/reverse uripath) → any/c
uripath : any/c

!!!

(uripath-has-params? uripath) → any/c
uripath : any/c

!!!

(write-uripath uripath port) → any/c
uripath : any/c
port : any/c

(write-uripath/leading-slash uripath port) → any/c
uripath : any/c
port : any/c

!!!

(uripath->string uripath) → any/c
uripath : any/c

(uripath->string/leading-slash uripath) → any/c
uripath : any/c

!!!

(uri-path-segments "//a/b") ==> ("b")
(uri-path-segments "/.//a/b") ==> (#f "a" "b")

!!!

(uripath->string (string->uripath "//b"))
==> "//b"
(uripath->string/leading-slash (string->uripath "//b"))
==> "/.//b"
(uripath->string/leading-slash (string->uripath "/a/b"))
==> "/a/b"
(uripath->string/leading-slash (string->uripath "/;p1/b"))
==> "/;p1/b"

(resolved-uripath uripath base-uripath) → any/c
uripath : any/c
base-uripath : any/c

!!!

(absolute-uripath uripath) → any/c
uripath : any/c

!!!

5.2 Attribute-Value Queries

(uri-uriquery uri) → any/c
uri : any/c

!!!

(string->uriquery str) → any/c
str : any/c

(substring->uriquery str start end) → any/c
  str : any/c
  start : any/c
  end : any/c

!!!

(write-uriquery uriquery port) → any/c
uriquery : any/c
port : any/c

!!!

6 Antiresolution (In-Progress)

(antiresolved-uripath uripath base-uripath) → any/c
uripath : any/c
base-uripath : any/c

!!!

(antiresolved-uriserver uriserver
base-uriserver) → any/c
uriserver : any/c
base-uriserver : any/c

(antiresolved-uriserver/default-portnum uriserver
base-uriserver
default-portnum)
→ any/c
  uriserver : any/c
  base-uriserver : any/c
  default-portnum : any/c

!!!

7 History

Version 0.2 — 2011-08-23 — PLaneT (1 0)
This is a release of some code-in-progress that has been sitting around unreleased for years. It has been changed heavily since the 2004, non-PLaneT release, including getting rid of the "uriobj"-specific operations, so that all operations work on both string and object forms. A few tests fail. Non-backward-compatible API changes are expected.
Version 0.1 — 2004-08-18
Initial release. Incorporates some code from UriFrame.

8 Legal

Copyright (c) 2003–2011 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License (LGPL 3), or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.

Standard Documentation Format Note: The API signatures in this documentation are likely incorrect in some regards, such as indicating type any/c for things that are not, and not indicating when arguments are optional. This is due to an ongoing transition from the Texinfo documentation format to Scribble, which the author intends to finish someday.
