doc.txt

SSAX Package

SSAX Package
============

A SSAX functional XML parsing framework consists of a DOM/SXML parser, a SAX
parser, and a supporting library of lexing and parsing procedures. The
procedures in the package can be used separately to tokenize or parse various
pieces of XML documents. The framework supports XML Namespaces, character,
internal and external parsed entities, attribute value normalization,
processing instructions and CDATA sections. The package includes a
semi-validating SXML parser: a DOM-mode parser that is an instantiation of 
a SAX parser (called SSAX).

SSAX is a full-featured, algorithmically optimal, pure-functional parser,
which can act as a stream processor. SSAX is an efficient SAX parser that is
easy to use. SSAX minimizes the amount of application-specific state that has
to be shared among user-supplied event handlers. SSAX makes the maintenance
of an application-specific element stack unnecessary, which eliminates several
classes of common bugs. SSAX is written in a pure-functional subset of Scheme.
Therefore, the event handlers are referentially transparent, which makes them
easier for a programmer to write and to reason about. The more expressive,
reliable and easier to use application interface for the event-driven XML
parsing is the outcome of implementing the parsing engine as an enhanced tree
fold combinator, which fully captures the control pattern of the depth-first
tree traversal.

-------------------------------------------------

Quick start

; procedure: ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG
;
; This is an instance of a SSAX parser that returns an SXML
; representation of the XML document to be read from PORT.
; NAMESPACE-PREFIX-ASSIG is a list of (USER-PREFIX . URI-STRING)
; that assigns USER-PREFIXes to certain namespaces identified by
; particular URI-STRINGs. It may be an empty list.
; The procedure returns an SXML tree. The port points out to the
; first character after the root element.
(define (ssax:xml->sxml port namespace-prefix-assig) ...)

; procedure: pre-post-order TREE BINDINGS
;
;	          Traversal of an SXML tree or a grove:
;			a <Node> or a <Nodelist>
;
; A <Node> and a <Nodelist> are mutually-recursive datatypes that
; underlie the SXML tree:
;	<Node> ::= (name . <Nodelist>) | "text string"
; An (ordered) set of nodes is just a list of the constituent nodes:
; 	<Nodelist> ::= (<Node> ...)
; Nodelists, and Nodes other than text strings are both lists. A
; <Nodelist> however is either an empty list, or a list whose head is
; not a symbol (an atom in general). A symbol at the head of a node is
; either an XML name (in which case it's a tag of an XML element), or
; an administrative name such as '@'.
; See SXPath.scm and SSAX.scm for more information on SXML.
;
;
; Pre-Post-order traversal of a tree and creation of a new tree:
;	pre-post-order:: <tree> x <bindings> -> <new-tree>
; where
; <bindings> ::= (<binding> ...)
; <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
;               (<trigger-symbol> *macro* . <handler>) |
;		(<trigger-symbol> <new-bindings> . <handler>) |
;		(<trigger-symbol> . <handler>)
; <trigger-symbol> ::= XMLname | *text* | *default*
; <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
;
; The pre-post-order function visits the nodes and nodelists
; pre-post-order (depth-first).  For each <Node> of the form (name
; <Node> ...) it looks up an association with the given 'name' among
; its <bindings>. If failed, pre-post-order tries to locate a
; *default* binding. It's an error if the latter attempt fails as
; well.  Having found a binding, the pre-post-order function first
; checks to see if the binding is of the form
;	(<trigger-symbol> *preorder* . <handler>)
; If it is, the handler is 'applied' to the current node. Otherwise,
; the pre-post-order function first calls itself recursively for each
; child of the current node, with <new-bindings> prepended to the
; <bindings> in effect. The result of these calls is passed to the
; <handler> (along with the head of the current <Node>). To be more
; precise, the handler is _applied_ to the head of the current node
; and its processed children. The result of the handler, which should
; also be a <tree>, replaces the current <Node>. If the current <Node>
; is a text string or other atom, a special binding with a symbol
; *text* is looked up.
;
; A binding can also be of a form
;	(<trigger-symbol> *macro* . <handler>)
; This is equivalent to *preorder* described above. However, the result
; is re-processed again, with the current stylesheet.
;
(define (pre-post-order tree bindings) ...)

-------------------------------------------------

Additional tools included into the package

1. "access-remote.ss"
 Uniform access to local and remote resources
 Resolution for relative URIs in accordance with RFC 2396

2. "id.ss"
 Creation and manipulation of the ID-index for a faster access to SXML elements
 by their unique ID
 Provides the DTD parser for extracting ID attribute declarations

3. "xlink-parser.ss"
 Parser for XML documents that contain XLink elements

4. "multi-parser.ss"
 SSAX multi parser: combines several specialized parsers into one
 Provides creation of parent pointers to SXML document constructed