#lang scribble/doc

@(require scribble/manual)

@title{@bold{SXML}: The S-Expression representation of XML terms}

@;{@author[(author+email "John Clements" "clements@racket-lang.org")]}

@(require (planet cce/scheme:7:2/require-provide))

@(require (for-label racket
                     (this-package-in main)))

@defmodule[(planet clements/sxml2)]{This PLaneT library contains Oleg Kiselyov's SXML
libraries in a Racket-friendly format. It is a direct descendant of
Dmitry Lizorkin's PLaneT package. It differs from that package in that

@itemize[#:style 'ordered
@item{it contains some documentation (here it is!),}
@item{it contains some tests,}
@item{it has only one require point (ssax & sxml are both included), and}
@item{it doesn't depend on schemeunit:3, so it compiles quickly.}]

This documentation is scraped together from various sources; the bulk of it
(currently) is pulled from in-source comments.

I'm hoping that this will become a Racket community project, with various
people contributing documentation and test cases and maybe even bug fixes.

To that end, this code currently lives in a GitHub repository which should be
fairly easy to find. Patches gratefully accepted.

--John Clements, 2011-02-17}

@section{SAX Parsing (input)}

@defproc[(ssax:xml->sxml [port port?]
                         [namespace-prefix-assig (listof (cons/c symbol? string?))])
         sxml?]{
Reads a single XML element from the given @racket[port] and returns the
corresponding SXML representation. The @racket[namespace-prefix-assig]
association list provides shortened forms to be used in place of namespaces.

For instance,

@racketblock[
(ssax:xml->sxml
 (open-input-string
  "<zippy><pippy pigtails=\"2\">ab</pippy>cd</zippy>")
 '())]

evaluates to:

@racketblock['(*TOP* (zippy (pippy (|@| (pigtails "2")) "ab") "cd"))]
}

@section{Serialization (output)}

@defproc[(srl:sxml->xml [sxml-obj sxml?] [dest port-or-filename? null])
         (or/c string? unspecified)]{
Serializes the @racket[sxml-obj] into XML, with indentation to make the output easier for a human to read.
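
As a minimal sketch (the SXML literal below is illustrative and not taken from the library's tests), omitting @racket[dest] makes the call return the XML as a string:

@racketblock[
(require (planet clements/sxml2))

(code:comment @#,t{no dest argument, so the serialized XML comes back as a string})
(srl:sxml->xml '(zippy (pippy (|@| (pigtails "2")) "ab") "cd"))]
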
@itemize[
@item{@racket[sxml-obj] - an SXML object (a node or a nodeset) to be serialized}
@item{@racket[dest] - an output port or an output file name (optional)}]

If @racket[dest] is not supplied, the function returns a string that contains
the serialized representation of @racket[sxml-obj].

If @racket[dest] is supplied and is a port, the function writes the serialized
representation of @racket[sxml-obj] to this port and returns an unspecified
result.

If @racket[dest] is supplied and is a string, the string is treated as an
output filename: the serialized representation of @racket[sxml-obj] is written
to that file and an unspecified result is returned. If a file with the given
name already exists, the effect is unspecified.
}

@defproc[(srl:sxml->xml-noindent [sxml-obj sxml?] [dest port-or-filename? null])
         (or/c string? unspecified)]{
Serializes the @racket[sxml-obj] into XML, without indentation.
}

@defproc[(srl:sxml->html [sxml-obj sxml?] [dest port-or-filename? null])
         (or/c string? unspecified)]{
Serializes the @racket[sxml-obj] into HTML, with indentation to make the output easier for a human to read.

@itemize[
@item{@racket[sxml-obj] - an SXML object (a node or a nodeset) to be serialized}
@item{@racket[dest] - an output port or an output file name (optional)}]

If @racket[dest] is not supplied, the function returns a string that contains
the serialized representation of @racket[sxml-obj].

If @racket[dest] is supplied and is a port, the function writes the serialized
representation of @racket[sxml-obj] to this port and returns an unspecified
result.

If @racket[dest] is supplied and is a string, the string is treated as an
output filename: the serialized representation of @racket[sxml-obj] is written
to that file and an unspecified result is returned. If a file with the given
name already exists, the effect is unspecified.
}

@defproc[(srl:sxml->html-noindent [sxml-obj sxml?] [dest port-or-filename? null])
         (or/c string?
               unspecified)]{
Serializes the @racket[sxml-obj] into HTML, without indentation.
}

@section{Search (SXPATH)}

@defproc[(sxpath [path abbr-sxpath?] [ns-binding ns-binding? '()]) procedure?]{
Given a path, produces a procedure that accepts an SXML document and returns
a list of matches. Note that the @racket[*TOP*] node of the document is required.

@verbatim{
AbbrPath is a list. It is translated to the full SXPath according
to the following rewriting rules
(sxpath '()) -> (node-join)
(sxpath '(path-component ...)) ->
                (node-join (sxpath1 path-component) (sxpath '(...)))
(sxpath1 '//) -> (sxml:descendant-or-self sxml:node?)
(sxpath1 '(equal? x)) -> (select-kids (node-equal? x))
(sxpath1 '(eq? x))    -> (select-kids (node-eq? x))
(sxpath1 '(*or* ...))  -> (select-kids (ntype-names??
                                         (cdr '(*or* ...))))
(sxpath1 '(*not* ...)) -> (select-kids (sxml:complement
                                        (ntype-names??
                                         (cdr '(*not* ...)))))
(sxpath1 '(ns-id:* x)) -> (select-kids
                                     (ntype-namespace-id?? x))
(sxpath1 ?symbol)     -> (select-kids (ntype?? ?symbol))
(sxpath1 ?string)     -> (txpath ?string)
(sxpath1 procedure)   -> procedure
(sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...))
(sxpath1 '(path reducer ...)) ->
                (node-reduce (sxpath path) (sxpathr reducer) ...)
(sxpathr number)      -> (node-pos number)
(sxpathr path-filter) -> (filter (sxpath path-filter))
}

Examples:

All cells of an HTML table:

@racketblock[
(define table
  `(*TOP*
    (table
     (tr (td "a") (td "b"))
     (tr (td "c") (td "d")))))

((sxpath '(table tr td)) table)]

... produces:

@racketblock['((td "a") (td "b") (td "c") (td "d"))]

All cells anywhere in a document:

@racketblock[
(define table
  `(*TOP*
    (div
     (p (table
         (tr (td "a") (td "b"))
         (tr (td "c") (td "d"))))
     (table
      (tr (td "e"))))))

((sxpath '(// td)) table)]

... produces:

@racketblock['((td "a") (td "b") (td "c") (td "d") (td "e"))]

One result may be nested in another one:

@racketblock[
(define doc
  `(*TOP*
    (div
     (p (div "3")
        (div (div "4"))))))

((sxpath '(// div)) doc)
]

...
produces:

@racketblock[
'((div (p (div "3") (div (div "4"))))
  (div "3")
  (div (div "4"))
  (div "4"))]
}

@section{Transformation (SXSLT)}

@defproc[(pre-post-order [tree sxml?] [bindings (listof binding?)]) sxml?]{
Pre-Post-order traversal of a tree and creation of a new tree.

@verbatim{
<binding> ::= (<trigger-symbol> *preorder* . <handler>) |
              (<trigger-symbol> *macro* . <handler>) |
              (<trigger-symbol> <new-bindings> . <handler>) |
              (<trigger-symbol> . <handler>)
<trigger-symbol> ::= XMLname | *text* | *default*
<handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
}

The pre-post-order function visits the nodes and nodelists
pre-post-order (depth-first). For each <Node> of the form (name
<Node> ...) it looks up an association with the given 'name' among
its @racket[bindings]. If it fails, @racket[pre-post-order] tries to locate a
*default* binding. It's an error if the latter attempt fails as
well. Having found a binding, the pre-post-order function first
checks to see if the binding is of the form

@racketblock[(<trigger-symbol> *preorder* . <handler>)]

If it is, the handler is 'applied' to the current node. Otherwise,
the pre-post-order function first calls itself recursively for each
child of the current node, with <new-bindings> prepended to the
<bindings> in effect. The result of these calls is passed to the
<handler> (along with the head of the current <Node>). To be more
precise, the handler is @emph{applied} to the head of the current node
and its processed children. The result of the handler, which should
also be a <tree>, replaces the current <Node>. If the current <Node>
is a text string or other atom, a special binding with a symbol
*text* is looked up.

A binding can also be of the form

@racketblock[(<trigger-symbol> *macro* . <handler>)]

This is equivalent to *preorder* described above. However, the result
is processed again, with the current stylesheet.

A tiny example:

@racketblock[
(require (planet clements/sxml2))

(define sample-doc
  `(*TOP*
    (html (title "the title")
          (body (p "paragraph 1")
                (p "paragraph 2")))))

(define italicizer
  `((p . ,(lambda (tag . content)
            (cons tag (cons "PARAGRAPH BEGINS: " content))))
    (*text* . ,(lambda (tag content)
                 `(i ,content)))
    (*default* .
     ,(lambda args args))))

(pre-post-order sample-doc italicizer)]

produces:

@racketblock[
'(*TOP*
  (html (title (i "the title"))
        (body (p "PARAGRAPH BEGINS: " (i "paragraph 1"))
              (p "PARAGRAPH BEGINS: " (i "paragraph 2")))))]
}

@include-section["extracted-sperber.scrbl"]

@include-section["all-exported.scrbl"]

@section{Reporting Bugs}

For Heaven's sake, report lots of bugs!
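
A bug report is easiest to act on when it comes with a small, self-contained script. The following is only a sketch of what such a script might look like, using functions documented in this manual; the XML input and the path are placeholders to be replaced with whatever triggers the problem:

@racketblock[
(require (planet clements/sxml2))

(code:comment @#,t{parse a small document; the empty list means no namespace prefixes})
(define doc
  (ssax:xml->sxml
   (open-input-string
    "<zippy><pippy pigtails=\"2\">ab</pippy>cd</zippy>")
   '()))

(code:comment @#,t{the query whose result looks wrong})
((sxpath '(zippy pippy)) doc)

(code:comment @#,t{serialize the parsed document back to a string})
(srl:sxml->xml-noindent doc)]

Including the actual and the expected output of each expression alongside the script helps as well.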