4 Search (SXPath)
Given a representation of a path, sxpath produces a procedure that
accepts an SXML document and returns a list of matches. Path
representations are interpreted according to the following rewrite
rules.
  (sxpath '())
    ⇒ (node-join)
  (sxpath (cons path-component0 path-components))
    ⇒ (node-join (sxpath1 path-component0) (sxpath path-components))
  (sxpath1 '//)
    ⇒ (sxml:descendant-or-self sxml:node?)
  (sxpath1 `(equal? ,x))
    ⇒ (select-kids (node-equal? x))
  (sxpath1 `(eq? ,x))
    ⇒ (select-kids (node-eq? x))
  (sxpath1 `(*or* ,p ...))
    ⇒ (select-kids (ntype-names?? `(,p ...)))
  (sxpath1 `(*not* ,p ...))
    ⇒ (select-kids (sxml:complement (ntype-names?? `(,p ...))))
  (sxpath1 `(ns-id:* ,x))
    ⇒ (select-kids (ntype-namespace-id?? x))
  (sxpath1 symbol)
    ⇒ (select-kids (ntype?? symbol))
  (sxpath1 string)
    ⇒ (txpath string)
  (sxpath1 procedure)
    ⇒ procedure
  (sxpath1 `(,symbol ,reducer ...))
    ⇒ (sxpath1 `((,symbol) ,reducer ...))
  (sxpath1 `(,path ,reducer ...))
    ⇒ (node-reduce (sxpath path) (sxpathr reducer) ...)
  (sxpathr number)
    ⇒ (node-pos number)
  (sxpathr path)
    ⇒ (sxml:filter (sxpath path))
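As a rough illustration of the rules above, the sketch below spells out by hand what a path made of plain symbols expands to. It assumes the sxml library is installed and that node-join, select-kids and ntype?? are available from it; the name by-hand is hypothetical and exists only for this example.

```racket
#lang racket/base
(require sxml)

(define table
  '(*TOP*
    (table
     (tr (td "a") (td "b"))
     (tr (td "c") (td "d")))))

;; Each symbol step becomes (select-kids (ntype?? symbol)).
;; `by-hand` is a hypothetical name for the manually composed converter.
(define by-hand
  (node-join (select-kids (ntype?? 'table))
             (select-kids (ntype?? 'tr))
             (select-kids (ntype?? 'td))))

;; Both converters should select the same td nodes.
(equal? ((sxpath '(table tr td)) table)
        (by-hand table))

;; A numeric reducer goes through the sxpathr rule and becomes (node-pos 1).
((sxpath '(table tr (td 1))) table)
```

Composing the steps manually is rarely needed in practice; it is shown here only to make the symbol rule concrete.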
To extract all cells from an HTML table:
> (define table
    `(*TOP*
      (table
       (tr (td "a") (td "b"))
       (tr (td "c") (td "d")))))
> ((sxpath '(table tr td)) table)
'((td "a") (td "b") (td "c") (td "d"))
To extract all cells anywhere in a document:
> (define table
    `(*TOP*
      (div
       (p (table
           (tr (td "a") (td "b"))
           (tr (td "c") (td "d"))))
       (table
        (tr (td "e"))))))
> ((sxpath '(// td)) table)
'((td "a") (td "b") (td "c") (td "d") (td "e"))
One result may be nested in another one:
> (define doc
    `(*TOP*
      (div
       (p (div "3")
          (div (div "4"))))))
> ((sxpath '(// div)) doc)
'((div (p (div "3") (div (div "4")))) (div "3") (div (div "4")) (div "4"))
There’s also a string-based syntax, txpath. As shown in the grammar above,
sxpath assumes that any strings in the path are expressed using the
txpath syntax.
So, for instance, the prior example could be rewritten using a string:
> (define doc
    `(*TOP*
      (div
       (p (div "3")
          (div (div "4"))))))
> ((sxpath "//div") doc)
'((div (p (div "3") (div (div "4")))) (div "3") (div (div "4")) (div "4"))
More generally, lists in the s-expression syntax correspond to string
concatenation in the txpath syntax.
So, to find all italics that appear at top level within a paragraph:
> (define doc
    `(*TOP*
      (div
       (p (i "3")
          (froogy (i "4"))))))
> ((sxpath "//p/i") doc)
'((i "3"))
Handling of namespaces in sxpath is a bit surprising. In particular,
it appears to me that sxpath’s model is that namespaces must appear fully expanded
in the matched source. For instance:
> ((sxpath "//ns:p" `((ns . "http://example.com")))
   '(*TOP* (html (http://example.com:body
                  (http://example.com:p "first para")
                  (http://example.com:p
                   "second para containing"
                   (http://example.com:p "third para") "inside it")))))
'((http://example.com:p "first para")
  (http://example.com:p
   "second para containing"
   (http://example.com:p "third para")
   "inside it")
  (http://example.com:p "third para"))
But the corresponding example where the source document contains a namespace shortcut does
not match in the same way. That is:
> ((sxpath "//ns:p" `((ns . "http://example.com")))
   '(*TOP* (@ (*NAMESPACES* (ns "http://example.com")))
           (html (ns:body (ns:p "first para")
                          (ns:p "second para containing"
                                (ns:p "third para") "inside it")))))
'()
It produces the empty list. Instead, you must pretend that the
shortcut is actually the namespace. Thus:
> ((sxpath "//ns:p" `((ns . "ns")))
   '(*TOP* (@ (*NAMESPACES* (ns "http://example.com")))
           (html (ns:body (ns:p "first para")
                          (ns:p "second para containing"
                                (ns:p "third para") "inside it")))))
'((ns:p "first para")
  (ns:p "second para containing" (ns:p "third para") "inside it")
  (ns:p "third para"))
Ah well.
Like sxpath, but only accepts an XPath query in string form, using the
standard XPath syntax.
Deprecated; use sxpath instead.
An sxml-converter is a function that takes nodes or nodesets to nodesets.
An sxml-converter-as-predicate is an sxml-converter used as a predicate;
a return value of '() indicates false.
Returns #t if v is a list of nodes (that is, a
list that does not start with a symbol).
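A minimal sketch of the distinction, assuming the predicate described here is the library's nodeset? (the name used elsewhere in this section) and that the sxml library is installed:

```racket
#lang racket/base
(require sxml)

;; A single element node starts with a symbol, so it is not a nodeset.
(nodeset? '(p "blah"))          ; => #f

;; A list of nodes does not start with a symbol, so it is a nodeset.
(nodeset? '((p "blah") (br)))   ; => #t
```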
If v is a nodeset, returns v, otherwise returns (list v).
Examples:
> (as-nodeset '(p "blah"))
'((p "blah"))
> (as-nodeset '((p "blah") (br) "more"))
'((p "blah") (br) "more")
Returns a converter that selects the nth element (counting
from 1, not 0) of a nodelist and returns it as a singleton nodelist. If
n is negative, it selects from the right: -1
selects the last node, and so forth.
Examples:
> ((node-pos 2) '((a) (b) (c) (d) (e)))
'((b))
> ((node-pos -1) '((a) (b) (c)))
'((c))
Returns a converter that selects an (ordered) subset of the children
of the given node (or the children of the members of the given
nodelist) satisfying pred.
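The following sketch shows both cases, a single node and a nodelist. It assumes the sxml library is installed and uses ntype?? (from the rewrite rules earlier in this section) to build the predicate; the sample data is made up for illustration.

```racket
#lang racket/base
(require sxml)

(define row '(tr (td "a") (th "h") (td "b")))

;; Keep only the td children of a single node.
((select-kids (ntype?? 'td)) row)
;; => '((td "a") (td "b"))

;; Applied to a nodelist, the children of every member are considered.
((select-kids (ntype?? 'td)) (list row '(tr (td "c"))))
;; => '((td "a") (td "b") (td "c"))
```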
Like select-kids but returns only the first one, or #f if none.
Returns a function that when applied to node, returns (list node) if
(pred node) is neither #f nor '(), otherwise returns '().
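The behavior described above can be sketched in plain Racket. The name keep-if is hypothetical and this is not the library's own definition, only a stand-in that follows the description literally.

```racket
#lang racket/base

;; Hypothetical stand-in mirroring the description above;
;; not the library's definition.
(define (keep-if pred)
  (lambda (node)
    (define r (pred node))
    (if (or (not r) (null? r))
        '()
        (list node))))

((keep-if pair?) '(p "blah"))             ; => '((p "blah"))
((keep-if pair?) "text")                  ; => '()
((keep-if (lambda (n) '())) '(p "blah"))  ; => '()
```

Treating '() as false is what lets an sxml-converter be passed as pred directly.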
XPath axes and accessors.
XPath axes and accessors that depend on the root node.