4 Search (SXPath)

(sxpath path [ns-bindings]) → (-> (or/c node nodeset?) nodeset?)
path : (or/c list? string?)
ns-bindings : (listof (cons/c symbol? string?)) = '()

Given a representation of a path, produces a procedure that accepts an SXML document and returns a list of matches. Path representations are interpreted according to the following rewrite rules.

(sxpath '())
⇒
(node-join)
(sxpath (cons path-component0 path-components))
⇒
(node-join (sxpath1 path-component0)
           (sxpath path-components))

(sxpath1 '//)
⇒
(sxml:descendant-or-self sxml:node?)
(sxpath1 `(equal? ,x))
⇒
(select-kids (node-equal? x))
(sxpath1 `(eq? ,x))
⇒
(select-kids (node-eq? x))
(sxpath1 `(*or* ,p ...))
⇒
(select-kids (ntype-names?? `(,p ...)))
(sxpath1 `(*not* ,p ...))
⇒
(select-kids
 (sxml:complement
  (ntype-names?? `(,p ...))))
(sxpath1 `(ns-id:* ,x))
⇒
(select-kids (ntype-namespace-id?? x))
(sxpath1 symbol)
⇒
(select-kids (ntype?? symbol))
(sxpath1 string)
⇒
(txpath string)
(sxpath1 procedure)
⇒
procedure
(sxpath1 `(,symbol ,reducer ...))
⇒
(sxpath1 `((,symbol) ,reducer ...))
(sxpath1 `(,path ,reducer ...))
⇒
(node-reduce (sxpath path)
             (sxpathr reducer) ...)

(sxpathr number)
⇒
(node-pos number)
(sxpathr path)
⇒
(sxml:filter (sxpath path))

To extract all cells from an HTML table:

> (define table
    `(*TOP*
      (table
       (tr (td "a") (td "b"))
       (tr (td "c") (td "d")))))
> ((sxpath '(table tr td)) table)
'((td "a") (td "b") (td "c") (td "d"))

To extract all cells anywhere in a document:

> (define table
    `(*TOP*
      (div
       (p (table
           (tr (td "a") (td "b"))
           (tr (td "c") (td "d"))))
       (table
        (tr (td "e"))))))
> ((sxpath '(// td)) table)
'((td "a") (td "b") (td "c") (td "d") (td "e"))

One result may be nested in another one:

> (define doc
    `(*TOP*
      (div
       (p (div "3")
          (div (div "4"))))))
> ((sxpath '(// div)) doc)
'((div (p (div "3") (div (div "4")))) (div "3") (div (div "4")) (div "4"))
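The reducer rules above (a number becomes node-pos, a nested path becomes sxml:filter) can be tried on a small table. The following is a minimal sketch, assuming the sxml library is installed; the document and the names first-row-cells and rows-with-c are made up for illustration.

```racket
#lang racket/base
(require sxml)

(define table
  '(*TOP*
    (table
     (tr (td "a") (td "b"))
     (tr (td "c") (td "d")))))

;; (tr 1): a numeric reducer keeps the first tr found under each table.
(define first-row-cells
  ((sxpath '(table (tr 1) td)) table))
;; first-row-cells is '((td "a") (td "b"))

;; (tr (td (equal? "c"))): a path reducer keeps only those tr nodes for
;; which the sub-path (td (equal? "c")) returns a non-empty nodeset.
(define rows-with-c
  ((sxpath '(table (tr (td (equal? "c"))))) table))
;; rows-with-c is '((tr (td "c") (td "d")))
```

Note that the position is counted among the matches under each context node, not across the whole document.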

There’s also a string-based syntax, txpath. As shown in the grammar above, sxpath assumes that any strings in the path are expressed using the txpath syntax.

So, for instance, the prior example could be rewritten using a string:

> (define doc
    `(*TOP*
      (div
       (p (div "3")
          (div (div "4"))))))
> ((sxpath "//div") doc)
'((div (p (div "3") (div (div "4")))) (div "3") (div (div "4")) (div "4"))

More generally, lists in the s-expression syntax correspond to string concatenation in the txpath syntax.

So, to find all italics that appear at top level within a paragraph:

> (define doc
    `(*TOP*
      (div
       (p (i "3")
          (froogy (i "4"))))))
> ((sxpath "//p/i") doc)
'((i "3"))
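The correspondence between the two syntaxes can be checked directly. This sketch, assuming the sxml library is installed, runs the string path from the example above and what should be its s-expression counterpart on the same document; the names by-string and by-list are illustrative only.

```racket
#lang racket/base
(require sxml)

(define doc
  '(*TOP*
    (div
     (p (i "3")
        (froogy (i "4"))))))

;; txpath string form: steps are concatenated with "/".
(define by-string ((sxpath "//p/i") doc))

;; s-expression form: the same steps as elements of a list.
(define by-list ((sxpath '(// p i)) doc))

;; Both select only the i that is a direct child of p: '((i "3"))
```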

Namespace handling in sxpath can be surprising. In particular, sxpath appears to assume that namespaces are fully expanded in the matched source. For instance:

> ((sxpath "//ns:p" `((ns . "http://example.com")))
   '(*TOP* (html (http://example.com:body
                  (http://example.com:p "first para")
                  (http://example.com:p
                   "second para containing"
                   (http://example.com:p "third para") "inside it")))))
'((http://example.com:p "first para")
  (http://example.com:p
   "second para containing"
   (http://example.com:p "third para")
   "inside it")
  (http://example.com:p "third para"))

But the corresponding example where the source document contains a namespace shortcut does not match in the same way. That is:

> ((sxpath "//ns:p" `((ns . "http://example.com")))
   '(*TOP* (@ (*NAMESPACES* (ns "http://example.com")))
           (html (ns:body (ns:p "first para")
                          (ns:p "second para containing"
                                (ns:p "third para") "inside it")))))
'()

It produces the empty list. To get a match, you must instead bind the prefix to the shortcut itself, as if the shortcut were the namespace. Thus:

> ((sxpath "//ns:p" `((ns . "ns")))
   '(*TOP* (@ (*NAMESPACES* (ns "http://example.com")))
           (html (ns:body (ns:p "first para")
                          (ns:p "second para containing"
                                (ns:p "third para") "inside it")))))
'((ns:p "first para")
  (ns:p "second para containing" (ns:p "third para") "inside it")
  (ns:p "third para"))

Ah well.

(txpath xpath-location-path [ns-bindings])
→ (-> (or/c node nodeset?) nodeset?)
xpath-location-path : string?
ns-bindings : (listof (cons/c symbol? string?)) = '()

Like sxpath, but only accepts an XPath query in string form, using the standard XPath syntax.

Deprecated; use sxpath instead.

An sxml-converter is a function

(-> (or/c node nodeset?)
    nodeset?)

that is, it maps a node or nodeset to a nodeset. An sxml-converter-as-predicate is an sxml-converter used as a predicate; a return value of '() indicates false.

(nodeset? v) → boolean?
v : any/c

Returns #t if v is a list of nodes (that is, a list that does not start with a symbol).

Examples:
> (nodeset? '(p "blah"))
#f
> (nodeset? '((p "blah") (br) "more"))
#t

(as-nodeset v) → nodeset?
v : any/c

If v is a nodeset, returns v, otherwise returns (list v).

Examples:
> (as-nodeset '(p "blah"))
'((p "blah"))
> (as-nodeset '((p "blah") (br) "more"))
'((p "blah") (br) "more")

(node-eq? v) → (-> any/c boolean?)
v : any/c

Curried eq?: returns a predicate that tests whether its argument is eq? to v.

(node-equal? v) → (-> any/c boolean?)
v : any/c

Curried equal?: returns a predicate that tests whether its argument is equal? to v.

(node-pos n) → sxml-converter
n : (or/c exact-positive-integer? exact-negative-integer?)

Returns a converter that selects the nth element (counting from 1, not 0) of a nodeset and returns it as a singleton nodeset. If n is negative, it selects from the right: -1 selects the last node, and so forth.

Examples:
> ((node-pos 2) '((a) (b) (c) (d) (e)))
'((b))
> ((node-pos -1) '((a) (b) (c)))
'((c))

(sxml:filter pred) → sxml-converter
pred : sxml-converter-as-predicate

(sxml:complement pred) → sxml-converter-as-predicate
pred : sxml-converter-as-predicate
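As a rough illustration of how these two combine with converters used as predicates, here is a minimal sketch, assuming the sxml library is installed. The sample data and the names has-td, rows-with-td and non-p-kids are invented for this example; the behavior shown for sxml:filter follows the sxpathr rewrite rule earlier in this section.

```racket
#lang racket/base
(require sxml)

;; A nodeset of three rows; only the first has a td child.
(define rows '((tr (td "a")) (tr) (tr (th "h"))))

;; A converter used as a predicate: '() counts as false.
(define has-td (select-kids (ntype?? 'td)))

;; sxml:filter keeps the nodes for which the predicate is neither #f nor '().
(define rows-with-td ((sxml:filter has-td) rows))
;; rows-with-td is '((tr (td "a")))

;; sxml:complement negates a predicate, here to select every child except p.
(define non-p-kids
  ((select-kids (sxml:complement (ntype?? 'p)))
   '(div (p "x") (br) "t")))
;; non-p-kids is '((br) "t")
```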

(select-kids pred) → sxml-converter
pred : sxml-converter-as-predicate

Returns a converter that selects the (ordered) subset of the children of the given node, or of the children of each member of the given nodeset, that satisfy pred.

Examples:
> ((select-kids (ntype?? 'p)) '(p "blah"))
'()
> ((select-kids (ntype?? '*text*)) '(p "blah"))
'("blah")
> ((select-kids (ntype?? 'p)) (list '(p "blah") '(br) '(p "blahblah")))
'()
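The first and last examples return '() because select-kids tests the children of each node, never the node itself, and none of those p elements has a p child. For contrast, a small sketch with a non-empty result when applied to a nodeset (the rows data and the name cells are made up; the sxml library is assumed to be installed):

```racket
#lang racket/base
(require sxml)

;; Two tr nodes passed as a nodeset; their td children are collected in order.
(define rows (list '(tr (td "a") (th "x")) '(tr (td "b"))))

(define cells ((select-kids (ntype?? 'td)) rows))
;; cells is '((td "a") (td "b"))
```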

(select-first-kid pred)
→ (-> (or/c node nodeset?) (or/c node #f))
pred : sxml-converter-as-predicate

Like select-kids, but returns only the first matching child, or #f if there is none.

(node-self pred) → sxml-converter
pred : sxml-converter-as-predicate

Returns a function that, when applied to a node, returns (list node) if (pred node) is neither #f nor '(), and '() otherwise.

Examples:
> ((node-self (ntype?? 'p)) '(p "blah"))
'((p "blah"))
> ((node-self (ntype?? 'p)) '(br))
'()

(node-join selector ...) → sxml-converter
selector : sxml-converter

(node-reduce converter ...) → sxml-converter
converter : sxml-converter

(node-or converter) → sxml-converter
converter : sxml-converter

(node-closure converter) → sxml-converter
converter : sxml-converter
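These combinators are what the sxpath rewrite rules at the top of this section expand into. The sketch below, assuming the sxml library is installed, applies node-join and node-reduce by hand; the names table-node, rows, cells, all-cells and second-row are illustrative, and the descriptions in the comments are inferred from those rewrite rules rather than from separate documentation.

```racket
#lang racket/base
(require sxml)

(define table-node
  '(table
    (tr (td "a") (td "b"))
    (tr (td "c") (td "d"))))

(define rows  (select-kids (ntype?? 'tr)))
(define cells (select-kids (ntype?? 'td)))

;; node-join chains selectors: each selector is applied to every node
;; produced by the previous one, like successive steps of a path.
(define all-cells ((node-join rows cells) table-node))
;; all-cells is '((td "a") (td "b") (td "c") (td "d"))

;; node-reduce feeds the whole result of one converter to the next,
;; so (node-pos 2) sees the full list of rows and picks the second.
(define second-row ((node-reduce rows (node-pos 2)) table-node))
;; second-row is '((tr (td "c") (td "d")))
```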

(sxml:attribute pred) → sxml-converter
  pred : sxml-converter-as-predicate
(sxml:child pred) → sxml-converter
  pred : sxml-converter-as-predicate
sxml:child-nodes : sxml-converter
sxml:child-elements : sxml-converter
(sxml:descendant pred) → sxml-converter
  pred : sxml-converter-as-predicate
(sxml:descendant-or-self pred) → sxml-converter
  pred : sxml-converter-as-predicate

XPath axes and accessors.

((sxml:parent pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node
(node-parent root) → sxml-converter
  root : node
((sxml:ancestor pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:ancestor-or-self pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:following pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:following-sibling pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:preceding pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:preceding-sibling pred) root) → sxml-converter
  pred : sxml-converter-as-predicate
  root : node

XPath axes and accessors that depend on the root node.
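SXML nodes carry no parent pointers, which is presumably why these axes take the root: the parent has to be found by searching from the root for the given node. The following is a minimal sketch of the calling convention, assuming the sxml library is installed and that the search compares nodes by identity; doc, p-node and parents are hypothetical names.

```racket
#lang racket/base
(require sxml)

(define doc '(*TOP* (div (p "x"))))

;; Obtain the p node from doc itself, so that it is the very same
;; object that sits inside the tree (not merely an equal? copy).
(define p-node (car ((sxpath '(div p)) doc)))

;; Supply the predicate first, then the root, then the starting node.
;; (ntype?? '*any*) accepts every node.
(define parents (((sxml:parent (ntype?? '*any*)) doc) p-node))
;; parents is '((div (p "x")))
```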
