4 Search (SXPath)

(sxpath path [ns-bindings])  (-> (or/c node nodeset?) nodeset?)
  path : (or/c list? string?)
  ns-bindings : (listof (cons/c symbol? string?)) = '()
Given a representation of a path, produces a procedure that accepts an SXML document and returns a list of matches. Path representations are interpreted according to the following rewrite rules.

(sxpath '())  ⇒  (node-join)
(sxpath (cons path-component0 path-components))  ⇒  (node-join (sxpath1 path-component0) (sxpath path-components))

(sxpath1 '//)  ⇒  (sxml:descendant-or-self sxml:node?)
(sxpath1 `(equal? ,x))  ⇒  (select-kids (node-equal? x))
(sxpath1 `(eq? ,x))  ⇒  (select-kids (node-eq? x))
(sxpath1 `(*or* ,p ...))  ⇒  (select-kids (ntype-names?? `(,p ...)))
(sxpath1 `(*not* ,p ...))  ⇒  (select-kids (sxml:complement (ntype-names?? `(,p ...))))
(sxpath1 `(ns-id:* ,x))  ⇒  (select-kids (ntype-namespace-id?? x))
(sxpath1 symbol)  ⇒  (select-kids (ntype?? symbol))
(sxpath1 string)  ⇒  (txpath string)
(sxpath1 procedure)  ⇒  procedure
(sxpath1 `(,symbol ,reducer ...))  ⇒  (sxpath1 `((,symbol) ,reducer ...))
(sxpath1 `(,path ,reducer ...))  ⇒  (node-reduce (sxpath path) (sxpathr reducer) ...)

(sxpathr number)  ⇒  (node-pos number)
(sxpathr path)  ⇒  (sxml:filter (sxpath path))
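
As a rough illustration of the rewrite rules, the sketch below expands a three-step path by hand and compares it with what sxpath builds. The document doc and the names by-sxpath and by-hand are made up for this example, and the hand expansion flattens the nested node-join calls into a single call, on the assumption that node-join accepts any number of selectors.

```racket
#lang racket/base
(require sxml)

;; A small document, invented for this illustration.
(define doc
  '(*TOP*
    (table
     (tr (td "a") (td "b"))
     (tr (td "c") (td "d")))))

;; The list form of the path.
(define by-sxpath (sxpath '(table tr td)))

;; The same path expanded by hand: each symbol becomes
;; (select-kids (ntype?? symbol)) and node-join chains the steps.
(define by-hand
  (node-join (select-kids (ntype?? 'table))
             (select-kids (ntype?? 'tr))
             (select-kids (ntype?? 'td))))

(define sxpath-result (by-sxpath doc))
(define hand-result (by-hand doc))
;; Both are '((td "a") (td "b") (td "c") (td "d"))
```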

To extract all cells from an HTML table:

> (define table
    `(*TOP*
      (table
       (tr (td "a") (td "b"))
       (tr (td "c") (td "d")))))
> ((sxpath '(table tr td)) table)

'((td "a") (td "b") (td "c") (td "d"))

To extract all cells anywhere in a document:

> (define table
    `(*TOP*
      (div
       (p (table
           (tr (td "a") (td "b"))
           (tr (td "c") (td "d"))))
       (table
        (tr (td "e"))))))
> ((sxpath '(// td)) table)

'((td "a") (td "b") (td "c") (td "d") (td "e"))

One result may be nested in another one:

> (define doc
    `(*TOP*
      (div
       (p (div "3")
          (div (div "4"))))))
> ((sxpath '(// div)) doc)

'((div (p (div "3") (div (div "4")))) (div "3") (div (div "4")) (div "4"))

There’s also a string-based syntax, txpath. As shown in the grammar above, sxpath assumes that any strings in the path are expressed using the txpath syntax.

So, for instance, the prior example could be rewritten using a string:

> (define doc
    `(*TOP*
      (div
       (p (div "3")
          (div (div "4"))))))
> ((sxpath "//div") doc)

'((div (p (div "3") (div (div "4")))) (div "3") (div (div "4")) (div "4"))

More generally, successive path components in the s-expression syntax correspond to steps separated by / in the txpath syntax, so that '(// p i) becomes "//p/i".

So, to find all italics that appear at top level within a paragraph:

> (define doc
    `(*TOP*
      (div
       (p (i "3")
          (froogy (i "4"))))))
> ((sxpath "//p/i") doc)

'((i "3"))
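
The examples so far do not exercise the reducer rules (sxpathr) from the grammar. The following is a minimal sketch of both kinds of reducer, a number and a filtering path. The document doc and the result names are invented here, and the expected values in the comments assume that a reducer is applied separately to the children of each context node.

```racket
#lang racket/base
(require sxml)

;; A small table, invented for this illustration.
(define doc
  '(*TOP*
    (table
     (tr (td "a") (td "b"))
     (tr (td "c") (td "d")))))

;; A number is a position reducer: (td 1) keeps the first td of each tr.
(define first-cells ((sxpath '(table tr (td 1))) doc))
;; '((td "a") (td "c"))

;; Negative positions count from the end: (tr -1) keeps the last tr.
(define last-row ((sxpath '(table (tr -1))) doc))
;; '((tr (td "c") (td "d")))

;; A path is a filter reducer: keep only the tr nodes for which
;; the path (td (equal? "c")) finds at least one match.
(define rows-with-c ((sxpath '(table (tr (td (equal? "c"))))) doc))
;; '((tr (td "c") (td "d")))
```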

Handling of namespaces in sxpath is a bit surprising. In particular, it appears to me that sxpath’s model is that namespaces must appear fully expanded in the matched source. For instance:

> ((sxpath "//ns:p" `((ns . "http://example.com")))
   '(*TOP* (html (http://example.com:body
                  (http://example.com:p "first para")
                  (http://example.com:p
                   "second para containing"
                   (http://example.com:p "third para") "inside it")))))

'((http://example.com:p "first para")
  (http://example.com:p
   "second para containing"
   (http://example.com:p "third para")
   "inside it")
  (http://example.com:p "third para"))

But the corresponding example where the source document contains a namespace shortcut does not match in the same way. That is:

> ((sxpath "//ns:p" `((ns . "http://example.com")))
   '(*TOP* (@ (*NAMESPACES* (ns "http://example.com")))
           (html (ns:body (ns:p "first para")
                          (ns:p "second para containing"
                                (ns:p "third para") "inside it")))))

'()

It produces the empty list. Instead, you must pretend that the shortcut is itself the namespace, binding the prefix ns to the string "ns" in ns-bindings. Thus:

> ((sxpath "//ns:p" `((ns . "ns")))
   '(*TOP* (@ (*NAMESPACES* (ns "http://example.com")))
           (html (ns:body (ns:p "first para")
                          (ns:p "second para containing"
                                (ns:p "third para") "inside it")))))

'((ns:p "first para")
  (ns:p "second para containing" (ns:p "third para") "inside it")
  (ns:p "third para"))

Ah well.

(txpath xpath-location-path [ns-bindings])
  (-> (or/c node nodeset?) nodeset?)
  xpath-location-path : string?
  ns-bindings : (listof (cons/c symbol? string?)) = '()
Like sxpath, but only accepts an XPath query in string form, using the standard XPath syntax.

Deprecated; use sxpath instead.

An sxml-converter is a function
(-> (or/c node nodeset?)
    nodeset?)
that is, it takes nodes or nodesets to nodesets. A sxml-converter-as-predicate is an sxml-converter used as a predicate; a return value of '() indicates false.

(nodeset? v)  boolean?
  v : any/c
Returns #t if v is a list of nodes (that is, a list that does not start with a symbol).

Examples:

> (nodeset? '(p "blah"))

#f

> (nodeset? '((p "blah") (br) "more"))

#t

(as-nodeset v)  nodeset?
  v : any/c
If v is a nodeset, returns v, otherwise returns (list v).

Examples:

> (as-nodeset '(p "blah"))

'((p "blah"))

> (as-nodeset '((p "blah") (br) "more"))

'((p "blah") (br) "more")

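(node-eq? v)  (-> any/c boolean?)
  v : any/c
Curried eq?: returns a predicate that tests whether its argument is eq? to v.

(node-equal? v)  (-> any/c boolean?)
  v : any/c
Curried equal?: returns a predicate that tests whether its argument is equal? to v.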
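
A short sketch of the difference between the two curried predicates, and of how one might be passed to select-kids. The names a, b, and the result variables are made up for this example.

```racket
#lang racket/base
(require sxml)

(define a (list 'p "blah"))
(define b (list 'p "blah"))   ; same structure as a, but a different object

(define same-structure? ((node-equal? a) b))  ; #t
(define same-object?    ((node-eq? a) b))     ; #f
(define self-identical? ((node-eq? a) a))     ; #t

;; A curried predicate can serve as the test for select-kids:
;; keep only the children that are equal? to the string "blah".
(define blah-kids
  ((select-kids (node-equal? "blah")) '(p "blah" (b "x"))))
;; '("blah")
```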

(node-pos n)  sxml-converter
  n : (or/c exact-positive-integer? exact-negative-integer?)
Returns a converter that selects the nth element (counting from 1, not 0) of a nodeset and returns it as a singleton nodeset. If n is negative, it selects from the right: -1 selects the last node, and so forth.

Examples:

> ((node-pos 2) '((a) (b) (c) (d) (e)))

'((b))

> ((node-pos -1) '((a) (b) (c)))

'((c))

(sxml:filter pred)  sxml-converter
  pred : sxml-converter-as-predicate
(sxml:complement pred)  sxml-converter-as-predicate
  pred : sxml-converter-as-predicate
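
No description is given for these two procedures, so the following is only a sketch of how they appear to behave, based on their use in the sxpath rewrite rules: sxml:filter keeps the members of a nodeset that satisfy pred, and sxml:complement negates a predicate. The nodeset nodes and the result names are invented for this example.

```racket
#lang racket/base
(require sxml)

;; A nodeset, invented for this illustration.
(define nodes '((p "a") (br) (p "b")))

;; Keep the nodes that are p elements.
(define only-p ((sxml:filter (ntype?? 'p)) nodes))
;; '((p "a") (p "b"))

;; Keep the nodes that are not p elements.
(define not-p ((sxml:filter (sxml:complement (ntype?? 'p))) nodes))
;; '((br))
```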

(select-kids pred)  sxml-converter
  pred : sxml-converter-as-predicate
Returns a converter that selects an (ordered) subset of the children of the given node (or the children of the members of the given nodeset) satisfying pred.

Examples:

> ((select-kids (ntype?? 'p)) '(p "blah"))

'()

> ((select-kids (ntype?? '*text*)) '(p "blah"))

'("blah")

> ((select-kids (ntype?? 'p)) (list '(p "blah") '(br) '(p "blahblah")))

'()

(select-first-kid pred)
  (-> (or/c node nodeset?) (or/c node #f))
  pred : sxml-converter-as-predicate
Like select-kids but returns only the first one, or #f if none.

(node-self pred)  sxml-converter
  pred : sxml-converter-as-predicate
Returns a function that, when applied to node, returns (list node) if (pred node) is neither #f nor '(), and otherwise returns '().

Examples:

> ((node-self (ntype?? 'p)) '(p "blah"))

'((p "blah"))

> ((node-self (ntype?? 'p)) '(br))

'()

(node-join selector ...)  sxml-converter
  selector : sxml-converter
(node-reduce converter ...)  sxml-converter
  converter : sxml-converter
(node-or converter ...)  sxml-converter
  converter : sxml-converter
(node-closure converter)  sxml-converter
  converter : sxml-converter
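
These combinators are listed without a description. The sketch below shows how node-reduce and node-or appear to behave, going by the rewrite rules for sxpath and by my understanding of the underlying library: node-reduce feeds the whole result of one converter to the next, and node-or appends the results of each converter applied to the same node. The node div-node and the result names are hypothetical.

```racket
#lang racket/base
(require sxml)

;; A single element node, invented for this illustration.
(define div-node '(div (p "1") (br) (p "2")))

;; node-reduce: select the p children, then take the first of them.
(define first-p
  ((node-reduce (select-kids (ntype?? 'p)) (node-pos 1)) div-node))
;; '((p "1"))

;; node-or: results are grouped by converter, not by document order.
(define p-or-br
  ((node-or (select-kids (ntype?? 'p))
            (select-kids (ntype?? 'br)))
   div-node))
;; '((p "1") (p "2") (br))
```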

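XPath axes and accessors: sxml:attribute, sxml:child, sxml:child-nodes, sxml:child-elements, sxml:descendant, and sxml:descendant-or-self.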

((sxml:parent pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
(node-parent root)  sxml-converter
  root : node
((sxml:ancestor pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:ancestor-or-self pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:following pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:following-sibling pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:preceding pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
((sxml:preceding-sibling pred) root)  sxml-converter
  pred : sxml-converter-as-predicate
  root : node
XPath axes and accessors that depend on the root node.
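
Because SXML nodes carry no parent pointers, these axes search for the node starting from root. A minimal sketch with sxml:parent follows; the document doc and the names p-node and parents are invented here, and the example assumes that the node passed in is the very object found inside root (identity, not just equal structure), which is why it is obtained with sxpath rather than written out again.

```racket
#lang racket/base
(require sxml)

;; A small document, invented for this illustration.
(define doc '(*TOP* (div (p "x"))))

;; Obtain the p node from the document itself so that it is the same object.
(define p-node (car ((sxpath '(div p)) doc)))

;; Curried twice: first the predicate, then the root, then the node.
(define parents (((sxml:parent (ntype?? 'div)) doc) p-node))
;; '((div (p "x")))
```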