2 Pattern Matching of XML

2.1 xml-match


xml-match provides pattern matching of XML nodes. The pattern notation is based on that of Scheme's syntax-rules/syntax-case macro systems.

The grammar for the xml-match syntax is given below:

match-form ::= (xml-match input-expression

clause ::= [node-pattern action-expression+]
         | [node-pattern (guard expression*) action-expression+]

node-pattern ::= literal-pattern
               | pat-var-or-cata
               | element-pattern
               | list-pattern

literal-pattern ::= string
                  | character
                  | number
                  | #t
                  | #f

attribute-pattern ::= attribute-keyword attr-val-pattern

attr-val-pattern ::= literal-pattern
                   | pat-var-or-cata
                   | (pat-var-or-cata default-value-expr)

element-pattern ::= (tag-symbol attribute-pattern*)
                  | (tag-symbol attribute-pattern* nodeset-pattern)
                  | (tag-symbol attribute-pattern*
                                nodeset-pattern? . pat-var-or-cata)

list-pattern ::= (list nodeset-pattern)
               | (list nodeset-pattern? . pat-var-or-cata)
               | (list)

nodeset-pattern ::= node-pattern
                  | node-pattern ...
                  | node-pattern nodeset-pattern
                  | node-pattern ... nodeset-pattern

pat-var-or-cata ::= (unquote var-symbol)
                  | (unquote [var-symbol*])
                  | (unquote [cata-expression -> var-symbol*])

Within a list or element body pattern, ellipses may appear only once, but may be followed by zero or more node patterns.

Guard expressions cannot refer to the return values of catamorphisms.

Ellipses in the output expressions must appear only in an expression context; ellipses are not allowed in a syntactic form.

The sections below illustrate specific aspects of the xml-match pattern matcher.

Matching XML Elements

The example below illustrates the pattern matching of an XML element:

(xml-match (e i: 1 3 4 5)
  [(e i: ,d ,a ,b ,c) (list d a b c)]
  [,otherwise #f])

The element and attribute tags used in patterns are the element and attribute constructors created by define-element and define-attribute. Each clause in xml-match contains two parts: a pattern and one or more expressions which are evaluated if the pattern is successfully match.

Pattern variables are must be "unquoted" in the pattern. The above expression binds d to 1, a to 3, b to 4, and c to 5.

Matching Nodesets

A nodeset pattern is designated by a list in the pattern, beginning the identifier list. The example below illustrates matching a nodeset.

(xml-match '("i" "j" "k" "l" "m")
  [(list ,a ,b ,c ,d ,e)
   (list (h4:p a) (h4:p b) (h4:p c) (h4:p d) (h4:p e))])

This example wraps each nodeset item in an HTML paragraph element.

Matching the 'Rest' of a Nodeset

Matching the 'rest' of a nodeset is achieved by using a ". rest)" pattern at the end of an element or nodeset pattern.

This is illustrated in the example below:

(xml-match (e 3 (f 4 5 6) 7)
  [(e ,a (f . ,y) ,d)
   (list a y d)])

The above expression returns (3 (4 5 6) 7).

Ellipses in Patterns

As in syntax-rules, ellipses may be used to specify a repeated pattern. Note that the pattern "item ..." specifies zero-or-more matches of the pattern "item".

The use of ellipses in a pattern is illustrated in the code fragment below, where nested ellipses are used to match the children of repeated instances of an 'a' element, within an element 'd'.

(define x (d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))

(xml-match x
  [(d (a ,b ...) ...)
   (list (list b ...) ...)])

The above expression returns a value of ((1 2 3) (4 5) (6 7 8) (9 10)).

Default Values in Attribute Patterns

It is possible to specify a default value for an attribute which is used if the attribute is not present in the element being matched. This is illustrated in the following example:

(xml-match (e 3 4 5)
  [(e z: (,d 1) ,a ,b ,c) (list d a b c)])

The value 1 is used when the attribute 'z' is absent from the element 'e'.

Guards in Patterns

Guards may be added to a pattern clause via the guard keyword. A guard expression may include zero or more expressions which are evaluated only if the pattern is matched. The body of the clause is only evaluated if the guard expressions evaluate to #t.

The use of guard expressions is illustrated below:

(xml-match '(a 2 3)
  ((a ,n) (guard (number? n)) n)
  ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))


The example below illustrates the use of explicit recursion within an xml-match form. This example implements a simple calculator for the basic arithmetic operations, which are represented by the XML elements plus, minus, times, and div.

(define simple-eval
  (lambda (x)
    (xml-match x
      [,i (guard (integer? i)) i]
      [(plus ,x ,y) (+ (simple-eval x) (simple-eval y))]
      [(times ,x ,y) (* (simple-eval x) (simple-eval y))]
      [(minus ,x ,y) (- (simple-eval x) (simple-eval y))]
      [(div ,x ,y) (/ (simple-eval x) (simple-eval y))]
      [,otherwise (error "simple-eval: invalid expression" x)])))

Using the catamorphism feature of xml-match , a more concise version of simple-eval can be written. The pattern ,[x] recusively invokes the pattern matcher on the value bound in this position.

(define simple-eval
  (lambda (x)
    (xml-match x
      [,i (guard (integer? i)) i]
      [(plus ,[x] ,[y]) (+ x y)]
      [(times ,[x] ,[y]) (* x y)]
      [(minus ,[x] ,[y]) (- x y)]
      [(div ,[x] ,[y]) (/ x y)]
      [,otherwise (error "simple-eval: invalid expression" x)])))


It is also possible to explicitly name the operator in the 'cata' position. Where ,[id*] recurs to the top of the current xml-match, ,[cata -> id*] recurs to cata. cata must evaluate to a procedure which takes one argument, and returns as many values as there are identifiers following ->.

Named catamorphism patterns allow processing to be split into multiple, mutually recursive procedures. This is illustrated in the example below: a transformation that formats a "TV Guide" into HTML.

(define (tv-guide->html g)
  (define (cast-list cl)
    (xml-match cl
      [(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
       (h4:div (h4:ul (h4:li ch ": " a) ...))]))
  (define (prog p)
    (xml-match p
      [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
                (Description . ,desc))
       (h4:div (h4:p start-time
                     (h4:br) series-title
                     (h4:br) desc))]
      [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
                (Description . ,desc)
                ,[cast-list -> cl])
       (h4:div (h4:p start-time
                     (h4:br) series-title
                     (h4:br) desc)
  (xml-match g
    [(TVGuide start: ,start-date
              end: ,end-date
              (Channel (Name ,nm) ,[prog -> p] ...) ...)
     (h4:html (h4:head (h4:title "TV Guide"))
              (h4:body (h4:h1 "TV Guide")
                       (h4:div (h4:h2 nm) p ...) ...))]))

Ellipses in Quasiquote'd Output

Within the body of an xml-match form, a slightly extended version of quasiquote is provided, which allows the use of ellipses. This is illustrated in the example below.

(xml-match '(e 3 4 5 6 7)
  [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
  [,otherwise #f])

The general pattern is that `(something ,i ...) is rewritten as `(something ,@i).

2.2 xml-match-let and xml-match-let*


The xml-match-let and xml-match-let* forms generalize the let and let* forms of Scheme to allow an XML pattern in the binding position, rather than a simple variable.

For example, the expression below:

(xml-match-let ([(a ,i ,j) '(a 1 2)])
  (+ i j))

binds the variables i and j to 1 and 2 in the XML value given.

The syntax for these forms is given below:

(xml-match-let ([pat expr] ...) expression0 expression ...)
(xml-match-let* ([pat expr] ...) expression0 expression ...)



Last modified: Sunday, April 24th, 2005 2:52:27pm
HTML generated using WebIt!.