Version: 2:1

SXML/xexp Representation of XML and HTML in Racket

Neil Van Dyke

 (require (planet neil/xexp:2:1))

1 Introduction

Note: This package is under active development, and some interface changes, possibly backward-incompatible, are expected. The documentation is also in progress.
SXML is a representation for XML in Scheme, defined by Oleg Kiselyov. “SXML/xexp” is the temporary name for a Racket format that is based on SXML and mostly compatible with it. SXML/xexp is used for both HTML and XML. The current plan is for the “/xexp” part of the name to go away eventually, and for SXML and SXML/xexp to merge. For now, Racket identifiers based on SXML/xexp contain “xexp” instead of “sxml”, because we do not want to call something “SXML” if it is not strictly SXML. (Historically, “xexp” differed much more from SXML, while we experimented with unifying SXML, SHTML, and PLT xexpr, but we have decided to move back to being as compatible with SXML as practical.)

1.1 Differences with SXML

SXML/xexp can be defined by its differences from SXML:
  • xexp syntax must be ordered as in SXML first normal form (1NF). For example, any attributes list must precede child elements. SXML/xexp tools may be permissive about accepting other orderings, but generally should not emit any ordering but 1NF ordering.

  • The SXML keyword symbols, such as *TOP*, may be in lowercase (e.g., *top*).

  • xexp adds a special & syntax for character entity references. The syntax is (& val), where val is the symbolic name of the character as a symbol, or an integer with the numeric value of the character.
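
The three differences above can be illustrated with plain quoted data. This is only a sketch: the element names, attribute, and text are made up for illustration, and nothing here requires the package itself.

```racket
#lang racket/base

;; A small document in 1NF ordering: the attribute list (@ ...) comes
;; before any child content. The root keyword is written in lowercase,
;; and two character references use the (& val) syntax.
(define doc
  '(*top*
    (html
     (body
      (p (@ (class "note"))
         "Caf" (& eacute) " costs " (& 163) "3")))))

;; Walk down to the p element: (*top* (html (body (p ...)))).
(define p-elem (cadr (cadr (cadr doc))))

;; In 1NF the attribute list, when present, is the first item after the tag.
(define attrs (cdr (cadr p-elem)))   ; the list ((class "note"))
(define children (cddr p-elem))      ; strings and (& val) forms
```

Here (& eacute) names the character symbolically, while (& 163) gives its numeric value. Racket symbols are case-sensitive, so *top* and *TOP* are distinct symbols even though both spellings are accepted as the keyword.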

2 SXML and SXML/xexp Tools

Libraries using SXML/xexp include:

  • html-writing: Writing HTML from SXML/xexp.

  • html-template: Writing HTML from SXML/xexp templates.

  • html-parsing: Permissively parsing HTML to SXML/xexp.

  • WebScraperHelper: Example-based SXPath query generation for SXML/xexp.

There are also some older libraries for SXML, which often can be used for SXML/xexp:

  • SXPath: XPath query language implementation for SXML, by Oleg Kiselyov.

  • sxml-match: Pattern-matching of SXML, by Jim Bender.

  • SSAX: Parsing of XML to SXML, by Oleg Kiselyov, and maintained by Kirill Lisovsky.

3 Definitions

This section covers definitions used by many SXML/xexp packages.

3.1 Exceptions

(struct exn:fail:invalid-xexp exn:fail (expected
    context-xexp
    invalid-xexp)
  #:extra-constructor-name make-exn:fail:invalid-xexp
  #:transparent)
  expected : string?
  context-xexp : any/c
  invalid-xexp : any/c
(make-invalid-xexp-exn
  sym
  #:continuation-marks continuation-marks
  #:expected expected
  #:invalid-xexp invalid-xexp
  [#:context-xexp context-xexp])
  → exn:fail:invalid-xexp?
  sym : symbol?
  continuation-marks : continuation-marks?
  expected : string?
  invalid-xexp : any/c
  context-xexp : any/c = (void)
Constructs an exn:fail:invalid-xexp exception object.
(raise-invalid-xexp-exn error-name-sym
                        #:expected expected
                        #:invalid-xexp invalid-xexp
                        maybe-context-xexp)

  error-name-sym     = symbol?
  expected           = string?
  invalid-xexp       = any/c
  maybe-context-xexp =
                     | #:context-xexp any/c
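
As a rough illustration of how code might catch this exception and inspect its fields, the sketch below declares a local stand-in struct with the same shape as the documented exn:fail:invalid-xexp, so that it runs without the package installed. The check-string-xexp procedure and its message format are hypothetical and are not part of the package; with the package loaded, its own bindings would be used instead of the local struct.

```racket
#lang racket/base

;; Local stand-in mirroring the documented struct (an assumption made
;; so this sketch is self-contained).
(struct exn:fail:invalid-xexp exn:fail (expected context-xexp invalid-xexp)
  #:extra-constructor-name make-exn:fail:invalid-xexp
  #:transparent)

;; Hypothetical checker: accepts only strings and raises otherwise.
(define (check-string-xexp who x)
  (unless (string? x)
    (raise (make-exn:fail:invalid-xexp
            (format "~a: invalid xexp: ~e" who x) ; message (from exn)
            (current-continuation-marks)          ; marks (from exn)
            "string"                              ; expected
            (void)                                ; context-xexp
            x)))                                  ; invalid-xexp
  x)

;; The handler pulls out what was expected and what was actually found.
(define result
  (with-handlers ([exn:fail:invalid-xexp?
                   (lambda (e)
                     (list (exn:fail:invalid-xexp-expected e)
                           (exn:fail:invalid-xexp-invalid-xexp e)))])
    (check-string-xexp 'demo 42)))
```

Because the struct extends exn:fail, a more general exn:fail? handler would also catch it.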

3.2 Misc.

The following definitions are used by some xexp-related libraries.
(make-xexp-char-ref val) → xexp-char-ref?
  val : (or/c symbol? integer?)
Yields an SXML/xexp character entity reference for val. For example:
> (make-xexp-char-ref 'rArr)
  (& rArr)
> (make-xexp-char-ref 151)
  (& 151)
(xexp-char-ref-value char-ref) → (or/c symbol? integer?)
  char-ref : xexp-char-ref?
Yields the symbol or integer value of the SXML/xexp character reference char-ref. Raises an exn:fail:invalid-xexp exception on error. For example:
> (xexp-char-ref-value '(& nbsp))
  nbsp
> (xexp-char-ref-value '(& 2000))
  2000
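
To show how the symbolic and numeric forms might be consumed, here is a minimal sketch that renders an (& val) form as entity text. The char-ref->string helper is hypothetical and is not provided by this package; it works on plain quoted data and assumes the usual HTML/XML spellings &name; and &#n;.

```racket
#lang racket/base

;; Hypothetical helper, not part of the package: render an (& val)
;; form as HTML/XML entity text.
(define (char-ref->string ref)
  (unless (and (list? ref) (= (length ref) 2) (eq? (car ref) '&))
    (raise-argument-error 'char-ref->string "(& val) form" ref))
  (define val (cadr ref))
  (cond
    ;; Symbolic name, e.g. nbsp, becomes a named entity.
    [(symbol? val) (format "&~a;" val)]
    ;; Integer value becomes a decimal numeric character reference.
    [(exact-nonnegative-integer? val) (format "&#~a;" val)]
    [else (raise-argument-error 'char-ref->string
                                "symbol or nonnegative integer value"
                                val)]))

(define nbsp-text (char-ref->string '(& nbsp)))
(define num-text (char-ref->string '(& 2000)))
```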
always-empty-html-elements : (listof symbol?)
Deprecated. This is a legacy definition from HtmlPrag that will eventually disappear.
List of symbols naming the HTML elements that can never have content, such as br.

4 History

5 Legal

Copyright 2011 – 2012 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.