doc.txt

Datum grammar acceptors

_Datum grammar acceptors_
_grammar.ss_

David Van Horn
<dvanhorn@cs.brandeis.edu>

The _grammar_ syntax contains a context-free grammar and returns an
acceptor for the grammar. The grammar is expressed in "parenthesized,
prefix, extended BNF". The returned acceptor is a predicate that takes
a datum and returns a boolean indicating if the datum can be generated
by the given grammar. Expressing any Scheme-like syntax is
straightforward, and a hook for introducing non-terminals defined
using arbitrary Scheme code allows any grammar that defines a subset
of Scheme's datum syntax to be handled.

The grammar form is originally by Chris Haynes, written for Chez
Scheme.  It has been ported to MzScheme with minimal modifications.

In addition to the grammar syntax, there is a procedural interface
(with contracts) provided by _grammar-procedures.ss_ for constructing
context free grammar acceptors.


Several examples are provided in _grammar-examples.ss_.

BUGS: There are known issues.  The r4rs-scheme? example fails simple
tests; see the test suite for details.


The original source code is available at:

   http://www.cs.indiana.edu/proglang/scheme/src/grammar.ss

The orginal documentation follows:

   http://www.cs.indiana.edu/proglang/scheme/grammar.html

===

The grammar form, obtained by loading the file grammar.ss, contains a
context-free grammar and returns an acceptor for the grammar. The
grammar is expressed in "parenthesized, prefix, extended BNF". The
returned acceptor is a predicate that takes a datum and returns a
boolean indicating if the datum can be generated by the given
grammar. Expressing any Scheme-like syntax is straightforward, and a
hook for introducing non-terminals defined using arbitrary Scheme code
allows any grammar that defines a subset of Scheme's datum syntax to
be handled.

An extended-BNF grammar for the grammar form follows:

<grammar expression> --> ( grammar <start> <production>* ) 

<start> --> <variable> 

<production> --> ( <variable> <element>* ) 

<variable> --> <symbol> 

<element> --> <terminal> | <non-terminal> 

<terminal> --> '<datum> 

<non-terminal> --> <variable> 
  | ( alt <element>* ) 
  | ( seq <element>* ) 
  | ( lst <element>* ) 
  | ( star <element> ) 
  | ( plus <element> ) 
  | ( opt <element> ) 
  | ( dot <element> <element> ) 
  | ( predicate <expression> ) 
  | ( cfa <expression> ) 
  | ( report-if-bad <datum> <non-terminal> ) 

<expression> --> any Scheme expression

The <start> variable must be one of the production variables, and
indicates the starting non-terminal of the grammar.

alt indicates alternation,

seq sequencing,

lst descent into and sequencing through a sub-list,

star zero or more (Kleene's star),

plus one or more,

opt optional (zero or one), and

dot a dotted pair, where the first element of dot recognizes a list of
the car elements of an improper list, while the second element
recognizes the trailing cdr element.

In predicate, the expression shall eveluate to a predicate that is
passed the car of the list being accepted and returns a boolean
indicating whether it should be accepted.

In cfa, the expression shall evaluate to a procedure that takes the
list being accepted and returns #f if it is to be rejected, or a tail
of the list if the corresponding prefix of the list is accepted.

If the non-terminal of

(report-if-bad datum non-terminal)

fails on input, a message of the form "Bad datum: input" is printed
with a maximum print depth of two. Also, from the time when the first
such message is generated until the grammar expression returns, alts
fail immediately. This keeps other alt elements from being tried when
it is known that the acceptor fails. Control does return through
dynamically enclosing report-if-bad elements, allowing the location of
the original error to be traced.

An acceptor for the grammar form may be obtained with:

(grammar grammar-expression
  (datum (predicate (lambda (x) #t)))
  (expression datum)
  (grammar-expression
    (report-if-bad 'grammar-expression
      (lst 'grammar start (plus production))))
  (start variable)
  (variable (predicate symbol?))
  (production (report-if-bad 'production (lst variable (star element))))
  (element (alt terminal non-terminal))
  (terminal (lst 'quote datum))
  (non-terminal
    (report-if-bad 'non-terminal
      (alt variable
        (lst 'alt (star element))
        (lst 'seq (star element))
        (lst 'lst (star element))
        (lst 'star element)
        (lst 'plus element)
        (lst 'opt element)
        (lst 'dot element element)
        (lst 'predicate expression)
        (lst 'cfa expression)
        (lst 'report-if-bad datum non-terminal)))))

For a more substantial example, study the acceptor for the
context-free grammar of Scheme transliterated from the R4RS formal
syntax. Of particular interest are the way procedure calls are defined
to obtain suitable Bad expression messages, and the way dot is used in
defining <formals>.

Chris Haynes / chaynes@indiana.edu