#lang scribble/manual
@(require (planet cce/scheme:6:0/scribble)
          scribble/eval
          (for-label scheme (this-package-in simple-parser)))

@(define make-my-eval
   (make-eval-factory '(scheme (planet orseau/lazy-doc:1:4/simple-parser))))

@(require (for-label (this-package-in common)))

@title[#:tag "simple-parser"]{Simple Text Parser}

@(defmodule/this-package simple-parser)

This module provides a simple text parser that can read strings and turn them into data without first building lexemes (although it can be used to either lex or parse).
More complex or faster parsers may require the use of the parser-tools integrated in Scheme.

A parser is given a list of matcher procedures and associated action procedures.
A matcher is generally a regexp; the associated action turns the matched text into something else.

On the input string, the parser recursively looks for the matcher that matches the earliest character and applies its action.
@scheme[no-match-proc] is applied to the portion of the string (before the first matched character) that has not been matched.

The parser has an internal state, the "phase", and it is possible to define local parsers that only work when the parser is in a given phase.
Actions can make the parser switch to a given phase.
Automata transitions can then easily be defined.

Instead of switching to another phase, it is also possible to set the parser into a "sub-parser" mode, and to provide the sub-parser with a callback that will be applied only once the sub-parser has returned.

The fastest and easiest way to understand how it works is probably to look at the examples in the @filepath{examples} directory.
Some simple examples are also given at the end of this page.
See also the @filepath{defs-parser.ss} source file for a more complex usage.

@section{Priorities}

When parsing a string, among all matchers of the current phase, the matcher whose action is triggered is the one that matches the earliest character in the string.
If several matchers apply, then only the @emph{last} added matcher is chosen.
In @scheme[add-items], priority goes to the matcher that is defined lowest in the source file.

@defproc[(parser? [p any/c]) boolean?]{
Returns @scheme[#t] if @scheme[p] is a parser, @scheme[#f] otherwise.
}

@section{Main Functions}

@defproc[(new-parser [ no-match-proc procedure? identity] [#:phase phase any/c 'start] [#:appender appender procedure? string-append]) parser?]{
Creates a new parser with default behavior @scheme[no-match-proc], starting in phase @scheme[phase].
All the outputs generated by the parser are then appended with @scheme[appender].
}

@;NOT-PROVIDED[push-sub-parser]
@;NOT-PROVIDED[pop-sub-parser]
@;NOT-PROVIDED[add-item-general]
@;NOT-PROVIDED[eq-phase?]
@;NOT-PROVIDED[add-item-phase]
@;NOT-PROVIDED[current-parser]
@;NOT-PROVIDED[to-out]
@;NOT-PROVIDED[to-phase?]
@;NOT-PROVIDED[add-no-match-cond]

@defproc[(add-item [ parser parser? ] [ phase? any/c ] [ in (or/c #t procedure? list? symbol? string?) ] [ out (or/c procedure? symbol? string?) ]) void?]{
Adds the matcher @scheme[in] and its associated action @scheme[out] to @scheme[parser].
The matcher will match only when the parser is in a phase for which @scheme[phase?] returns @scheme[#t].

If @scheme[phase?] is a procedure, it will be used as is to match the parser's phase.
If @scheme[phase?] equals @scheme[#t], it will be changed to @scheme[(λ args #t)] such that it matches any phase.
Any other value of @scheme[phase?] will be turned into a procedure that matches this value with @scheme[equal?].

If @scheme[in] is a string, it will be turned into a procedure that matches the corresponding pregexp.
If @scheme[in] is a symbol, it will be turned into a procedure that matches the corresponding pregexp with word boundaries on both sides (useful for matching names or programming-language keywords).
If @scheme[in] is a list, then @scheme[add-item] is called recursively on each member of @scheme[in] with the same @scheme[parser], @scheme[phase?] and @scheme[out].
If @scheme[in] equals @scheme[#t], it will modify the @scheme[no-match-proc] procedure to add the corresponding action when @scheme[phase?] applies to the parser.
In the end, @scheme[in] must return the same kind of values as @scheme[regexp-match-positions].

@scheme[out] must be a procedure that accepts the same number of arguments as the number of values returned by the matcher @scheme[in].
For example, if @scheme[in] is @scheme["aa(b+)c(d+)e"], then @scheme[out] must take 3 arguments (one for the whole string, and two for the b's and the d's).
If @scheme[out] is not a procedure, it will be turned into a procedure that accepts any number of arguments and returns @scheme[out].
}

@defform[(add-items parser [phase? [search-proc output-proc] ...] ...)]{
The general form for adding several items at once.
See the examples at the end of this page.
}

@defproc[(parse-text [ parser parser? ] [#:phase phase any/c ((parser-phase parser))] [ text string? ] ...) (listof any/c)]{
Parses @scheme[text] with @scheme[parser], starting in phase @scheme[phase], which is the current phase by default.

It is thus possible to call the parser inside the parsing phase, i.e., once a portion of the text has been parsed, it can be given to the parser itself in some phase to make further transformations.
This is not the same as sub-parsing because there is no callback.
}

@section{Matchers}

This section describes matching functions that can be used in the @scheme[in] argument of @scheme[add-item] and @scheme[add-items].

@defproc[(re [ s string? ]) procedure?]{
Turns @scheme[s] into a pregexp and returns a procedure that takes an input string and applies @scheme[regexp-match-positions] on that string with the pregexp @scheme[s].
}

@defproc[(txt [ s string?
]) procedure?]{
Same as @scheme[re] but regexp-quotes @scheme[s] beforehand, so that the string @scheme[s] is matched exactly.
}

@defproc[(kw [ s string? ]) procedure?]{
Same as @scheme[txt] but adds word boundaries around @scheme[s].
}

@section{Actions}

This section describes action functions that can be used in the @scheme[out] argument of @scheme[add-item] and @scheme[add-items].

@defproc[(switch-phase [ phase any/c ]) string?]{
Puts the parser in the phase @scheme[phase] and returns @scheme[""].
}

@defproc[(sub-parse [ new-phase any/c ] [ callback procedure? identity] [#:appender appender procedure? (parser-appender (current-parser))]) string?]{
Sets the current parser in sub-parse mode and switches to @scheme[new-phase].
The result of the sub-parse is appended with @scheme[appender], which by default is the same as the parser's.

When the sub-parser has finished parsing (it has returned with @scheme[sub-parse-return]), @scheme[callback] is called with the result of the sub-parse and the result of @scheme[callback] is added to the current parser result.

Sub-parsers can be called recursively, either while already in sub-parsing mode or from within @scheme[callback].

Returns @scheme[""].
}

@defproc[(cons-out [ out any/c ]) void?]{
By default, the parser accumulates the return values of the action procedures.
The function @scheme[cons-out] can be used to add a value to the parser result without it being the return value of an action.
This should rarely be needed.
}

@defproc[(sub-parse-return [ out any/c #f]) any]{
Adds @scheme[out] to the current parser result and returns from the current sub-parsing mode.
}

@;NOT-PROVIDED[start-sub-parser]
@;NOT-PROVIDED[terminate-sub-parser]
@;NOT-PROVIDED[find-first-matcher]
@;NOT-PROVIDED[positions->strings]
@;NOT-PROVIDED[cons-output]
@;NOT-PROVIDED[parse-line]
@;NOT-PROVIDED[output-append]

@section{Examples}

@(examples #:eval (make-my-eval)
(let ([p (new-parser)])
  (add-items p
    ('start
     ["pl(.[^p]?)p" (λ (s x) (string-append " -gl" x "tch- "))]
     ["ou" "aï"]
     [#t string-upcase]))
  (parse-text p "youcoudouplipcoudouploup" "toupouchou"))

(let ([tree-parser (new-parser #:appender (λ vals (remove* '(||) vals)))])
  (add-items tree-parser
    ('start
     [#t string->symbol]
     ["\\s+" '||]
     ["\\(" (λ (s) (sub-parse 'start) '||)]
     ["\\)" (λ (s) (sub-parse-return))]))
  (parse-text tree-parser
              "tree:(root (node1 (leaf1 leaf2) leaf3) (node2 leaf4 (node3 leaf5) leaf6) leaf7)"))
)

Note that the result of the last example is Scheme data, not a string.
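
Phase switching is not exercised by the examples above.
As a minimal, untested sketch based only on the descriptions of @scheme[switch-phase] and @scheme[add-items] on this page, the following parser is meant to upcase the text found between double quotes and to leave the rest of the input unchanged.
The phase name @scheme['in-string] and the input string are made up for the illustration, and no result is shown because the code has not been run against the library:

@schemeblock[
(require (planet orseau/lazy-doc:1:4/simple-parser))

(let ([p (new-parser)])
  (add-items p
    ;; In the initial phase, a double quote opens a quoted region.
    ;; 'in-string is a hypothetical phase name chosen for this sketch.
    ('start
     ["\"" (λ (s) (switch-phase 'in-string))])
    ;; Inside the quoted region, the next double quote switches back,
    ;; and any unmatched text goes through string-upcase.
    ('in-string
     ["\"" (λ (s) (switch-phase 'start))]
     [#t string-upcase]))
  (parse-text p "say \"hello\" twice"))
]

The same pregexp is registered in both phases with different actions, which is how an automaton transition would be expressed here.
Since @scheme[switch-phase] is documented to return @scheme[""], the quote characters themselves should presumably not appear in the output.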