#lang scribble/doc
@(require scribble/manual
          scribble/extract)
@(require (for-label "parser.rkt"))
@(require (for-label racket/base))

@defmodule[squicky/parser]

@title{Squicky: a scheme-based quick wiki parser}
@author+email["Norman Gray"]{http://nxg.me.uk}

This is a Racket-based parser for a wiki syntax based closely on
@hyperlink["http://www.wikicreole.org/"]{WikiCreole}, as described below.

@section{Usage}

The dialect parsed here is the consensus WikiCreole syntax of
@url{http://www.wikicreole.org/}.  It handles all of the WikiCreole
@hyperlink["http://www.wikicreole.org/wiki/Creole1.0TestCases"]{test cases},
except for one test of wiki-internal links (which is in any case
somewhat underspecified).

In particular, the supported syntax is
@itemlist[
@item{@tt{//italics//}}
@item{@tt{**bold**} :
A line which begins with @tt{**}, with possible whitespace either
side, is a (second-level) bulleted list if the line before it is a
bulleted list, but is a paragraph starting with bold text otherwise.}
@item{@tt{##monospaced text##} :
A line which begins with @tt{##}, with possible whitespace either
side, is a (second-level) enumerated list if the line before it is an
enumerated list, but is a paragraph starting with monospace text
otherwise.  [This is not specified in the WikiCreole definition, but
is clearly compatible with it].}
@item{@tt{ * bulleted list} :
(including sublists; the asterisk may or may not be indented)}
@item{@tt{ # numbered list} :
(including sublists)}
@item{@tt{>quoted paragraph} :
including multiple levels (this appears to be an extension of WikiCreole).}
@item{@tt{[[link to wikipage]]}}
@item{@tt{[[URL|description]]}}
@item{@tt{{{image.png}}} or @tt{{{image.png|alt text}}} or @tt{{{image.png|att=value;att2=value; or more}}}.
In the last case, the @tt{att} indicates any attribute on the HTML
@tt{<img>} element, such as @tt{class}; the @tt{att} must immediately
follow the semicolon (so the last case parses as
@tt{att2='value; or more'}); and if the @tt{att} is omitted, it
defaults to @tt{alt}.}
@item{@tt{== heading}}
@item{@tt{=== subheading}}
@item{@tt{==== subsubheading}}
@item{@tt{line\\break}}
@item{@tt{----} :
(four dashes in a row, on a line by themselves) horizontal rule}
@item{@tt{~e}scaped character, and @tt{~http://url} which isn't linked}
@item{@verbatim|{{{{in-line literal text}}}}|}]

Blocks of verbatim text (which will typically be rendered to @tt{<pre>} blocks),
can be specified with:
@verbatim{
    {{{
    preformatted text
    }}}
}
The opening @tt|{{{{}|, and its closing partner, must be on
lines by themselves.  The newline after the opening marker, and the
newline before the closing one, are ignored.

Tables look like this:
@verbatim{
  |=Heading Col 1 |=Heading Col 2         |
  |Cell 1.1       |Two lines\\in Cell 1.2 |
  |Cell 2.1       |Cell 2.2               |
}

To this I add syntax:
@itemlist[
@item{@tt{::foo bar baz} :
adds, or replaces, the keyword 'foo' with the string 'bar baz'.}
@item{@tt{"quoted"} :
corresponds to @tt{<q>quoted</q>} (note that's a double-quote
character, not two single quotes).}
@item{@tt{<<element content>>} :
adds @tt{<element>content</element>} to the output.}
@item{The @tt{att=value} syntax for @tt{{{}}} is an extension.}]

For example, the following parses some input text and writes it
out as XML.

@racketblock[
    (require xml squicky/parser)
    (define (write-xml-to-port wiki-text output-port)
      (write-xml/content
       (xexpr->xml
        `(top (,@(map (lambda (k)
                        (list k (lookup wiki-text k)))
                      (lookup-keys wiki-text)))
              . ,(body wiki-text)))
       output-port)
      (newline output-port))
    (write-xml-to-port (parse (current-input-port))
                       (current-output-port))
]

Suitable input text would be:
@verbatim{
    ::date 2010 December 12
    == Here is a heading
    Here is some text, with a list comprising:
      * one
      * two.
    
    That's quite //astonishing!//.
}
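
The same kind of input can also be parsed from a string, which is
convenient for experimenting.  The following is only a sketch: it
assumes that @racket[parse] accepts any input port (as its use with
@racket[(current-input-port)] above suggests), and that keyword names
are symbols (as their use as attribute names above suggests).

@racketblock[
    (require squicky/parser)
    (code:comment "Assumption: parse accepts any input port, not just stdin")
    (define wiki-text
      (parse (open-input-string
              (string-append "::date 2010 December 12\n"
                             "== Here is a heading\n"
                             "That's quite //astonishing!//.\n"))))
    (code:comment "Assumption: keywords are looked up by symbol")
    (lookup wiki-text 'date)
    (body wiki-text)
]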
  
@section{Reference}

Parse an input source with the @racket[parse] function.
@(include-previously-extracted "squicky-extracts.rkt" #rx"^parse")
@(include-previously-extracted "squicky-extracts.rkt" #rx"^wikitext?")

You can retrieve the body of the parsed text as an xexpr.  The various
creole markup commands are transformed into an HTML-like xexpr, which
can then be processed as desired.
@(include-previously-extracted "squicky-extracts.rkt" #rx"^body")
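
As one possible way of processing the body, the sketch below
serialises it to an HTML string using @racket[xexpr->string] from the
@racketmodname[xml] library.  It assumes that @racket[body] returns a
list of element xexprs (as the Usage example, which splices the body
into a @tt{top} element, suggests); the name
@racket[wiki->html-string] and the wrapping @tt{div} element are
purely illustrative.

@racketblock[
    (require xml squicky/parser)
    (code:comment "Hypothetical helper: wrap the body in a div and serialise it.")
    (code:comment "Assumption: (body wiki-text) is a list of xexprs.")
    (define (wiki->html-string wiki-text)
      (xexpr->string (cons 'div (body wiki-text))))
]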

If there are any keywords in the input text (indicated by 
@tt{::keyword value}),
then these can be retrieved by one of a family of lookup functions:
@(include-previously-extracted "squicky-extracts.rkt" #rx"^lookup.*")
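
For instance, all of the keywords in a parsed text can be collected
into an association list.  This is a minimal sketch using only
@racket[lookup-keys] and @racket[lookup] as they appear in the Usage
example; the name @racket[metadata-alist] is hypothetical.

@racketblock[
    (require squicky/parser)
    (code:comment "Hypothetical helper: pair each keyword with its value.")
    (define (metadata-alist wiki-text)
      (for/list ([k (in-list (lookup-keys wiki-text))])
        (cons k (lookup wiki-text k))))
]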

@(include-previously-extracted "squicky-extracts.rkt" #rx"^set-metadata!")