#lang scribble/doc
@(require scribble/manual
scribble/extract)
@(require (for-label "parser.rkt"))
@(require (for-label racket/base))
@defmodule[squicky/parser]
@title{Squicky: a scheme-based quick wiki parser}
@author+email["Norman Gray"]{http://nxg.me.uk}
This is a Racket-based parser for a wiki syntax based closely on
@hyperlink["http://www.wikicreole.org/"]{WikiCreole},
as described below.
@section{Usage}
The dialect parsed here is the consensus
WikiCreole syntax of @url{http://www.wikicreole.org/}.
It handles all of the WikiCreole
@hyperlink["http://www.wikicreole.org/wiki/Creole1.0TestCases"]{test cases},
except for one test of wiki-internal links (which is in any case somewhat underspecified).
In particular, the supported syntax is
@itemlist[
@item{@tt{//italics//}}
@item{@tt{**bold**} : A line which begins with @tt{**}, with possible whitespace on either
side, is a (second-level) bulleted list if the line before it is a bulleted list,
but is a paragraph starting with bold text otherwise.}
@item{@tt{##monospaced text##} : A line which begins with @tt{##}, with possible whitespace on either
side, is a (second-level) enumerated list if the line before it is an enumerated list,
but is a paragraph starting with monospace text otherwise. (This is
not specified in the WikiCreole definition, but is clearly compatible
with it.)}
@item{@tt{ * bulleted list} : (including sublists, the asterisk may or
may not be indented)}
@item{@tt{ # numbered list} : (including sublists)}
@item{@tt{>quoted paragraph} : including multiple levels (this appears
to be an extension of WikiCreole).}
@item{@tt{[[link to wikipage]]}}
@item{@tt{[[URL|description]]}}
@item{@tt{{{image.png}}} or @tt{{{image.png|alt text}}} or
@tt{{{image.png|att=value;att2=value; or more}}}. In the last case, the
@tt{att} indicates any attribute on the HTML @tt{<img>} element, such as
@tt{class}; the @tt{att} must immediately follow the semicolon (so the last
case parses as @tt{att2='value; or more'}); and if the @tt{att} is omitted,
it defaults to @tt{alt}.}
@item{@tt{== heading}}
@item{@tt{=== subheading}}
@item{@tt{==== subsubheading}}
@item{@tt{line\\break}}
@item{@tt{----} : (four dashes in a row, on a line by themselves) horizontal rule}
@item{@tt{~e}scaped character, and @tt{~http://url} which isn't linked}
@item{@verbatim|{{{{in-line literal text}}}}|}]
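
The @tt{**} rule in the list above can be explored with a small helper.
This is a sketch only: @racket[parse-string] is a hypothetical name, not part
of the library, and it assumes that @racket[parse] accepts any input port
(the usage example in this section passes it @racket[(current-input-port)]),
so that a string port should also work.

@racketblock[
(require squicky/parser)

(code:comment @#,t{hypothetical helper: parse a string and return the body xexpr})
(define (parse-string s)
  (body (parse (open-input-string s))))

(code:comment @#,t{the second line follows a bulleted line, so it should be a sublist})
(parse-string "* item\n** subitem\n")

(code:comment @#,t{no preceding list, so this should be a paragraph starting with bold text})
(parse-string "** bold** then plain text\n")
]

Comparing the two results is a quick way to check how a particular input
will be interpreted before relying on it.
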

Blocks of verbatim text
(which will typically be rendered to @tt{<pre>} blocks),
can be specified with:

@verbatim{
{{{
preformatted text
}}}
}

The opening @tt|{{{{}|, and its closing partner, must be on lines by
themselves.  The newline after the opening marker, and the newline
before the closing one, are ignored.

Tables look like this:

@verbatim{
|=Heading Col 1 |=Heading Col 2         |
|Cell 1.1       |Two lines\\in Cell 1.2 |
|Cell 2.1       |Cell 2.2               |
}

To this I add syntax:

@itemlist[
@item{@tt{::foo bar baz} : adds, or replaces, the keyword 'foo' with the string 'bar baz'.}
@item{@tt{"quoted"} : corresponds to @tt{quoted} (note that's a double-quote character, not two single quotes).}
@item{@tt{<>} : adds @tt{ content } to the output.}
@item{The @tt{att=value} syntax for @tt{{{}}} is an extension.}]

For an example, the following parses some input text, and writes it out as XML.

@racketblock[
(require xml
         squicky/parser)
(define (write-xml-to-port wiki-text output-port)
  (write-xml/content
   (xexpr->xml `(top (,@(map (lambda (k) (list k (lookup wiki-text k)))
                             (lookup-keys wiki-text)))
                     . ,(body wiki-text)))
   output-port)
  (newline output-port))
(write-xml-to-port (parse (current-input-port))
                   (current-output-port))
]

Suitable input text would be:

@verbatim{
::date 2010 December 12

== Here is a heading

Here is some text, with a list comprising:
  * one
  * two.

That's quite //astonishing!//.
}

@section{Reference}

Parse an input source with the parse function.

@(include-previously-extracted "squicky-extracts.rkt" #rx"^parse")

@(include-previously-extracted "squicky-extracts.rkt" #rx"^wikitext?")

You can retrieve the body of the parsed text as an xexpr.  The various
creole markup commands are transformed into an HTML-like xexpr, which
can then be processed as desired.

@(include-previously-extracted "squicky-extracts.rkt" #rx"^body")

If there are any keywords in the input text (indicated by
@tt{::keyword value}), then these can be retrieved by one of a family of
lookup functions:

@(include-previously-extracted "squicky-extracts.rkt" #rx"^lookup.*")

@(include-previously-extracted "squicky-extracts.rkt" #rx"^set-metadata!")
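
As a sketch of how these functions fit together, the following parses a
short piece of wiki text from a string and queries the result.  It rests on
two assumptions that the reference entries above should be checked against:
that @racket[parse] accepts a string port, and that keywords are looked up by
symbol (which the XML example in the Usage section suggests, since it uses
each key as an attribute name).  The name @racket[wiki-text] is illustrative
only.

@racketblock[
(require squicky/parser)

(code:comment @#,t{assumption: parse accepts any input port, including a string port})
(define wiki-text
  (parse (open-input-string
          "::date 2010 December 12\n\n== Here is a heading\n\nSome //text//.\n")))

(code:comment @#,t{the keywords found in the input})
(lookup-keys wiki-text)

(code:comment @#,t{assumption: keys are symbols, so this asks for the ::date value})
(lookup wiki-text 'date)

(code:comment @#,t{the HTML-like xexpr for everything that is not a keyword line})
(body wiki-text)
]

Keeping the keywords separate from the body in this way lets a caller use
them as page metadata without having to search the xexpr for them.
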