#lang scribble/doc
@(require scribble/manual
scribble/extract)
@(require (for-label "parser.rkt"))
@(require (for-label racket/base))
@defmodule[squicky/parser]
@title{Squicky: a scheme-based quick wiki parser}
@author+email["Norman Gray"]{http://nxg.me.uk}
This is a Racket-based parser for a wiki syntax based closely on
@hyperlink["http://www.wikicreole.org/"]{WikiCreole},
as described below.
@section{Usage}
The dialect parsed here is the consensus
WikiCreole syntax of @url{http://www.wikicreole.org/}.
It handles all of the WikiCreole
@hyperlink["http://www.wikicreole.org/wiki/Creole1.0TestCases"]{test cases},
except for one test of wiki-internal links (which is in any case somewhat underspecified).
In particular, the following syntax is supported:
@itemlist[
@item{@tt{//italics//}}
@item{@tt{**bold**} : A line which begins with @tt{**}, with possible whitespace either
side, is a (second-level) bulleted list if the line before it is a bulleted list,
but is a paragraph starting with bold text otherwise.}
@item{@tt{##monospaced text##} : A line which begins with @tt{##}, with possible whitespace either
side, is a (second-level) enumerated list if the line before it is an enumerated list,
but is a paragraph starting with monospace text otherwise. [This is
not specified in the WikiCreole definition, but is clearly compatible
with it].}
@item{@tt{ * bulleted list} : (including sublists; the asterisk may or
may not be indented)}
@item{@tt{ # numbered list} : (including sublists)}
@item{@tt{>quoted paragraph} : including multiple levels (this appears
to be an extension of WikiCreole).}
@item{@tt{[[link to wikipage]]}}
@item{@tt{[[URL|description]]}}
@item{@tt{{{image.png}}}, @tt{{{image.png|alt text}}}, or @tt{{{image.png|att=value;att2=value; or more}}}. In the last case, each @tt{att} names an attribute of the HTML @tt{<img>} element, such as @tt{class}; an @tt{att} must immediately follow the semicolon (so the last case parses as @tt{att2='value; or more'}); and if the @tt{att} is omitted, it defaults to @tt{alt}.}
@item{@tt{== heading}}
@item{@tt{=== subheading}}
@item{@tt{==== subsubheading}}
@item{@tt{line\\break}}
@item{@tt{----} : (four dashes in a row, on a line by themselves) horizontal rule}
@item{@tt{~e}scaped character, and @tt{~http://url}, which is not turned into a link}
@item{@verbatim|{{{{in-line literal text}}}}|}]
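Whether a line starting with @tt{**} or @tt{##} continues a list therefore depends on the line before it. The following is a minimal sketch of that rule, not Squicky's implementation: the functions @tt{classify-line} and @tt{classify-lines}, and the result format (@tt{(bullet 2)}, @tt{(number 1)} or @tt{paragraph}), are hypothetical, and the sketch only decides between list items and ordinary paragraphs.

@codeblock|{
#lang racket/base
;; Hypothetical sketch (not Squicky's implementation) of the rule that a
;; line starting with ** or ## continues a list only when the preceding
;; line was an item in the same kind of list.

;; PREV is the classification of the preceding line.
;; Returns 'paragraph, or a list such as '(bullet 2) or '(number 1).
(define (classify-line line prev)
  (let ([prev-kind (and (pair? prev) (car prev))]     ; 'bullet, 'number or #f
        [m (regexp-match #px"^\\s*([*]+|#+)" line)])  ; whitespace may precede the marker
    (if (not m)
        'paragraph
        (let* ([marker (cadr m)]
               [kind (if (char=? (string-ref marker 0) #\*) 'bullet 'number)]
               [depth (string-length marker)])
          (if (and (>= depth 2) (not (eq? kind prev-kind)))
              'paragraph                 ; **bold or ##monospace starts a paragraph
              (list kind depth))))))     ; list item at this nesting depth

;; Classify a sequence of lines, feeding each result into the next call.
(define (classify-lines lines)
  (let loop ([lines lines] [prev 'paragraph] [acc '()])
    (if (null? lines)
        (reverse acc)
        (let ([c (classify-line (car lines) prev)])
          (loop (cdr lines) c (cons c acc))))))

(classify-lines '("**bold** opens a paragraph" "* one" "** nested"))
;; => '(paragraph (bullet 1) (bullet 2))
}|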
Blocks of verbatim text
(which will typically be rendered to HTML @tt{<pre>} blocks)
can be specified with:
@verbatim{
{{{
preformatted text
}}}
}
The opening @tt|{{{{}|, and its closing partner, must be on
lines by themselves. The newline after the opening marker, and the
newline before the closing one, are ignored.
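
That rule can be illustrated with a short sketch over a list of input lines. It is not Squicky's own parser: the function @tt{extract-pre} and the @tt{(pre ...)} result are hypothetical, the markers are assumed to match only when nothing else (not even whitespace) is on the line, and treating an unterminated block as running to the end of the input is a choice made for this sketch only. Joining the enclosed lines with newlines is what drops the newline after the opening marker and the one before the closing marker.

@codeblock|{
#lang racket/base
(require racket/string)

;; Hypothetical sketch: replace each run of lines from "{{{" to "}}}"
;; (each marker alone on its line) with a single (pre "...") element.
(define (extract-pre lines)
  (let loop ([lines lines] [out '()])
    (cond
      [(null? lines) (reverse out)]
      [(string=? (car lines) "{{{")
       ;; collect the enclosed lines, most recent first, until "}}}"
       (let collect ([remaining (cdr lines)] [body '()])
         (cond
           [(null? remaining)            ; unterminated: runs to end of input
            (reverse (cons (make-pre body) out))]
           [(string=? (car remaining) "}}}")
            (loop (cdr remaining) (cons (make-pre body) out))]
           [else (collect (cdr remaining) (cons (car remaining) body))]))]
      [else (loop (cdr lines) (cons (car lines) out))])))

(define (make-pre body)
  ;; BODY holds the enclosed lines in reverse order
  (list 'pre (string-join (reverse body) "\n")))

(extract-pre '("Some text" "{{{" "  //not italic//" "}}}" "More text"))
;; => '("Some text" (pre "  //not italic//") "More text")
}|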
Tables look like this:
@verbatim{
|=Heading Col 1 |=Heading Col 2 |
|Cell 1.1 |Two lines\\in Cell 1.2 |
|Cell 2.1 |Cell 2.2 |
}
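In each row, cells are separated by @tt{|}, a cell starting with @tt{|=} is a heading cell, and @tt{\\} forces a line break. A minimal sketch of converting one row follows; the function @tt{row->xexpr} and the HTML-style element names @tt{tr}, @tt{th}, @tt{td} and @tt{br} are assumptions for illustration, not a description of Squicky's actual output, and markup inside cells (such as a link containing @tt{|}) is not handled.

@codeblock|{
#lang racket/base
(require racket/string racket/list)

;; Hypothetical sketch: convert one Creole table row into an HTML-like xexpr.
;; Cells beginning with "=" become (th ...), other cells (td ...), and a
;; forced line break "\\" inside a cell becomes a (br) element.
(define (row->xexpr line)
  ;; string-split first trims one leading and one trailing "|"
  (let ([cells (string-split (string-trim line) "|")])
    (cons 'tr
          (for/list ([cell (in-list cells)])
            (let* ([heading? (string-prefix? cell "=")]
                   [text (string-trim (if heading? (substring cell 1) cell))]
                   [parts (string-split text "\\\\" #:trim? #f)])
              (cons (if heading? 'th 'td) (add-between parts '(br))))))))

(row->xexpr "|Cell 1.1 |Two lines\\\\in Cell 1.2 |")
;; => '(tr (td "Cell 1.1") (td "Two lines" (br) "in Cell 1.2"))
}|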
To this I add the following syntax:
@itemlist[
@item{@tt{::foo bar baz} :
sets the keyword @tt{foo} to the string @tt{bar baz}, replacing any earlier value for @tt{foo}.}
@item{@tt{"quoted"} :
corresponds to @tt{quoted
} (note that's a double-quote
character, not two single quotes).}
@item{@tt{<>} :
adds @tt{content } to the output.}
@item{The @tt{att=value} syntax for @tt{{{}}} is an extension.}]
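The image attribute rule (an attribute name must immediately follow the semicolon, and text without a name is the @tt{alt} text) can be made concrete with a small sketch. The function @tt{parse-image-spec} and its result format are hypothetical, as is the assumption that attribute names start with a letter and contain only letters, digits and hyphens; this illustrates the documented rule and is not Squicky's code.

@codeblock|{
#lang racket/base

;; Hypothetical sketch of the image attribute rule.  SPEC is the text
;; between the double braces; the result is (cons source attribute-alist).
;; Assumes attribute names match [A-Za-z][-A-Za-z0-9]*.
(define (parse-image-spec spec)
  (let* ([parts (regexp-match #px"^([^|]*)(?:[|](.*))?$" spec)]
         [source (cadr parts)]
         [attr-text (caddr parts)])          ; #f when there is no "|"
    (cons source
          (if attr-text
              ;; split only at a ";" immediately followed by "name="
              (for/list ([piece (in-list (regexp-split #px";(?=[A-Za-z][-A-Za-z0-9]*=)" attr-text))])
                (let ([m (regexp-match #px"^([A-Za-z][-A-Za-z0-9]*)=(.*)$" piece)])
                  (if m
                      (cons (string->symbol (cadr m)) (caddr m))
                      (cons 'alt piece))))   ; no attribute name: alt text
              '()))))

(parse-image-spec "image.png|class=wide;title=value; or more")
;; => '("image.png" (class . "wide") (title . "value; or more"))
}|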
For example, the following parses some input text and writes it
out as XML.
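@racketblock[
(require xml squicky/parser)

(code:comment "Write the keyword/value pairs as attributes of a top element,")
(code:comment "followed by the parsed body, as XML on output-port.")
(define (write-xml-to-port wiki-text output-port)
  (write-xml/content
   (xexpr->xml
    `(top (,@(map (lambda (k)
                    (list k (lookup wiki-text k)))
                  (lookup-keys wiki-text)))
          . ,(body wiki-text)))
   output-port)
  (newline output-port))

(code:comment "Read wiki text from standard input; write XML to standard output.")
(write-xml-to-port (parse (current-input-port))
                   (current-output-port))
]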
Suitable input text would be:
@verbatim{
::date 2010 December 12
== Here is a heading
Here is some text, with a list comprising:
* one
* two.
That's quite //astonishing!//.
}
@section{Reference}
Parse an input source with the @racket[parse] function.
@(include-previously-extracted "squicky-extracts.rkt" #rx"^parse")
@(include-previously-extracted "squicky-extracts.rkt" #rx"^wikitext[?]")
You can retrieve the body of the parsed text as an xexpr. The various
Creole markup commands are transformed into an HTML-like xexpr, which
can then be processed as desired.
@(include-previously-extracted "squicky-extracts.rkt" #rx"^body")
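Since the value returned by @racket[body] is an ordinary xexpr, it can be post-processed with plain list operations. The sketch below collects the text of all headings. It assumes, for illustration only, that headings appear as HTML-style @tt{h1} to @tt{h6} elements; the helper functions and the sample xexpr are hypothetical and not part of @tt{squicky/parser}.

@codeblock|{
#lang racket/base

;; Hypothetical helpers for post-processing an HTML-like xexpr.
;; Assumes headings are h1 ... h6 elements.
(define heading-tags '(h1 h2 h3 h4 h5 h6))

;; An attribute list looks like ((name "value") ...), possibly empty.
(define (attributes? v)
  (and (list? v) (andmap pair? v)))

;; Children of an element (tag [attributes] child ...)
(define (element-children x)
  (let ([tail (cdr x)])
    (if (and (pair? tail) (attributes? (car tail)))
        (cdr tail)
        tail)))

;; Concatenate all string content beneath an element
(define (xexpr-text x)
  (cond
    [(string? x) x]
    [(pair? x) (apply string-append (map xexpr-text (element-children x)))]
    [else ""]))   ; entities and character codes are ignored in this sketch

;; Text of every heading, in document order.  Accepts a single element
;; or a list of elements.
(define (headings x)
  (cond
    [(not (pair? x)) '()]
    [(not (symbol? (car x))) (apply append (map headings x))]
    [(memq (car x) heading-tags) (list (xexpr-text x))]
    [else (apply append (map headings (element-children x)))]))

(headings '(div (h2 "Here is a heading")
                (p "That's quite " (em "astonishing!") ".")))
;; => '("Here is a heading")
}|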
If there are any keywords in the input text (indicated by
@tt{::keyword value}),
then these can be retrieved by one of a family of lookup functions:
@(include-previously-extracted "squicky-extracts.rkt" #rx"^lookup.*")
@(include-previously-extracted "squicky-extracts.rkt" #rx"^set-metadata!")
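Recognising the keyword lines themselves is straightforward. As a sketch only, the following collects @tt{::keyword value} lines into a mutable hash table, with a later value for a keyword replacing an earlier one. The name @tt{record-keyword!}, the use of symbols as keys, and the assumption that a keyword runs up to the first whitespace are illustrative choices; this does not describe how Squicky stores keywords internally.

@codeblock|{
#lang racket/base

;; Hypothetical sketch: recognise a "::keyword value" line and store the
;; value in TABLE, replacing any earlier value for the same keyword.
;; Returns #t if LINE was a keyword line, #f otherwise.
(define (record-keyword! table line)
  (let ([m (regexp-match #px"^::(\\S+)\\s*(.*)$" line)])
    (when m
      (hash-set! table (string->symbol (cadr m)) (caddr m)))
    (and m #t)))

(define keywords (make-hasheq))
(record-keyword! keywords "::date 2010 December 12")
(record-keyword! keywords "::date 2010 December 13")   ; replaces the first value
(record-keyword! keywords "== Here is a heading")      ; not a keyword line
(hash-ref keywords 'date)
;; => "2010 December 13"
}|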