#lang scribble/doc

@(require scribble/manual
          scribble/extract)
@(require (for-label "parser.rkt"))
@(require (for-label racket/base))

@defmodule[squicky/parser]

@title{Squicky: a scheme-based quick wiki parser}
@author+email["Norman Gray"]{http://nxg.me.uk}

This is a Racket-based parser for a wiki syntax based closely on
@hyperlink["http://www.wikicreole.org/"]{WikiCreole},
as described below. 

@section{Usage}

The dialect parsed here is the consensus
WikiCreole syntax of @url{http://www.wikicreole.org/}.
The parser handles all of the WikiCreole
@hyperlink["http://www.wikicreole.org/wiki/Creole1.0TestCases"]{test cases},
except for one test of wiki-internal links (which is in any case somewhat underspecified).

In particular, the supported syntax is:
@itemlist[
@item{@tt{//italics//}}
@item{@tt{**bold**} : A line which begins with @tt{**}, with possible whitespace on either
side, is a (second-level) bulleted list if the line before it is a bulleted list,
but is a paragraph starting with bold text otherwise.}
@item{@tt{##monospaced text##} : A line which begins with @tt{##}, with possible whitespace on either
side, is a (second-level) enumerated list if the line before it is an enumerated list,
but is a paragraph starting with monospace text otherwise. [This is
not specified in the WikiCreole definition, but is clearly compatible
with it].}
@item{@tt{ * bulleted list} : (including sublists; the asterisk may or
may not be indented)}
@item{@tt{ # numbered list} : (including sublists)}
@item{@tt{>quoted paragraph} : including multiple levels (this appears
to be an extension of WikiCreole).}
@item{@tt{[[link to wikipage]]}}
@item{@tt{[[URL|description]]}}
@item{@tt{{{image.png}}} or @tt{{{image.png|alt text}}} or @tt{{{image.png|att=value;att2=value; or more}}}.  In the last case, the @tt{att} indicates any attribute on the HTML @tt{<img>} element, such as @tt{class}; the @tt{att} must immediately follow the semicolon (so the last case parses as @tt{att2='value; or more'}); and if the @tt{att} is omitted, it defaults to @tt{alt}.}
@item{@tt{== heading}}
@item{@tt{=== subheading}}
@item{@tt{==== subsubheading}}
@item{@tt{line\\break}}
@item{@tt{----} : (four dashes in a row, on a line by themselves) horizontal rule}
@item{@tt{~e}scaped character, and @tt{~http://url} which isn't linked}

@item{@verbatim|{{{{in-line literal text}}}}|}]
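
The attribute rule for images, described above, can be illustrated
with a minimal sketch.  This is not the parser's own implementation:
the helper @tt{split-image-attributes} is hypothetical, and the
assumption that an attribute name is a letter followed by letters,
digits or hyphens is made only for the purposes of illustration.

@codeblock|{
#lang racket/base
(require racket/match)

;; Hypothetical helper, not part of squicky/parser: split the text that
;; follows the vertical bar in an image markup into (name . value) pairs.
(define (split-image-attributes s)
  ;; Split only at a semicolon which is immediately followed by name=
  (for/list ([piece (in-list
                     (regexp-split #px";(?=[A-Za-z][A-Za-z0-9-]*=)" s))])
    (match (regexp-match #px"^([A-Za-z][A-Za-z0-9-]*)=(.*)$" piece)
      [(list _ name value) (cons (string->symbol name) value)]
      ;; No attribute name present: treat the whole piece as the alt text.
      [_ (cons 'alt piece)])))

(split-image-attributes "att=value;att2=value; or more")
;; => '((att . "value") (att2 . "value; or more"))
(split-image-attributes "alt text")
;; => '((alt . "alt text"))
}|

The second semicolon in the first call is followed by a space rather
than an attribute name, so it stays inside the value of @tt{att2}.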

Blocks of verbatim text
(which will typically be rendered to @tt{<pre>} blocks)
can be specified with:
@verbatim{
    {{{
    preformatted text
    }}}
}
The opening @tt|{{{{}|, and its closing partner, must be on
lines by themselves.  The newline after the opening marker, and the
newline before the closing one, are ignored.

Tables look like this:
@verbatim{
  |=Heading Col 1 |=Heading Col 2         |
  |Cell 1.1       |Two lines\\in Cell 1.2 |
  |Cell 2.1       |Cell 2.2               |
}

To this I add the following syntax:
@itemlist[
@item{@tt{::foo bar baz} :
sets the keyword 'foo' to the string 'bar baz', replacing any earlier value.}
@item{@tt{"quoted"} :
corresponds to @tt{<q>quoted</q>} (note that's a double-quote
character, not two single quotes).}
@item{@tt{<<element-name   content>>} :
adds @tt{<element-name>content</element-name>} to the output.}
@item{The @tt{att=value} syntax for @tt{{{}}} is an extension.}]

For example, the following parses some input text and writes it
out as XML.

@racketblock[
    (require xml squicky/parser)
    (define (write-xml-to-port wiki-text output-port)
      (write-xml/content
       (xexpr->xml
        `(top (,@(map (lambda (k)
                        (list k (lookup wiki-text k)))
                      (lookup-keys wiki-text)))
              . ,(body wiki-text)))
       output-port)
      (newline output-port))
    (write-xml-to-port (parse (current-input-port))
                       (current-output-port))
]

Suitable input text would be:
@verbatim{
    ::date 2010 December 12
    == Here is a heading
    Here is some text, with a list comprising:
      * one
      * two.
    
    That's quite //astonishing!//.
}
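
The same input can be parsed from a string, which is convenient for
experimenting.  This is a sketch under two assumptions:
that @tt{parse} accepts any input port (the example above passes it
@tt{(current-input-port)}), and that keywords are looked up by symbol.

@codeblock|{
#lang racket/base
(require squicky/parser)

;; The sample input, as a string.
(define sample
  (string-append
   "::date 2010 December 12\n"
   "== Here is a heading\n"
   "Here is some text, with a list comprising:\n"
   "  * one\n"
   "  * two.\n"
   "\n"
   "That's quite //astonishing!//.\n"))

;; Assumption: parse accepts a string port as readily as stdin.
(define wiki-text (parse (open-input-string sample)))

(lookup-keys wiki-text)   ; the keywords found in the input
(lookup wiki-text 'date)  ; assumption: keywords are symbols
(body wiki-text)          ; the parsed content, as an xexpr
}|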
  
@section{Reference}

Parse an input source with the @racket[parse] function.
@(include-previously-extracted "squicky-extracts.rkt" #rx"^parse")
@(include-previously-extracted "squicky-extracts.rkt" #rx"^wikitext?")

You can retrieve the body of the parsed text as an xexpr.  The various
Creole markup commands are transformed into an HTML-like xexpr, which
can then be processed as desired.
@(include-previously-extracted "squicky-extracts.rkt" #rx"^body")
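
As one way of processing the body, the following sketch serialises it
to a string using @tt{xexpr->string} from the @tt{xml} library.
The helper @tt{wiki->html-string} and the wrapping @tt{div} element
are hypothetical, and the sketch assumes that the body is a list of
xexprs which can be spliced into an enclosing element, as the usage
example suggests.

@codeblock|{
#lang racket/base
(require xml squicky/parser)

;; Hypothetical helper: wrap the parsed body in a <div> and serialise it.
;; Assumption: (body wiki-text) is a list of xexprs, so it can be spliced.
(define (wiki->html-string wiki-text)
  (xexpr->string `(div ((class "wiki")) ,@(body wiki-text))))

;; Read wiki text from standard input, and print the resulting markup.
(display (wiki->html-string (parse (current-input-port))))
}|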

If there are any keywords in the input text (indicated by 
@tt{::keyword value}),
then these can be retrieved by one of a family of lookup functions:
@(include-previously-extracted "squicky-extracts.rkt" #rx"^lookup.*")
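
For instance, all of the keywords can be collected into a hash table.
This is a minimal sketch: the helper @tt{keywords->hash} is
hypothetical, and it relies only on @tt{lookup-keys} returning a
list and @tt{lookup} taking the parsed text and a key, as in the
usage example.

@codeblock|{
#lang racket/base
(require squicky/parser)

;; Hypothetical helper: build an immutable hash from keyword to value.
(define (keywords->hash wiki-text)
  (for/hash ([k (in-list (lookup-keys wiki-text))])
    (values k (lookup wiki-text k))))

;; Read wiki text from standard input, and show its keywords.
(keywords->hash (parse (current-input-port)))
}|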

@(include-previously-extracted "squicky-extracts.rkt" #rx"^set-metadata!")