BLah italic bold ened
still < bold
But not done yet...")
==>
(*TOP* (html (head (title) (title "whatever"))
(body "\n"
(a (\@ (href "url")) "link")
(p (\@ (align "center"))
(ul (\@ (compact) (style "aa")) "\n"))
(p "BLah"
(*COMMENT* " comment ")
" "
(i " italic " (b " bold " (tt " ened")))
"\n"
"still < bold "))
(p " But not done yet...")))
]
Note that in the emitted SHTML the text token @tt{"still < bold"} is @emph{not} inside the @tt{b} element, which represents an unfortunate failure to emulate all the quirks-handling behavior of some popular Web browsers.
The procedures @tt{html->sxml-@schemevarfont{n}nf}, for @schemevarfont{n} from 0 through 2, correspond to the 0th through 2nd normal forms of SXML as defined in the SXML specification, and indicate the minimal requirements of the emitted SXML.
@tt{html->sxml} and @tt{html->shtml} are currently aliases for @tt{html->sxml-0nf}, and can be used in scripts and interactively, when terseness is important and any normal form of SXML would suffice.
}
@section{Emitting HTML}
Two procedures encode the SHTML representation as conventional HTML: @tt{write-shtml-as-html} and @tt{shtml->html}. These are perhaps most useful for emitting the result of parsed and transformed input HTML. They can also be used for emitting HTML from generated or handwritten SHTML.
@defproc[ (write-shtml-as-html (shtml any/c) (out any/c) (foreign-filter any/c)) any/c]{
Writes a conventional HTML transliteration of the SHTML @schemevarfont{shtml} to output port @schemevarfont{out}. If @schemevarfont{out} is not specified, the default is the current output port. HTML elements of types that are always empty are written using HTML4-compatible XHTML tag syntax.
If @schemevarfont{foreign-filter} is specified, it is a procedure of two arguments that is applied to any non-SHTML (``foreign'') object encountered in @schemevarfont{shtml}, and should yield SHTML. The first argument is the object, and the second argument is a boolean indicating whether the object is part of an attribute value.
No inter-tag whitespace or line breaks are emitted unless they are explicit in @schemevarfont{shtml}. The @schemevarfont{shtml} should normally include a newline at the end of the document. For example:
@SCHEMEBLOCK[
(write-shtml-as-html
'((html (head (title "My Title"))
(body (\@ (bgcolor "white"))
(h1 "My Heading")
(p "This is a paragraph.")
(p "This is another paragraph.")))))
]
outputs:
@verbatim["<html><head><title>My Title</title></head><body bgcolor=\"white\"><h1>My Heading</h1><p>This is a paragraph.</p><p>This is another paragraph.</p></body></html>"]
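As a rough sketch of how @schemevarfont{foreign-filter} might be used, the following passes a hypothetical filter that turns numbers into text and drops any other foreign object. The filter, its parameter names, and the sample SHTML are illustrative assumptions rather than part of HtmlPrag, and the example assumes HtmlPrag is already loaded:
@SCHEMEBLOCK[
(code:comment "Hypothetical filter: numbers become text, other foreign objects are dropped.")
(write-shtml-as-html
 (list 'p "Count: " 42)
 (current-output-port)
 (lambda (obj in-attribute-value?)
   (if (number? obj)
       (number->string obj)
       "")))
]
Here the number @tt{42} is not SHTML, so it is handed to the filter, which is expected to return SHTML (in this sketch, a string) to be written in its place.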
}
@defproc[ (shtml->html (shtml any/c)) any/c]{
Yields an HTML encoding of SHTML @schemevarfont{shtml} as a string. For example:
@SCHEMEBLOCK[
(shtml->html
(html->shtml
"This is
bold italic b > text.
"))
==> "This is
bold italic text.
"
]
Note that, since this procedure constructs a string, it should normally only be used when the HTML is relatively small. When encoding HTML documents of conventional size and larger, @tt{write-shtml-as-html} is much more efficient.
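For larger documents, one possible approach is to write directly to a file port rather than building a string. The sketch below assumes HtmlPrag is already loaded, uses the standard @tt{call-with-output-file}, and writes to a hypothetical file name @tt{"out.html"}:
@SCHEMEBLOCK[
(code:comment "Hypothetical file name; the SHTML is written to the port as it is traversed.")
(call-with-output-file "out.html"
  (lambda (out)
    (write-shtml-as-html
     '((html (head (title "My Title"))
             (body (p "This is a paragraph."))
             "\n"))
     out)))
]
The trailing @tt{"\n"} is included explicitly, since no line breaks are emitted unless they appear in the SHTML.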
}
@section{History}
@itemize[
@item{Version 0.20 --- 2011-08-22 --- PLaneT @tt{(1 7)}
Document that HtmlPrag has been obsoleted.
}
@item{Version 0.19 --- 2009-11-08 --- PLaneT @tt{(1 6)}
Whitespace after a @tt{<} in a context that would otherwise start a tag is no longer considered the start of a tag. This behavior is consistent with, e.g., Firefox 3, and we have found a major site relying on it. Three regression tests were changed to match the new desired behavior.
}
@item{Version 0.18 --- 2009-11-07 --- PLaneT @tt{(1 5)}
The @tt{p} element can be a child of the @tt{li} element.
}
@item{Version 0.17 --- 2009-08-16 --- PLaneT @tt{(1 4)}
License is now LGPL3. Converted to author's new Scheme management system. Revamped high-level parser to not use mutable pairs, for PLT Scheme 4.x compatibility. Until the new portability mechanism is in place, the previous portable version of HtmlPrag is available at: @link["http://www.neilvandyke.org/htmlprag/htmlprag-0-16.scm"]{http://www.neilvandyke.org/htmlprag/htmlprag-0-16.scm}
}
@item{Version 0.16 --- 2005-12-18
Documentation fix.
}
@item{Version 0.15 --- 2005-12-18
In the HTML parent element constraints that are used for structure recovery, @tt{div} is now always permitted as a parent, as a stopgap measure until substantial time can be spent reworking the algorithm to better support @tt{div} (bug reported by Corey Sweeney and Jepri). Also no longer convert to Scheme character any HTML numeric character reference with value above 126, to avoid Unicode problem with PLT 299/300 (bug reported by Corey Sweeney).
}
@item{Version 0.14 --- 2005-06-16
XML CDATA sections are now tokenized. Thanks to Alejandro Forero Cuervo for suggesting this feature. The deprecated procedures @tt{sxml->html} and @tt{write-sxml-html} have been removed. Minor documentation changes.
}
@item{Version 0.13 --- 2005-02-23
HtmlPrag now requires @tt{syntax-rules}, and a reader that can read the at-sign character as a symbol. SHTML now has a special @tt{&} element for character entities, and it is emitted by the parser rather than the old @tt{*ENTITY*} kludge. @tt{shtml-entity-value} supports both the new and the old character entity representations. @tt{shtml-entity-value} now yields @tt{#f} on invalid SHTML entity, rather than raising an error. @tt{write-shtml-as-html} now has a third argument, @tt{foreign-filter}. @tt{write-shtml-as-html} now emits SHTML @tt{&} entity references. Changed @tt{shtml-named-char-id} and @tt{shtml-numeric-char-id}, as previously warned. Testeez is now used for the test suite. Test procedure is now the internal @tt{%htmlprag:test}. Documentation changes. Notably, much documentation about using HtmlPrag under various particular Scheme implementations has been removed.
}
@item{Version 0.12 --- 2004-07-12
Forward-slash in an unquoted attribute value is now considered a value constituent rather than an unconsumed terminator of the value (thanks to Maurice Davis for reporting and a suggested fix). @tt{xml:} is now preserved as a namespace qualifier (thanks to Peter Barabas for reporting). Output port term of @tt{write-shtml-as-html} is now optional. Began documenting loading for particular implementation-specific packagings.
}
@item{Version 0.11 --- 2004-05-13
To reduce likely namespace collisions with SXML tools, and in anticipation of a forthcoming set of new features, introduced the concept of ``SHTML,'' which will be elaborated upon in a future version of HtmlPrag. Renamed @tt{sxml-@schemevarfont{x}-symbol} to @tt{shtml-@schemevarfont{x}-symbol}, @tt{sxml-html-@schemevarfont{x}} to @tt{shtml-@schemevarfont{x}}, and @tt{sxml-token-kind} to @tt{shtml-token-kind}. @tt{html->shtml}, @tt{shtml->html}, and @tt{write-shtml-as-html} have been added as names. Considered deprecated but still defined (see the ``Deprecated'' section of this documentation) are @tt{sxml->html} and @tt{write-sxml-html}. The growing pains should now be all but over. Internally, @tt{htmlprag-internal:error} introduced for Bigloo portability. SISC returned to the test list; thanks to Scott G. Miller for his help. Fixed a new character @tt{eq?} bug, thanks to SISC.
}
@item{Version 0.10 --- 2004-05-11
All public identifiers have been renamed to drop the ``@tt{htmlprag:}'' prefix. The portability identifiers have been renamed to begin with an @tt{htmlprag-internal:} prefix, are now considered strictly internal-use-only, and have otherwise been changed. @tt{parse-html} and @tt{always-empty-html-elements} are no longer public. @tt{test-htmlprag} now tests @tt{html->sxml} rather than @tt{parse-html}. SISC temporarily removed from the test list, until an open source Java that works correctly is found.
}
@item{Version 0.9 --- 2004-05-07
HTML encoding procedures added. Added @tt{htmlprag:sxml-html-entity-value}. Upper-case @tt{X} in hexadecimal character entities is now parsed, in addition to lower-case @tt{x}. Added @tt{htmlprag:always-empty-html-elements}. Added additional portability bindings. Added more test cases.
}
@item{Version 0.8 --- 2004-04-27
Entity references (symbolic, decimal numeric, hexadecimal numeric) are now parsed into @tt{*ENTITY*} SXML. SXML symbols like @tt{*TOP*} are now always upper-case, regardless of the Scheme implementation. Identifiers such as @tt{htmlprag:sxml-top-symbol} are bound to the upper-case symbols. Procedures @tt{htmlprag:html->sxml-0nf}, @tt{htmlprag:html->sxml-1nf}, and @tt{htmlprag:html->sxml-2nf} have been added. @tt{htmlprag:html->sxml} is now an alias for @tt{htmlprag:html->sxml-0nf}. @tt{htmlprag:parse} has been refashioned as @tt{htmlprag:parse-html} and should no longer be used directly. A number of identifiers have been renamed to be more appropriate when the @tt{htmlprag:} prefix is dropped in some implementation-specific packagings of HtmlPrag: @tt{htmlprag:make-tokenizer} to @tt{htmlprag:make-html-tokenizer}, @tt{htmlprag:parse/tokenizer} to @tt{htmlprag:parse-html/tokenizer}, @tt{htmlprag:html->token-list} to @tt{htmlprag:tokenize-html}, @tt{htmlprag:token-kind} to @tt{htmlprag:sxml-token-kind}, and @tt{htmlprag:test} to @tt{htmlprag:test-htmlprag}. Verbatim elements with empty-element tag syntax are handled correctly. New versions of Bigloo and RScheme tested.
}
@item{Version 0.7 --- 2004-03-10
Verbatim pair elements like @tt{script} and @tt{xmp} are now parsed correctly. Two Scheme implementations have temporarily been dropped from regression testing: Kawa, due to a Java bytecode verifier error likely due to a Java installation problem on the test machine; and SXM 1.1, due to hitting a limit on the number of literals late in the test suite code. Tested newer versions of Bigloo, Chicken, Gauche, Guile, MIT Scheme, PLT MzScheme, RScheme, SISC, and STklos. RScheme no longer requires the ``@tt{(define get-output-string close-output-port)}'' workaround.
}
@item{Version 0.6 --- 2003-07-03
Fixed uses of @tt{eq?} in character comparisons, thanks to Scott G. Miller. Added @tt{htmlprag:html->normalized-sxml} and @tt{htmlprag:html->nonnormalized-sxml}. Started to add @tt{close-output-port} to uses of output strings, then reverted due to bug in one of the supported dialects. Tested newer versions of Bigloo, Gauche, PLT MzScheme, RScheme.
}
@item{Version 0.5 --- 2003-02-26
Removed uses of @tt{call-with-values}. Re-ordered top-level definitions, for portability. Now tests under Kawa 1.6.99, RScheme 0.7.3.2, Scheme 48 0.57, SISC 1.7.4, STklos 0.54, and SXM 1.1.
}
@item{Version 0.4 --- 2003-02-19
Apostrophe-quoted element attribute values are now handled. A bug that incorrectly assumed left-to-right term evaluation order has been fixed (thanks to MIT Scheme for confronting us with this). Now also tests OK under Gauche 0.6.6 and MIT Scheme 7.7.1. Portability improvement for implementations (e.g., RScheme 0.7.3.2.b6, Stalin 0.9) that cannot read the at-sign character as a symbol (although those implementations tend to present other portability issues, as yet unresolved).
}
@item{Version 0.3 --- 2003-02-05
A test suite with 66 cases has been added, and necessary changes have been made for the suite to pass on five popular Scheme implementations. XML processing instructions are now parsed. Parent constraints have been added for @tt{colgroup}, @tt{tbody}, and @tt{thead} elements. Erroneous input, including invalid hexadecimal entity reference syntax and extraneous double quotes in element tags, is now parsed better. @tt{htmlprag:token-kind} emits symbols more consistent with SXML.
}
@item{Version 0.2 --- 2003-02-02
Portability improvements.
}
@item{Version 0.1 --- 2003-01-31
Dusted off author's old Guile-specific code from April 2001, converted to emit SXML, mostly ported to R5RS and SRFI-6, added some XHTML support and documentation. A little preliminary testing has been done, and the package is already useful for some applications, but this release should be considered a preview to invite comments.
}
]
@section[#:tag "Legal"]{Legal}
Copyright (c) 2003 -- 2011 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License (LGPL 3), or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.
@italic{@smaller{Standard Documentation Format Note: The API
signatures in this documentation are likely incorrect in some regards, such as
indicating type @tt{any/c} for things that are not, and not indicating when
arguments are optional. This is due to a transition from the Texinfo
documentation format to Scribble, which the author intends to finish
someday.}}