7 Core Scribble Datatypes

The doc binding that a Scribble module exports is a description of a document. Various tools, such as the scribble command-line program, can take this description of a document and render it to a specific format, such as LaTeX or HTML. In particular, Scribble defers detailed typesetting work to LaTeX or to HTML browsers, and Scribble’s plug-in architecture accommodates new rendering back-ends.

Scribble’s documentation abstraction reflects a least common denominator among such document formats. For example, Scribble has a baked-in notion of itemization, since LaTeX, HTML, and other document formats provide specific support to typeset itemizations. For many other layout tasks, such as formatting Scheme code, Scribble documents fall back to a generic “table” abstraction. Similarly, Scribble itself resolves most forms of cross-references and document dependencies, since different formats provide different levels of automatic support; tables of contents and indexes are mostly built within Scribble, rather than by the back-end.

A Scribble document is a program that generates an instance of a part structure type. A part can represent a section or a book, and it can have sub-parts that represent sub-sections or chapters. This paper, for example, is generated by a Scribble document whose resulting part represents the whole paper, and it contains sub-parts for individual sections. The part produced by a Scribble document for a reference manual is rendered as a book, where the immediate sub-parts are chapters.

Figure 2: Scribble’s core document representation

Figure 2 summarizes the structure of a document under part in a UML-like diagram. When a field contains a list, the diagram shows a double arrow, and when a field contains a list of lists, the diagram shows a triple arrow. The dashed arrows call attention to delayed fields, which are explained below.

Each part has a flow that is typeset before its sub-parts (if any), and that represents the main content of a section. A flow is a list of blocks, where each block is one of the following:
  • a paragraph, which contains content that is typeset inline with automatic line breaks;

  • a table, which contains a list of rows, where each row is a list of blocks, one per cell in the table;

  • an itemization, which contains a list of flows, one per item;

  • a nested-flow, which contains a single flow that is typically typeset with more indentation than its surrounding flow;

  • a compound-paragraph, which contains a flow that is typeset as a single paragraph; or

  • a delayed-block, which eventually expands to another block, using information gathered elsewhere in the document. Accordingly, the block field of a delayed-block is not just a block, but a function that computes a block when given that other information. For example, a delayed-block is used to implement a table of contents.

A Scribble document can construct other kinds of blocks that are implemented in terms of the above built-in kinds. For example, a defproc block that describes a procedure is implemented in terms of a table.
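The part, flow, and block structure described above can be sketched as a small data model. This is a simplified illustration in Python, not Scribble's implementation: the class and field names are assumptions chosen to mirror the prose, paragraph content is reduced to a plain string, and the `defproc` helper is a hypothetical stand-in that shows only how a derived block can be built from a table.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass(frozen=True)
class Paragraph:
    content: str  # inline content, simplified here to a plain string


@dataclass(frozen=True)
class Table:
    rows: List[List["Block"]]  # a list of rows; each row holds one block per cell


@dataclass(frozen=True)
class Itemization:
    items: List[List["Block"]]  # one flow (list of blocks) per item


@dataclass(frozen=True)
class NestedFlow:
    flow: List["Block"]


@dataclass(frozen=True)
class CompoundParagraph:
    flow: List["Block"]


@dataclass(frozen=True)
class DelayedBlock:
    # A function that computes a block once document-wide information is known.
    resolve: Callable[[dict], "Block"]


Block = Union[Paragraph, Table, Itemization, NestedFlow,
              CompoundParagraph, DelayedBlock]


@dataclass(frozen=True)
class Part:
    title: str
    flow: List[Block]  # typeset before the sub-parts
    parts: List["Part"] = field(default_factory=list)


def defproc(name, arg_names, description):
    """Hypothetical derived block: a procedure description built from a table."""
    signature = "(" + " ".join([name] + arg_names) + ")"
    return Table(rows=[[Paragraph(signature)], [Paragraph(description)]])


def outline(part, depth=0):
    """List a part's title, then its flow's block kinds, then its sub-parts."""
    lines = ["  " * depth + part.title]
    for block in part.flow:
        lines.append("  " * (depth + 1) + type(block).__name__)
    for sub in part.parts:
        lines.extend(outline(sub, depth + 1))
    return lines


paper = Part(
    "Paper",
    [Paragraph("Abstract text")],
    [Part("Introduction",
          [Paragraph("Opening"),
           Itemization([[Paragraph("a")], [Paragraph("b")]])])],
)
```

Calling `outline(paper)` walks the tree in the order the text describes: each part's own flow comes first, followed by its sub-parts.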

Content within a paragraph can be any of the following:
  • a plain string;

  • an instance of the element structure type, which wraps a list of elements with a typesetting style, such as 'bold, whose detailed interpretation depends on the back-end format;

  • a target-element, which associates a cross-reference tag with a list of elements, and where the typeset elements are the target for cross-references using the tag;

  • a link-element, which associates a cross-reference tag with a list of elements, where the tag designates a cross-reference from the elements to elsewhere in the document (which is rendered in HTML as a hyperlink from the elements);

  • a delayed-element, which eventually expands to a list of elements. Like a delayed-block, it typically generates the elements using information gathered from elsewhere in the document. A delayed-element often generates a link-element after a suitable target for cross-referencing is located;

  • a collect-element, which is the complement of delayed-element: it includes an immediate list of elements, but also a procedure to record information that might be used elsewhere in the document. A collect-element often includes a target-element, in which case its procedure might register the target’s cross-reference tag for discovery by delayed-element instances;

  • a few other element types that support more specialized tasks, such as communicating between phases and specifying tooltips; or

  • a list of content.
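To make the recursive shape of content concrete, the following sketch models strings, styled elements, targets, links, and lists of content, and renders them to a toy HTML string. The class names and the HTML produced are assumptions for illustration only; they are not Scribble's structures or its actual HTML output.

```python
import html
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Element:
    style: Any    # e.g. "bold"; the interpretation is left to the back-end
    content: Any  # nested content: a string, an element, or a list of content


@dataclass(frozen=True)
class TargetElement(Element):
    tag: str      # cross-reference tag that this content is the target of


@dataclass(frozen=True)
class LinkElement(Element):
    tag: str      # cross-reference tag that this content points to


def render_html(content):
    """Render content recursively; a toy HTML back-end for this sketch only."""
    if isinstance(content, str):
        return html.escape(content)
    if isinstance(content, list):
        return "".join(render_html(c) for c in content)
    inner = render_html(content.content)
    # Check the more specific element kinds before the general one.
    if isinstance(content, TargetElement):
        return '<a name="{}">{}</a>'.format(content.tag, inner)
    if isinstance(content, LinkElement):
        return '<a href="#{}">{}</a>'.format(content.tag, inner)
    if content.style == "bold":
        return "<b>{}</b>".format(inner)
    return "<span>{}</span>".format(inner)


sentence = ["See ", LinkElement(None, "the intro", "intro"), "."]
```

A different back-end would supply its own version of `render_html` over the same content values, which is the point of keeping the style's interpretation out of the data.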

A document as represented by a part instance is an immutable value. This value is transformed in several passes to eliminate delayed-block instances, delayed-element instances, and collect-element instances. The result is a simplified part instance and associated cross-reference information. Once the cross-reference information has been computed, it is saved for use when building other documents that have cross-references to this one. Finally, the part instance is consumed by a rendering back-end to produce the final document.

In the current implementation of Scribble, all documents are transformed in only two passes: a collect pass that collects information about the document (e.g., through collect-elements), and a resolve pass that turns delayed blocks and elements into normal blocks and elements. We could easily generalize to more passes, but so far, two passes have been sufficient within a single document. When multiple documents that refer to each other are built separately, these passes are iterated as explained in Building and Installing Documentation.
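The two passes can be illustrated with a minimal model that handles only content, not whole parts. The names `CollectElement`, `DelayedElement`, `collect_pass`, `resolve_pass`, and `transform` are hypothetical, and the collected information is assumed to be a plain dictionary; the sketch shows the idea of collecting first and resolving second, not Scribble's actual code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CollectElement:
    content: Any                               # immediate content
    collect: Callable[[Dict[str, str]], None]  # records information for later use


@dataclass(frozen=True)
class DelayedElement:
    resolve: Callable[[Dict[str, str]], Any]   # computes content from collected info


def collect_pass(content, info):
    """First pass: run every collect procedure, filling in `info`."""
    if isinstance(content, list):
        for c in content:
            collect_pass(c, info)
    elif isinstance(content, CollectElement):
        content.collect(info)
        collect_pass(content.content, info)
    # Strings and delayed elements contribute nothing in this pass.


def resolve_pass(content, info):
    """Second pass: build new content with delayed elements expanded."""
    if isinstance(content, list):
        return [resolve_pass(c, info) for c in content]
    if isinstance(content, DelayedElement):
        return resolve_pass(content.resolve(info), info)
    if isinstance(content, CollectElement):
        return resolve_pass(content.content, info)
    return content


def transform(content):
    """Run both passes; the input value itself is never mutated."""
    info = {}
    collect_pass(content, info)
    return resolve_pass(content, info), info


# The reference appears before its target, so it cannot be resolved
# until the whole document has been scanned once.
doc = [
    DelayedElement(lambda info: "See " + info.get("intro", "???")),
    ". ",
    CollectElement("Introduction",
                   lambda info: info.update({"intro": "Section 1"})),
]
resolved, info = transform(doc)
```

Because the reference precedes its target in `doc`, a single left-to-right pass could not fill it in; separating collection from resolution removes that ordering constraint.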

In some cases, the output of Scribble needs customization that is specific to a back-end. Users of Scribble provide the customization information by supplying a mapping from the contents of the style field in the various structures to the style’s back-end rendering. For HTML output, a CSS fragment can extend or override the default Scribble style sheet. For LaTeX output, a ".tex" file can extend or redefine the default Scribble LaTeX commands.
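As a rough illustration of such a mapping, the sketch below assumes that a style is a plain name which an HTML back-end turns into a CSS class and a LaTeX back-end turns into a command of the same name. The function `render_styled` and the style name `callout` are hypothetical, and LaTeX escaping of the text is omitted for brevity.

```python
import html


def render_styled(text, style, backend):
    """Map a style name to back-end-specific markup (illustrative only)."""
    if backend == "html":
        # A user-supplied CSS fragment could then define `.callout { ... }`.
        return '<span class="{}">{}</span>'.format(style, html.escape(text))
    if backend == "latex":
        # A user-supplied .tex file could then define `\callout` via \newcommand.
        # Note: the text is not escaped for LaTeX in this sketch.
        return "\\{}{{{}}}".format(style, text)
    raise ValueError("unknown back-end: " + backend)
```

Under this assumption the document only names the style; what the style looks like is decided entirely by the CSS or LaTeX definitions supplied for the chosen back-end.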