The design of (yet another!) pretty-printing library

The design of (yet another!) pretty-printing library
Dave Herman
Northeastern University
June 2005


HISTORY

Functional pretty printers have a long and surprisingly illustrious tradition in
the literature. The ancestry of this library goes something like this:

    1995 - John Hughes publishes a paper [2] on creating an algebra of "pretty
           documents" for the implementation of a pretty-printing library.
    1997 - Simon Peyton Jones implements this as a Haskell library [4].
    1998 - Philip Wadler publishes a paper [5] improving on Hughes' algebra and
           design.
    2001 - Daan Leijen implements this as a Haskell library [3].
    2002 - Ralph Becket ports Leijen's library to Mercury, a strict
           functional/logic language [1].

This library is a translation of the Haskell PPrint library, but with
help from Becket's Mercury implementation for maintaining efficiency
in a strict language.


MERCURY PORT

Becket's port makes the following modifications to the Haskell
library:

- He eliminates the `UNION' constructor, since the only place union is
  really required is for the `group' operation. In a strict language,
  this prevents unnecessary construction of duplicate data.

- He delays the calculation of `best' and `flatten' on the two arms of
  the union.

- He adds a `LABEL' constructor, which allows programmers to specify
  arbitrary text for indentation, rather than just spaces.

Becket further modifies the Haskell algorithm by eliminating the
`SimpleDoc' datatype and directly producing output from within the
layout procedure, rather than first generating the intermediate
SimpleDoc. However, this changes the behavior of the original
algorithm. The layout procedure in the Haskell library examines not
just the current sub-document but its entire context (i.e., the rest
of the document) in order to determine whether it fits on the current
line. The Mercury port, however, only uses the current sub-document to
make this decision.

The following example demonstrates the difference in behavior:

    text "pretty" </> text "printer"

With a column width less than 19 (i.e., length "pretty printer"), the
Haskell library would determine that the flattened document does not
fit, and decide to break lines. The Mercury library, however, only
looks at the soft break and chooses not to break because text " " has
length 1 and therefore fits; it subsequently overruns the length of
the line.


SCHEME PORT

I've chosen a design somewhere in between the two. The code mostly
follows the Haskell version, but I've replaced the `UNION' constructor
with a `GROUP' construct as in Becket's implementation. This way there
is no unnecessary duplication of data. Furthermore, the flattened
version is only computed by need when the layout procedure reaches a
`GROUP' node, and of course the recursion on the non-flattened version
is only computed if the flattened version fails to fit.

I've also added Becket's `LABEL' constructor, which seems like a nice
idea.

[September 26th, 2006]

More recently, I've added a new `MARKUP' constructor, which allows you
to specify a function that transforms a pretty-printed doc into some
other data structure. The transformation is assumed not to affect the
doc's width. This allows inserting markup in X-expression form.

The original `pretty-print' function ignores all markup, but the new
`pretty-markup' function invokes all the markup transformers in the
document.

The markup transformers must all have type:

    (union string (listof a) a) -> (union string (listof a) a)

for some user-defineable type a, which must be consistent within an
entire document.

Example:

    (pretty-markup (markup (lambda (x) `(em ,x)) (text "hello, world!")))
    ; => (em "hello, world!")

[September 27th, 2006]

The previous implementation didn't correctly prune the search space. See
Wadler's paper [5] for the back-story on the following example:

    (define (test-performance n)
      (parameterize ([current-page-width 5])
        (pretty-format (let build-expensive-test-case ([n n])
                         (if (= n 1)
                             (group (v-append (text "hello") (text (number->string n))))
                             (group (v-append (build-expensive-test-case (sub1 n))
                                              (text (number->string n)))))))
        (void)))

This example can arbitrarily nest a bunch of GROUP nodes where the very first
one encountered in the layout algorithm should discover that flattening will
fail (i.e., because "hello" is larger than the page width of 5 characters). In
the past, the layout algorithm would completely compute the layout of the
flattened version before calling `fits?' to discover that it would fail.

In a lazy language, this is optimized for free: the complete recursive call to
layout isn't computed until it's needed, and if `fits?' determines it isn't
needed, it gets short-circuited.

In an eager language, you need to perform the short-circuiting explicitly. I've
added an implementation of backtracking with exceptions in the layout
algorithm. You can test the above example and see that it performs quite well
now.


REFERENCES

[1] Ralph Becket, pprint.m (2002)
    http://www.cs.mu.oz.au/research/mercury/information/doc-latest/mercury_library/pprint.html

[2] John Hughes, "The Design of a Pretty-Printing Library" (1995)
    http://www.cs.chalmers.se/~rjmh/Papers/pretty.html

[3] Daan Leijen, "PPrint, a Prettier Printer" (2001)
    http://www.cs.uu.nl/~daan/download/pprint/pprint.html

[4] Simon Peyton Jones, "A Pretty-Printer Library in Haskell" (1997)
    http://research.microsoft.com/~simonpj/downloads/pretty-printer/pretty.html

[5] Philip Wadler, "A Prettier Printer" (1998)
    http://homepages.inf.ed.ac.uk/wadler/topics/language-design.html#prettier