* foof-loop: A Simple, Extensible Scheme Looping Facility -*-outline-*-
* foof-loop: A Simple, Extensible Scheme Looping Facility -*-outline-*-
This is the reference manual for foof-loop version 7. This file was
written by Taylor R. Campbell and is placed in the Public Domain. All
warranties are disclaimed.
The most recently released version of this file is permanently stored
at <http://mumble.net/~campbell/scheme/foof-loop.txt>.
** Installation
To load foof-loop into your favourite Scheme system, simply load the
following files, in this order:
http://mumble.net/~campbell/scheme/let-values.scm
http://mumble.net/~campbell/scheme/syn-param.scm
http://mumble.net/~campbell/scheme/foof-loop.scm
See the section titled `Porting to New Schemes' below for more
details on the implementation.
** Introduction
Looping is one of the most fundamental concepts of computer programs,
and it is not a simple concept that can be described succinctly once
and for all programs; therefore, it is in our interest to have some way
to write loops clearly. Although there are far too many kinds of loops
-- perhaps an infinite number -- for any one framework to give a name
to each conceivable way to express them, we should like to have a
cohesive framework for expressing a large subset of the common kinds of
loops in a flexible but simple way.
*** Basic Looping: Named LET versus DO versus LOOP
Let us begin with a simple example: a loop that counts the number of
elements satisfying a certain predicate in a sequence. What sort of
sequence? We shall try lists first. In standard Scheme, this might be
written with named LET or DO; let us choose DO, simply because it is
slightly more concise for this form of loop:
(define (count-matching-items list predicate)
(do ((list list (cdr list))
(count 0
(if (predicate (car list))
(+ count 1)
count)))
((null-list? list) count)))
This is fairly straightforward, but if we wish to substitute a vector
for a list, we must alter the structure of the loop so that it steps
through the indices of the vector, rather than through the list's
structure. That is, we must write in precise detail the *means* of
looping:
(define (count-matching-items vector predicate)
(do ((index 0 (+ index 1))
(count 0
(if (predicate (vector-ref vector index))
(+ count 1)
count)))
((= index (vector-length vector))
count)))
Even worse, if we wish to count the matching items read from an input
port, we must restructure the whole loop using named LET, because we
need to perform an action when the loop is entered *before* the
termination condition:
(define (count-matching-items reader input-port predicate)
(let continue ((count 0))
(let ((item (reader input-port)))
(if (eof-object? item)
count
(continue (if (predicate item)
(+ count 1)
count))))))
We should like a notation that allows the programmer to specify *what*
iteration, not *how* to iterate. We want to give the programmer the
tools to request iteration over each element in a list, or each element
in a vector, or each item read from an input port. LOOP lets us do so:
(define (count-matching-items list predicate)
(loop ((for item (in-list list))
(with count 0
(if (predicate item)
(+ count 1)
count)))
=> count))
How do we read this mountain of syntactic sugar? The first part is
LOOP, which is the name of the macro that invokes the whole construct.
The second part is a list of iteration clauses, each of which acts in
parallel through the whole iteration; that is, there are no nested
loops here. (FOR ITEM (IN-LIST LIST)) asks for ITEM to be bound to
each element of the list LIST; (WITH COUNT 0 <update>) asks COUNT to be
a variable initialized to 0 and updated at each iteration to the value
of <update>. Finally, the return value of the loop, once any of the
iterators can go no further (in this case, once we have run through all
elements of LIST), is specified with the `=>' token followed by the
expression to return -- here, the number of matching items we counted.
We desired a way to write *what* iteration, and not *how*, and LOOP
lets us do so. If what we want is to iterate over each element of a
vector, we need simply substitute IN-VECTOR for IN-LIST. If what we
want is to iterate for each item read from an input port, we need
simply substitute IN-PORT.
(This code still must specify information about what type of sequence
we are iterating through -- IN-VECTOR vs IN-LIST &c. --, and we might
wish to elide that as well and substitute IN-COLLECTION or IN-SEQUENCE,
or even IN-ASSOCIATION for alists & hash tables &c., but we do not
provide any built-in mechanism for doing so. The author has, however,
designed a generic collection iteration protocol, based on Dylan's, for
which there are such LOOP iterators as IN-COLLECTION &c. They fall
outside the scope of this introduction, however.)
*** Explicit Control of Iteration
An astute reader might wonder why we need a `=>' token to indicate the
final value returned by the loop; a differently astute reader might
wonder how LOOP can possibly be as general as DO or named LET without
any program body inside the loop. We can, in fact, write a program
body, and distinguishing the final expression from the body is
precisely the reason for the `=>' token. Indeed, not only can we write
an arbitrary program body as in DO, but we can even give a name to the
loop and specify where to continue to the next iteration as we please.
For example, if we wished to *find* a matching item, rather than to
count all of them, we could write a procedure to do so thus:
(define (find-matching-item list if-absent predicate)
(loop continue ((for item (in-list list)))
=> (if-absent)
(if (predicate item)
item
(continue))))
In this example, the loop continues only if ITEM fails to satisfy
PREDICATE. Only if we have run through every element of LIST, and
found none satisfying PREDICATE, shall we evaluate the final expression
and call IF-ABSENT. Again, we can always substitute IN-VECTOR,
IN-PORT, or any other iterator as we please, for IN-LIST here.
In fact, not only can we give a name to the loop and invoke it
explicitly to continue iteration, but we are not limited to calling it
in tail positions in the loop body. We could, for example, write a
(single-list) MAP procedure thus:
(define (map procedure list)
(loop recur ((for element (in-list list)))
=> '()
(cons (procedure element) (recur))))
If there are no more elements in LIST, then the loop yields the empty
list; otherwise, it recursively processes the remaining elements of the
list, and constructs a pair of the mapped ELEMENT and all the mapped
remaining elements.
Finally, we need not supply a final expression at all, in which case
what the loop returns by default is unspecified. For example, to write
each element of a list LIST followed by a newline, we could use the
following program:
(loop ((for element (in-list list)))
(write element)
(newline))
*** Updating Loop Variables
A third sort of astute reader may wonder still how LOOP can be as
expressive and flexible as named LET, which allows for the updates of
loop variables (which we introduced in LOOP using WITH clauses) when
the loop procedure is called. In fact, LOOP is a drop-in replacement
for named LET, provided that we use the loop procedure only in operator
positions, because it is not actually a procedure but a macro, for
reasons that will become apparent shortly. The following procedure's
meaning is invariant whether we write LOOP or LET:
(define (reverse-map procedure list)
(let continue ((list list) (result '()))
(if (null-list? list)
result
(continue (cdr list)
(cons (procedure (car list) result))))))
As loop clauses, (<variable> <initializer>) and (WITH <variable>
<initializer>) are both equivalent to named LET bindings, and we can
update the variables by passing new values to the named loop. As with
named LET, the update is purely functional; that is, it works by
rebinding, *not* by assignment.
But one of the primary gripes one hears with named LET is the distance
between the list of loop variables and the expressions by which they
are updated. The matter is not helped by the only association between
the two: position. In LOOP, on the other hand, we may choose whichever
of positional and named update we please. For example, if we wish to
partition a list by a predicate by adding an element to one of two
resultant lists, and leaving the other untouched, we might write this
(following SRFI 1's specification of the procedure):
(define (partition predicate list)
(loop continue ((for element (in-list list))
(with satisfied '())
(with unsatisfied '()))
=> (values (reverse satisfied)
(reverse unsatisfied))
(if (predicate element)
(continue (=> satisfied (cons element satisfied)))
(continue (=> unsatisfied (cons element unsatisfied))))))
*** Iterator-Supplied Loop Variables
Sometimes it is not enough just to step through each element of a list
or a vector; we may need to find where in the sequence we are, or move
to another position. Some iterators, therefore, can expose the
underlying loop variables, specific to each iterator. IN-LIST exposes
the pair whose car is the current element of the list. It also ensures
that modifying the cdr of that pair is safe and does not affect the
iteration, so we can reverse a list destructively like so:
(define (reverse! list)
(loop ((for element pair (in-list list))
(with tail '() pair))
=> tail
(set-cdr! pair tail)))
We step through the list, letting TAIL be () initially and then the
last pair we encountered; for each pair we encounter, we set its cdr to
be the current tail. We happen not to use ELEMENT here, although we
could easily turn this into a REVERSE-MAP! procedure like so:
(define (reverse-map! procedure list)
(loop ((for element pair (in-list list))
(with tail '() pair))
=> tail
(set-car! pair (procedure element))
(set-cdr! pair tail)))
Often, in processing certain sorts of lists, we find nested sublists
which we want to coalesce into the containing list; Scheme's BEGIN
works like this. We can flatten as we go along by updating the loop
variable of IN-LIST explicitly, rather than simply letting it to be
implicitly updated to the cdr of the current pair:
(define (flatten-begins list)
(loop continue ((for element pair (in-list list))
(with subforms '()))
=> (reverse subforms)
(if (and (pair? element)
(eq? 'BEGIN (car element)))
(continue (=> pair (append (cdr element) (cdr pair))))
(continue (=> subforms (cons element subforms))))))
In the case of vector iterators, the underlying loop variable is the
index into the vector. For instance, to shuffle the contents of a
vector by the Fisher-Yates method, given a procedure (RANDOM-INTEGER
<exclusive upper bound>), we might write this:
(define (shuffle-vector! vector)
(loop ((for element i (in-vector vector)))
(let ((j (random-integer (+ i 1))))
(vector-set! vector i (vector-ref vector j))
(vector-set! vector j element))))
*** Numeric Iteration
There are two iterators supplied for iterating over numbers: UP-FROM
and DOWN-FROM. (These violate the convention of the prefix of `IN-' on
iterator names, but after long and arduous discussion of the matter,
the author and his minions concluded that these are better than any of
the alternatives that were considered.) For example, to construct a
list of a given length that contains the result of applying a given
procedure to each respective index in the list, we could write this:
(define (list-tabulate length procedure)
(loop ((for index (up-from 0 (to length)))
(with list '() (cons (procedure index) list)))
=> (reverse list)))
NOTE: The keyword TO is part of the syntax of UP-FROM which specifies
an upper bound for the iteration. It is *not* a separate procedure or
special operator.
We can step by a specified increment as well by adding an on argument
(BY <increment>); for instance, to produce a list of the even integers
less than N, we might write this:
(loop ((for integer (up-from 0 (to N) (by 2)))
(with evens '() (cons integer evens)))
=> (reverse evens))
We can also omit TO in order to get an unbounded iteration, which
presumably we intend to terminate by some other means. For example, to
find the length of a list (which we assume to be finite and non-
circular), we might write the following:
(define (unsafe-length list)
(loop ((for element (in-list list))
(for length (up-from 0)))
=> length))
(FOR <variable> (UP-FROM <start> [(TO <end>)] [(BY <step>)])) iterates
with <variable> initially bound to <start> and then updated at each
iteration to <step> added to its current value, until its value has
reached or exceeded <end>. DOWN-FROM works slightly differently,
however, and illustrates an important point in the design of LOOP,
which I shall give its own paragraph to emphasize its significance.
Lower bounds are inclusive. Upper bounds are exclusive.
This means that, while UP-FROM includes the start and excludes the end,
DOWN-FROM, which has the same syntax, excludes the start and includes
the end, if it is reached. (It may be the case that <start> - <end> is
not an even multiple of <step>, in which case <end> will be excluded
anyway.) This bears repeating.
Lower bounds are inclusive. Upper bounds are exclusive.
Whether we are iterating forward or backward, over ranges of numbers or
elements of vectors (with IN-VECTOR-REVERSE, which is just like
IN-VECTOR except that it iterates over the reverse sequence of
elements), the lower bound of the iteration is inclusive and the upper
bound of the iteration is exclusive.
This has an important consequence for the nature of the loop variable
in DOWN-FROM (or IN-VECTOR-REVERSE, or any other iteration of numbers
from positive infinity toward negative infinity): the decrement is
subtracted from the loop variable's value on *entry* to the loop. This
means that if the variable is initialized to 0, its first value in the
loop body will be -1, not 0; and if it is updated to N, the next value
that it will have in the loop body (if it satisfies no termination
condition) will be N - 1.
Let me say it one more time, just to be sure that the reader
understands.
Lower bounds are inclusive. Upper bounds are exclusive.
The reason for this design decision is that reverse iteration is
extremely counter-intuitive, and any inconsistency in convention is
sure to be an eternal source of off-by-one errors. This is not to say
that off-by-one errors are impossible, but that they are much easier to
accidentally stumble upon in reverse iteration. The use of inclusive
lower bounds and exclusive upper bounds is already very much
established in Scheme, and it has some convenient properties for many
programs.
Where one must use any other sort of bounds, because such bounds are
the exception rather than the rule, programmers should draw attention
to the fact, because this is so easy to mix up inadvertently. If
possible, use inclusive lower bounds and exclusive upper bounds, but if
it is truly necessary to use any other class of bounds, they should be
expressed explicitly using WITH, LET, and UNTIL clauses. For example,
to loop with inclusive bounds on both ends, write
(loop (...
(with i start (+ i step))
(until (> i stop))
...)
...).
*** Accumulating Results
At this point, the only iterators we have seen are those that step
through sequences. Every time we have accumulated, say, a list of
values as the result of a loop, we have always had to write it with a
manual loop variable, a manual CONS, and a manual REVERSE at the end.
We can simplify this by using accumulating iterators such as LISTING.
If we use LISTING, (single-list) MAP becomes:
(define (map procedure list)
(loop ((for element (in-list list))
(for result (listing (procedure element))))
=> result))
Accumulators, as the reader has probably noticed, are not named
according to the `IN-' convention for iterators. Instead, they are
named with an `-ING' suffix. Furthermore, most of them follow the same
basic structure:
(FOR <accumulated-result>
(<accumulating> [(INITIAL <initial value>)]
<datum or conditions>))
<Initial value> is something to seed the accumulation with. In
LISTING, for example, it is the tail of the list after all the
accumulated elements.
There are several ways to express the accumulation of a datum. The
simplest is with an expression whose value is always accumulated; we
showed that above in the most recent definition of MAP. We may also
accumulate based on the value of a condition; for example, to filter
from a list only those items that satisfy a predicate, we might write
the following:
(define (filter predicate list)
(loop ((for element (in-list list))
(for result
(listing element
(if (predicate element)))))
=> result))
Sometimes the value of the condition is also needed to compute the
datum to accumulate, so we may also express the accumulation with a
notation much like COND's `=>' clause, and accumulate a list of all
non-false values yielded by a procedure applied to each element of a
list:
(define (filter-map procedure list)
(loop ((for element (in-list list))
(for result
(listing (procedure list) => (lambda (x) x))))
=> result))
Finally, the `=>' clause may be preceded by a test, to support multiple
return values and to allow the passage of #F to the procedure following
the `=>', as in SRFI 61's extended COND. An example of this is left to
the imagination of the reader.
Along with LISTING, there are three other similar list-related
accumulators: LISTING-REVERSE, APPENDING, and APPENDING-REVERSE. For
elements elt_0, elt_1, ..., elt_n-1, elt_n, LISTING yields
(LIST elt_0 elt_1 ... elt_n-1 elt_n);
LISTING-REVERSE yields
(LIST elt_n elt_n-1 ... elt_1 elt_0);
APPENDING yields
(APPEND elt_0 elt_1 ... elt_n-1 elt_n);
and APPENDING-REVERSE yields
(APPEND (REVERSE elt_n)
(REVERSE elt_n-1)
...
(REVERSE elt_1)
(REVERSE elt_0)),
although it can be implemented considerably more efficiently than
(APPENDING (REVERSE ...)).
There are also two destructive list-related accumulators: LISTING! and
LISTING-INTO!. While LISTING must build up a list in reverse and then
reverse it at the end, which requires twice as many cons cells as the
result uses, LISTING! builds up a list forward by setting the cdr of
each previous pair constructed, and requires only one intermediate cons
cell, the initial one. LISTING-INTO! uses no intermediate cons cells
because it is seeded with an initial pair to set the cdr of:
(let ((x (cons 'initial '())))
(loop ((for i (up-from 0 (to 10)))
(for result (listing-into! x (* i i) (if (even? i))))))
x)
;Value: (INITIAL 0 4 16 36 64)
LISTING! and LISTING-INTO! are dangerous, however, because they are
non-reentrant -- that is, it is unsafe to use them where non-local
control transfers to reenter the loop body may occur, or where the loop
may be continued at multiple different points.
*** Other LOOP Clauses: Local Bindings, Explicit Termination Conditions
So far we have seen FOR and WITH clauses in LOOP expressions -- FOR to
dispatch to some specialized iterator, and WITH to express manual loop
variables. These are the main clauses of interest, but there are four
others: WHILE, UNTIL, LET, and LET-VALUES.
WHILE and UNTIL allow the programmer to specify termination conditions
for the loop following which the final expression will be evaluated.
Although it is possible to explicitly continue the loop in one arm of a
conditional, this requires some boilerplate to invoke the final
expression as well in the other arm of the conditional. For example,
this procedure reads lines from an input port until an empty line, and
returns a list of the non-empty lines, assuming a READ-LINE procedure:
(define (read-non-empty-lines input-port)
(loop ((for line (in-port input-port read-line))
(until (string-empty? line))
(for lines (listing line)))
=> lines))
Iteration will terminate either when READ-LINE returns an EOF object,
tested implicitly by IN-PORT, or when the line read is an empty string,
tested explicitly by the user. In either case, LISTING will bind LINES
to the list of lines in the final expression, which we return there.
(LET <variable> <expression>) binds <variable> to <expression>, after
it has been determined that none of the FOR iterators have run out, and
after all of their variables have been bound. <Variable> is visible
inside the loop body. (LET-VALUES <bvl> <producer>) is similar, but it
binds multiple values, rather than a single value; (LET <variable>
<expression>) is equivalent to (LET-VALUES (<variable>) <expression>).
All variables bound by the iterators are visible in the right-hand
sides of LET & LET-VALUES clauses, and all those variables as well as
all variables on the left-hand sides of LET & LET-VALUES clauses are
visible in the user termination conditions -- those supplied by WHILE
and UNTIL -- and in the loop body.
This concludes the introduction to foof-loop. The rest of this
document contains a reference manual for foof-loop as well as some
further material.
** Loop Structure
LOOP forms have the following structure:
(LOOP [<loop-name>] (<loop-clause> ...)
[=> <final-expression>]
[<body>])
<loop-clause> may have one of the following forms:
(<variable> <initializer> [<update>])
(WITH <variable> <initializer> [<update>])
(FOR <variable> ... (<iterator> <argument> ...))
(LET <variable> <expression>)
(WHILE <condition>)
(UNTIL <condition>)
A loop operates by stepping through each iterator in parallel until one
of them runs out; that is, when one of its termination conditions
holds. When this happens, if the <final-expression> is supplied, it is
evaluated in an environment with the final values of all the loop
variables bound, as well as any final bindings supplied by the
iterators.
If <loop-name> is not supplied, the loop implicitly proceeds at the end
of the loop body, which may be empty. If <loop-name> is supplied, it
is bound to a macro that the loop body must call to explicitly proceed
the loop. This macro may be called arbitrarily many times, in tail or
non-tail positions; it is appropriate for recursive loops. (Note,
though, that some iterators are non-reentrant, which makes multiple
calls unsafe. These iterators are discouraged, however, and by
convention are marked with a suffix of `!'; examples of non-reentrant
iterators include LISTING! and LISTING-INTO!.)
The macro <loop-name> accepts arguments by which to update the loop
variables. Arguments may be passed by position, or by name, or both.
Positional arguments come first; they correspond with the first loop
variables in the same positions specified in WITH clauses. Arguments
passed by name to update the named loop variables are written with the
form (=> <variable> <update>).
All arguments are optional; positional arguments supply values for the
first arguments that are supplied. All loop variables have default
update values, if they are not explicitly updated when the macro
<loop-name> is invokes: those created by (WITH <variable> <initializer>
<update>) clauses are by default updated to the value of <update>,
those created by (WITH <variable> <initializer>) remain the same at the
next iteration, and those created by iterators are updated by means
that the iterator supplies implicitly.
(loop continue ((with a 0)
(with b '()
;; Default update for B:
(cons a b))
(for c d (in-list '(i j k p q r))))
(write (list a b c d))
(newline)
(continue (+ a 1) ;First position, for the first (WITH A 0).
(=> d (cddr d)))) ;D is from (FOR C D (IN-LIST ...)).
;; Output:
;(0 () i (i j k p q r))
;(1 (0) k (k p q r))
;(2 (1 0) q (q r))
In the loop, there are several classes of variable bindings that may
arise:
- /Loop variables/ are variables that are initialized at the
beginning of the loop and updated at every iteration non-
destructively -- that is, by rebinding through procedure call, not
by assignment. Loop variables are bound in the user termination
conditions, the user variable expressions, the loop body, and the
final expression.
- /Entry variables/ are variables whose values are computed when each
iteration of the loop is entered, before the termination conditions
have been tested. They cannot be updated like loop variables; each
is always initialized on loop entry to the value of the same
expression. This means that the expression need not be duplicated
at the beginning of the loop and when the loop is continued. Entry
variables are bound in the user termination conditions, the user
variable expressions, the loop body, and the final expression.
- /Body variables/ are variables whose values are computed at each
iteration of the loop only if iteration has not been terminated
from iterator termination conditions. Body variables are bound in
the user termination conditions, the user variable expressions, and
the loop body.
- /User variables/ are variables whose values are computed after all
iterator termination conditions have failed, and after the body
variables have been bound, but before the user termination
conditions have been tested. User variables are introduced by LET
and LET-VALUES clauses in LOOP. User variables are bound only in
the user termination conditions and the loop body.
- /Final variables/ are variables that are computed once at the end
of the loop, once any termination condition has been met. Final
variables are bound only in the final expression.
The loop iteration is controlled by the loop clauses:
(WITH <variable> <initializer> [<update>])
Introduces a loop variable named <variable>, initialized to the
value of <initializer>. On each iteration of the loop, the
variable may be explicitly updated; if it is not explicitly
updated, but there was an <update> expression supplied, then the
variable is updated to the value of that expression in an
environment where all the loop variables, entry variables, body
variables, and user variables are bound; otherwise, <variable>
remains unchanged at the next iteration of the loop.
(<variable> <initializer> [<update>])
Identical to (WITH <variable> <initializer> [<update>]). This
alternative form is provided so that the LOOP syntax is compatible
with named LET, and very similar to DO. For instance,
(let next ((x 0))
(if (< x 10)
(begin (write x) (next (+ x 1))))),
(loop next ((x 0))
(if (< x 10)
(begin (write x) (next (+ x 1))))),
(do ((x 0 (+ x 1)))
((>= x 10))
(write x)), and
(loop ((x 0 (+ x 1))
(until (>= x 10)))
(write x))
are all equivalent.
The WITH syntax is preferred; it has greater visual distinction,
and it it is easier for pretty-printers to indent better with less
effort.
(FOR <variable> ... (<iterator> <argument> ...))
General iteration form. Although <iterator> can do anything with
the <variable>s and <argument>s, by covention the <variable>s are
variables to be bound in the loop body or the final expression, and
the <argument>s are expressions or directives controlling the
iteration.
No guarantees are made preserving the ordering of the FOR clauses;
it is an error to rely on any properties of the ordering.
(LET <variable> <expression>)
(LET-VALUES <bvl> <expression>)
These introduce user variable bindings. LET binds <variable> to
the single value of <expression> inside the body bindings and
before the user termination conditions and loop body. LET-VALUES
binds all the variables in the bound variable list <bvl> to the
respective values in the multiple values returned by <expression>.
(LET <variable> <expression>) is equivalent to the clause
(LET-VALUES (<variable>) <expression>).
(WHILE <condition>)
(UNTIL <condition>)
User-supplied termination conditions. The termination conditions
are evaluated in an environment where all of the loop variables,
entry variables, body variables, and user variables are bound. All
of the iterator-supplied termination conditions are guaranteed to
have failed.
*** Loop Expansion
When a LOOP form is fully processed and expanded, the result resembles
this:
(LET-VALUES ((<outer-bvl> <outer-producer>)
...)
(DEFINE (LOOP-PROCEDURE <loop-variable> ...)
(LET-VALUES ((<entry-bvl> <entry-producer>)
...)
(LET ((FINISH (LAMBDA ()
(LET-VALUES ((<final-bvl> <final-producer>))
<final-expression>))))
(IF (OR <termination-condition> ...)
(FINISH)
(LET-VALUES ((<body-bvl> <body-producer>)
...)
(LET-VALUES ((<user-bvl> <user-producer>)
...)
(IF (OR <user-termination-condition> ...)
(FINISH)
<loop-body>)))))))
(LOOP-PRODEDURE <loop-initializer> ...))
Each <*-bvl> is a bound variable list supplied by an iterator or by the
user. Each <*-producer> is an expression yielding multiple values to
be distributed to the corresponding bound variable list. The
<user-bvl>s are from user-supplied LET and LET-VALUES clauses in loops;
the <user-termination-conditions> are from user-supplied WHILE and
UNTIL clauses in loops.
LOOP-PROCEDURE is a name invisible to all of the forms involved in the
loop except for the macro to which the loop's name is bound in the loop
body. This macro processes its positional and named arguments to form
a call to LOOP-PROCEDURE by supplying default update expressions
according to WITH clauses or what the iterators provide.
**** Efficiency Concerns
One of the primary goals of LOOP is to generate efficient code. It
should be possible to write floating-point matrix multiplication
routines as fast with LOOP as by hand, and perhaps faster because the
details of the matrix operations can be swept under the rug of a matrix
iterator. There may be some questions, however, of the efficiency of
the code that LOOP generates.
***** Use of Assignment
Many other loop macros, such as Common Lisp LOOP, SRFI 42 Eager
Comprehensions, and others, update loop variables by assignment, rather
than rebinding. Even hand-written loops using named LET or DO are
written with assignment, because it is more convenient with large
numbers of loop variables which would otherwise have to be enumerated
repeatedly for every recursive call to the loop procedure.
This is a death-knell for efficient code, especially in many highly
optimizing Scheme compilers, because most Scheme compilers generate
heap-allocated cells to store mutable variables. Compilers' lives are
simplified by assuming that all variables are immutable; they are free
to move references to the variables about as they please. The analysis
necessary to transform a program with SET! into a program without SET!
but with explicit, heap-allocated cells is trivial, and allows
compilers to make the assumption that variables are immutable;
subsequently eliding those cells anywhere but in closure references is
considerably harder, and not worth the effort for idiomatic Scheme
code.
It may be the case that a pure tree interpreter can execute assignment-
ridden code more efficiently than procedure calls with large numbers of
arguments, because of the overhead of recursively processing each of
those arguments. It is also the case that tree interpreters are slow
anyway.
Using assignment to update loop variables is a losing proposition in
the end. LOOP never introduces assignments to mutable variables; all
loop variables are updated by recursive procedure call, i.e. rebinding.
Custom iterators are discouraged from doing this as well, although
nothing prevents it.
***** Local Closure for FINISH
Because the construction of a closure for FINISH is costly in many
Scheme implementations which have extremely naive compilers, the
current implementation of LOOP creates the closure only if necessary to
preserve scoping; it will integrate the body of FINISH if and only if
there are no user termination conditions, or there are no iterator-
supplied termination conditions, iterator-supplied body bindings, and
user-supplied bindings around the body.
However, programmers who are worried about the closure for FINISH would
most likely be better served by a Scheme implementation that goes to
the slighest effort necessary to recognize the local closure and fold
the unconditional jumps into the code for the conditional.
***** Multiple Return Values
Opponents of multiple return values may object to the pervasion of
LET-VALUES throughout the expansion and the inefficiency thereof in
Scheme implementations that do not subscribe earnestly to the doctrine
of multiple return values. LET-VALUES is the only way to allow
iterators to bind multiple variables that may have complex inter-
dependencies without the indirection and cost of intermediate data
structures.
There is no cost on Scheme systems with efficient support for multiple
return values. There will be a cost in Scheme systems with costly
multiple return values, whether LOOP uses multiple return values, and
incurs the cost implicitly, or whether LOOP generates code to construct
intermediate data structures. But on Scheme systems with efficient
multiple return value support, LOOP incurs no cost. Essentially, by
using multiple return values, LOOP defers the question of the cost to
the Scheme implementation, rather than incurring cost *wherever* it is
used by generating code that *always* constructs intermediate data
structures.
Furthermore, for most cases, there is only a single return value
anyway. LET-VALUES itself need not incur whatever overhead of multiple
return values there is simply by recognizing single-value cases.
Unfortunately, the reference implementation of SRFI 11 does not do
this, and will generate CALL-WITH-VALUES invocations for every binding
clause in the LET-VALUES. Here is an implementation that avoids this:
<http://mumble.net/~campbell/scheme/let-values.scm>.
***** Index Variables and Generic Arithmetic
Many Scheme implementations provide specialized arithmetic for limited
integers fixed to a width that fits inside machine registers; using
this arithmetic is much faster than generic arithmetic, where safety
and overflow are not issues. These limited integers, often called
`fixnums', are typically guaranteed to be the domain of vector indices.
LOOP generates generic arithmetic for index variables, in iterators
such as UP-FROM and IN-VECTOR. There are a number of reasons for
this:
1. There is no portable specialized arithmetic for LOOP to use.
2. LOOP is not meant only for code inside inner loops that must be as
fast as possible; programs written with LOOP should not fail
obscurely because they passed some iterator a non-fixnum value where
a fixnum was expected.
3. The set of built-in iterators would be complicated by support for
fixnum arithmetic. Do we add, for example, UP-FROM-FIXNUM,
UP-FROM-UNSIGNED-FIXNUM, and so on? Do we have IN-VECTOR,
IN-VECTOR/UNSAFE, and so on?
4. Programs that use fixnum arithmetic are harder to understand,
because with fixnum arithmetic comes a plethora of subtle
implications on the program. For example, the usual formula for
finding the average of two integers -- (QUOTIENT (+ A B) 2) -- is no
longer valid under fixnum arithmetic, because the sum may overflow
the bounds of fixnums. A program that once used Scheme vectors but
has been converted to use sparse vectors, which allow arbitrary
integer indices, may no longer be able to use fixnum arithmetic.
It is up to the compiler to specialize the generic arithmetic where
appropriate, such as the indices inside IN-VECTOR, and to insert pre-
loop type and range checks where appropriate. Programmers who use
compilers incapable of this should consider finding better compilers,
or improving the ones they use.
Common Lisp, by comparison, has no specialized arithmetic either;
rather, it has programmer-supplied declarations regarding the types of
variables' values. This is even more specific than specialized
arithmetic, and does not require the introduction of N new arithmetic
routines for every kind of specialized arithmetic, so it composes with
macros well. This would be a better way to go about improving the
efficiency of arithmetic, and it would require minimal changes to LOOP
-- at most, support for iterators to supply declarations in the loop.
** Built-in Iterators
*** Iterating across Lists
(FOR <element> [<pair>] (IN-LIST <list> [<successor>]))
Iterates for each successive pair in <list>, binding the variable
<pair>, if supplied, to that pair, and the variable <element> to the
car of that pair. The successor of the pair is taken by applying
<successor> to the current pair. The iteration runs out when the
successor is no longer a pair.
<Successor>, if supplied, must be a unary procedure. The default
value of <successor> is CDR. <List> and <successor> are evaluated
once before the loop begins.
<Pair> is a loop variable; its value in the final expression is
unspecified.
<Element> is a body variable.
The next pair is taken before the loop body, so the loop body may
reliably apply SET-CDR!, or its equivalent for the given successor
procedure, to the pair, without altering the iteration. Note,
though, that <successor> will be called irrespective of whether
<pair> is updated explicitly.
(loop ((for a (in-list '(a b c)))
(for b (in-list '(p q r))))
(write (list a b))
(newline))
;; Output:
;(a p)
;(b q)
;(c r)
(loop ((for x (in-list '(1 2 3)))
(with y 0 (+ y (* x 2))))
=> y)
;Value: 12
;;; PAIR-FOLD from SRFI 1.
(define (pair-fold kons knil list)
(loop ((for elt pair (in-list list))
(with knil knil (kons pair knil)))
=> knil))
(FOR <elements> [<pairs>] (IN-LISTS <lists>))
Iterates for each successive pair in each parallel element of <lists>,
binding the variable <pairs>, if supplied, to be a list of those
pairs, and the variable <elements> to a list of the cars of those
pairs. The successor of each pair is taken by CDR. The iteration
runs out when any successor is no longer a pair.
<Lists> must be a non-empty list; it is an error if otherwise. It is
evaluated once before the loop begins.
<Pairs> is a loop variable; its value in the final expression is
unspecified.
<Elements> is a body variable.
The cdrs are taken before the loop body, so the loop body may
reliably apply SET-CDR! to any of the pairs without altering the
iteration.
;;; Transpose a matrix represented as a nested list.
(loop ((for columns (in-lists '((a b c) (d e f))))
(with rows '() (cons columns rows)))
=> rows)
;Value: ((c f) (b e) (a d))
(define (every? predicate list . lists)
(loop proceed ((for elts (in-lists (cons list lists))))
(and (apply predicate elts)
(proceed))))
(define (any predicate list . lists)
(loop proceed ((for elts (in-lists (cons list lists))))
(or (apply predicate elts)
(proceed))))
*** Iterating across Vectors and Strings
(FOR <element> [<index>] (IN-VECTOR <vector> [<low> [<high>]]))
(FOR <element> [<index>] (IN-STRING <string> [<low> [<high>]]))
(FOR <element> [<index>] (IN-VECTOR-REVERSE <vector> [<high> [<low>]]))
(FOR <element> [<index>] (IN-STRING-REVERSE <string> [<high> [<low>]]))
These iterate for each successive index in the vector or string,
respectively, binding the variable <index>, if supplied, to that
index, and the variable <element> to the element at that index.
IN-VECTOR and IN-STRING run from <low>, inclusive, to <high>,
exclusive; IN-VECTOR-REVERSE and IN-STRING-REVERSE run from <high>,
exclusive, to <low>, inclusive.
<Vector> or <string> must be a vector or a string, respectively.
<Low> and <high>, if supplied, must be exact, non-negative integers,
and valid bounds for <string> or <vector>. The default values of
<low> and <high> are 0 and the length of the vector or string,
respectively. <Vector>, <string>, <low>, and <high> are all
evaluated once before the loop begins.
<Index> is a loop variable; its value in the final expression is the
value of the last bound of iteration (the value of <high> for
IN-VECTOR and IN-STRING; the value of <low> for IN-VECTOR-REVERSE and
IN-STRING-REVERSE). With the reverse iterators, IN-VECTOR-REVERSE
and IN-STRING-REVERSE, explicitly updating <index> when continuing
the loop will cause the next value seen by the loop body to be
one *less* than what the index was updated to. That is, on loop
entry, <index> is assumed to be an exclusive upper bound, and so it
is tested for termination and then decremented to find the actual
index.
<Element> is a body variable.
Because the lower bound is inclusive and the upper bound is
exclusive, for IN-VECTOR and IN-STRING, the first value of <index> is
<low>, and the value of <index> in the final expression is <high>;
while for IN-VECTOR-REVERSE and IN-STRING-REVERSE, the first value of
<index> in the loop body is one less than <high>, and the value of
<index> in the final expression is <low>.
(loop ((for a i (in-vector '#(foo bar baz)))
(for b j (in-string-reverse "abcdefghi" 6 3)))
=> (list i j)
(write (list a i b j))
(newline))
;Value: (3 3)
;; Output:
;(foo 0 f 5)
;(bar 1 e 4)
;(baz 2 d 3)
;;; Find the first index of an element satisfying a predicate.
(define (vector-index vector predicate)
(loop proceed ((for elt index (in-vector vector)))
(if (predicate elt)
index
(proceed))))
;;; Copy characters from one string to another.
(define (string-copy! target tstart source sstart send)
(loop ((for char (in-string source sstart send))
(with index tstart (+ index 1)))
(string-set! target index char)))
(loop proceed
((for v i
(in-vector
'#(a b c d e f g h i j k l m n o p q r s t u v w x y z))))
(write (list v i))
(newline)
(proceed (=> i (+ 1 (* i 2)))))
;; Output:
;(a 0)
;(b 1)
;(d 3)
;(h 7)
;(p 15)
*** Iterating through Input
(FOR <datum> (IN-PORT <input-port> [<reader> [<eof?>]]))
Iterates for successive data read from <input-port>, binding the
variable <datum> to the datum read by applying the value of <reader>.
Iteration terminates when the datum read satisfies the predicate
<eof?>.
<Input-port> must be an input port. <Reader> and <eof?>, if
supplied, must be unary procedures. The default value of <reader> is
READ-CHAR, and the default value of <eof?> is EOF-OBJECT?.
<Input-port>, <reader>, and <eof?> are all evaluated once before the
loop begins.
<Datum> is an entry variable.
(define (read-line input-port)
(let ((initial (peek-char input-port)))
(if (eof-object? initial)
initial
(loop ((for char (in-port input-port))
(until (char=? char #\newline))
(with chars '() (cons char chars)))
=> (list->string (reverse chars))))))
(define (read-all input-port)
(loop ((for datum (in-port input-port read))
(with data '() (cons datum data)))
=> (reverse data)))
(FOR <datum> (IN-FILE <pathname> [<reader> [<eof?>]]))
Opens an input port from the file named by <pathname>, iterates as
with IN-PORT from the newly opened input port, and closes the input
port after iteration terminates.
<Datum> is an entry variable.
(define (read-lines-from-file pathname)
(loop ((for line (in-file pathname read-line))
(with lines '() (cons line lines)))
=> (reverse lines)))
(define (file-length pathname)
(loop ((for char (in-file pathname))
(with count 0 (+ count 1)))
=> count))
*** Iterating over Numbers
(FOR <number> (UP-FROM <low> [(TO <high>)] [(BY <step>)]))
(FOR <number> (DOWN-FROM <high> [(TO <low>)] [(BY <step>)]))
Iterates for each successive number starting at <low> and
incrementing by <step> at every iteration, for UP-FROM; or starting
at <high> and decrementing by <step> at every iteration, for
DOWN-FROM. <Number> is bound to the current number. If the TO
parameter is supplied, iteration terminates when the number exceeds
the given bound.
<Low>, <high>, and <step> must all be exact numbers. If a TO
parameter is supplied, they must furthermore all be real numbers.
They are all evaluated once before the loop begins.
<Number> is a loop variable, *and*, in DOWN-FROM, an entry variable.
If it is updated explicitly, its value at the next iteration in the
loop's body will be one <step> *less* than what it was updated to
explicitly. Because of this, although it is initialized to <high>,
it will never have a value of <high> (unless it is explicitly updated
to the value of (+ <high> <step>)).
(define (iota count start step)
(loop ((for n (up-from 0 (to count)))
(for result (listing (+ start (* n step)))))
=> result))
;;; Note that the following is incorrect, if the arguments to IOTA
;;; may be inexact numbers:
(define (iota count start step)
(loop ((for n (up-from start
(to (+ start (* count step)))
(by step)))
(for result (listing n)))
=> result))
(define (sieve n)
(let ((table (make-bit-string (- n 2) #t)))
(define (prime? k) (bit-string-ref table (- k 2)))
(define (not-prime! k) (bit-string-clear! table (- k 2)))
(define (purge-multiples i)
(loop ((for j (up-from (* i i)
(to n)
(by i))))
(not-prime! j)))
(loop proceed ((for i (up-from 2 (to n)))
(with primes '()))
=> (reverse primes)
(if (prime? i)
(begin (purge-multiples i)
(proceed (=> primes (cons i primes))))
(proceed)))))
(define (vector-quick-sort! elt< vector start end)
(loop sort ((start start) (end end))
(if (< 1 (- end start))
(let ((pivot (select-pivot vector start end)))
(loop continue ((i start) (j end))
(let ((i (loop scan ((for i (up-from i)))
(if (elt< (vector-ref vector i) pivot)
(scan)
i)))
(j (loop scan ((for j (down-from j)))
(if (elt< pivot (vector-ref vector j))
(scan)
j))))
(if (< i j)
(begin (vector-exchange! vector i j)
(continue (+ i 1) j))
(begin (sort (=> end i))
(sort (=> start (+ j 1)))))))))))
*** Accumulating
Accumulating iterators all follow the same basic forms: accumulate each
value of an expression into a result, with an optional condition for
the accumulation. Accumulators differ according to the way that they
accumulate, whether the intermediate result of accumulation is
available in the body of the loop or whether only the fully accumulated
result is available in the final expression, and whether the
accumulator accepts an optional initial value argument.
The syntax of accumulators is as follows:
- Unconditional:
(FOR <result> (<accumulating> <datum>))
<Datum> is evaluated at each iteration in an environment with all
loop, entry, and body variables bound from the current iteration.
- Conditional -- evaluate and accumulate <datum> if and only if
<condition> is not false:
(FOR <result> (<accumulating> <datum> (IF <condition>)))
<Datum> and <condition> are evaluated at each iteration in an
environment with all loop, entry, and body variables bound from the
current iteration.
- Conditional -- if <condition> is not false, apply the value of
<mapper> to the value of <condition> to yield the value to
accumulate:
(FOR <result> (<accumulating> <condition> => <mapper>))
<Condition> and <mapper> are evaluated at each iteration in an
environment with all loop, entry, and body variables bound from the
current iteration. <Mapper> is evaluated only if <condition> is not
false.
- Generalized conditional, as in SRFI 61-style `=>' clauses in COND --
evaluate <generator>, and if <tester> applied to all the values
returned by <generator> returns true, accumulate the value returned
by <mapper> when applied to all the values returned by <generator>:
(FOR <result> (<accumulating> <generator> <tester> => <mapper>))
<Generator>, <tester>, and <mapper> are evaluated at each iteration
in an environment with all loop, entry, and body variables bound from
the current iteration. <Mapper> is evaluated only if <test> yields
true.
Some accumulators also have initial values, supplied with an optional
(INITIAL <initial-value>) form before any of the other arguments to the
accumulator.
Accumulators that are allowed to operate by destructively modifying
internal state are marked with `!' suffixes. These accumulators are
/non-reentrant/, which means that it is unsafe to write loops that may
be recursively run more than once, and that non-local re-entry may
destroy values returned from the loop.
**** Accumulating Lists
(FOR <result> (LISTING [(INITIAL <tail>)] ...))
(FOR <result> (LISTING-REVERSE [(INITIAL <tail>)] ...))
These each produce lists of the accumulated data with a final cdr of
<tail>. LISTING yields a list in the order that the data were
produced; LISTING-REVERSE yields a list in the reverse order of that.
<Tail>, if supplied, is evaluated once before the beginning of the
loop. The default value of <tail> is the empty list.
For LISTING, <result> is a final variable.
For LISTING-REVERSE, <result> is a loop variable; its value in the
final expression is the accumulated list in reverse.
(define (list-tabulate length procedure)
(loop ((for i (up-from 0 (to length)))
(for list (listing (procedure i))))
=> list))
(define (take list count)
(loop ((for i (up-from 0 (to count)))
(for elt (in-list list))
(for prefix (listing elt)))
=> prefix))
(define (append-reverse list tail)
(loop ((for elt (in-list list))
(for result (listing-reverse (initial tail) list)))
=> result))
(define (unzip5 list)
(loop ((for component (in-list list))
(for result1 (listing (first component)))
(for result2 (listing (second component)))
(for result3 (listing (third component)))
(for result4 (listing (fourth component)))
(for result5 (listing (fifth component))))
=> (values result1 result2 result3 result4 result5)))
(FOR <result> (APPENDING [(INITIAL <tail>)] ...))
(FOR <result> (APPENDING-REVERSE [(INITIAL <tail>)] ...))
These append the accumulated data into a list with a final cdr of
<tail>. APPENDING maintains the order of the lists and their
elements; APPENDING-REVERSE reverses the elements of the lists. The
accumulated data must all be proper lists.
<Tail>, if supplied, is evaluated once in the environment outside the
loop. The default value of <tail> is the empty list.
For APPENDING, <result> is a final variable.
For APPENDING-REVERSE, <result> is a loop variable; its value in the
final expression is the reverse-concatenated list.
(define (concatenate lists)
(loop ((for list (in-list lists))
(for result (appending list)))
=> list))
(FOR <result> (LISTING! [(INITIAL <tail>)] ...))
(FOR <result> (LISTING-INTO! <pair> [(INITIAL <tail>)] ...))
LISTING! builds up a list with a final cdr of <tail>. LISTING-INTO!
builds up a list with a final cdr of <tail> and destructively stores
it in the cdr of <pair>. Both use as little intermediate storage as
possible, which may involve using destructive operations, and as a
consequence are non-reentrant.
<Tail>, if supplied, is evaluated once in the environment outside the
loop. The default value of <tail> is the empty list.
<Result> is a final variable.
**** Accumulating Numerical Results
(FOR <result> (SUMMING [(INITIAL <initial-sum>)] ...))
(FOR <result> (MULTIPLYING [(INITIAL <initial-product>)] ...))
Accumulates sums or products by adding the accumulated data to
<initial-sum>, or multiplying the accumulated data by
<initial-product>.
<Initial-sum> and <initial-product>, if supplied, must be numbers,
and are evaluated once before the loop begins. The default value of
<initial-sum> is 0, and the default value of <initial-product> is 1.
<Result> is a loop variable.
(define (count predicate list)
(loop ((for elt (in-list list))
(for count (summing 1 (if (predicate elt)))))
=> count))
(FOR <result> (MINIMIZING [(INITIAL <initial-minimum>)] ...))
(FOR <result> (MAXIMIZING [(INITIAL <initial-maximum>)] ...))
Finds the minimum or maximum datum accumulated. If an initial value
expression is provided, it is evaluated once before the beginning of
the loop, all of the data involved must be real numbers, and <result>
will be a real number. If no initial value is provided, the data
must be either real numbers or false, and <result> will be either a
real number or false.
<Result> is a loop variable.
** Writing Custom Iterators
(NOTE: This section documents the implementation of iterators for
Taylor R. Campbell's current version of foof-loop. It does *not*
document the version originally written by Alex Shinn (after whose
nickname, `foof', the system is named), which has a somewhat less
flexible interface for custom iterators.)
One of the main goals of foof-loop is extensibility. To accomplish
this goal, writing custom iterators is a simple matter of specifying
the syntax of the iterator, all of the loop variables and other
variables bound by the loop, and the loop's termination conditions. In
order to implement foof-loop in portable SYNTAX-RULES, iterators are
macros in continuation-passing style.
When LOOP encounters a clause of the form
(FOR <variable> ... (<iterator> . <arguments>)),
it invokes <iterator> as (<iterator> (<variable> ...) <arguments>
<continuation> . <environment>), where the environment is simply
information that LOOP needs in order to continue processing the other
loop clauses, and where the continuation is a macro that returns
control back to LOOP given a description of all the variables bound by
and termination conditions tested for the iterator at hand.
The continuation should be invoked as follows:
(continuation
((<outer-bvl> <outer-producer>) ...)
;; Outer variables to be bound once outside the whole loop.
;; Each <outer-producer> is evaluated once in the enclosing
;; environment of the LOOP form.
((<loop-variable> <initializer> <update>) ...)
;; Loop variables. Each <initializer> is evaluated in an
;; environment where the outer variables are bound. Each <update>
;; is evaluated in an environment where the outer variables, loop
;; variables, entry variables, and body variables are all bound.
((<entry-bvl> <entry-producer>) ...)
;; Entry variables, to be bound each time the loop is entered.
;; Each <entry-producer> is evaluated in an environment where the
;; outer variables and loop variables are bound.
(<termination-condition> ...)
;; Conditions that will terminate the loop. These are all evaluated
;; in an environment where the outer variables, loop variables, and
;; entry variables are bound. The first condition, of any loop
;; clause, that holds true will terminate the iteration, short-
;; circuiting the rest of the termination conditions.
((<body-bvl> <body-producer>) ...)
;; Variables to be bound around the body of the loop, if the loop
;; does not terminate.
((<final-bvl> <final-producer>) ...)
;; Variables to be bound around the final expression of the loop, if
;; the loop does terminate.
. environment).
The <*-bvl>s are bound variable lists, and the variables in them are
bound to the respective values of the corresponding <*-producer>
expressions, as with LET-VALUES. The bindings are made in parallel,
also as with LET-VALUES. To make interdependent bindings, use multiple
return values in a single binding clause.
The ordering of the arguments to the continuation is intended to
reflect the ordering of control through the loop: first the outer
bindings are made, then the loop variables are initialized, then the
entry bindings are made, then the termination conditions are tested,
and finally either the body bindings or the final bindings are made,
depending on the termination conditions.
For example, here is a simplified definition of IN-LIST, omitting
support for a user-supplied loop variable, a user-supplied list
successor procedure, and the guarantee that SET-CDR! may be safely
applied to the pair without altering the iteration. (For the sake of
concision, we use the name `next' for the continuation and `rest' for
the environment.) In this simplified IN-LIST, we expect one variable
in FOR and one argument after IN-LIST -- i.e., (FOR <element-variable>
(IN-LIST <list-expression>)) --; hence, the initial two arguments to
the actual macro are `(element-variable)' and `(list-expression)'.
(define-syntax in-list
(syntax-rules ()
((IN-LIST (element-variable) (list-expression) next . rest)
(next (((LIST) list-expression)) ;Outer bindings
((PAIR LIST (CDR PAIR))) ;Loop variables
() ;Entry bindings
((NOT (PAIR? PAIR))) ;Termination conditions
(((element-variable) ;Body bindings
(CAR PAIR)))
() ;Final bindings
. rest))))
*** Reporting Syntax Errors
The above definition of IN-LIST will work, but, if used incorrectly, it
would yield a strange, incomprehensible syntax error to the casual user
of foof-loop, involving a great deal of state internal to the LOOP
macro. This is not what users like to see; rather, users like to see
useful, comprehensible error messages that are directly related to what
they wrote. Therefore, foof-loop provides two conveniences for
reporting syntax errors: SYNTACTIC-ERROR and LOOP-CLAUSE-ERROR. The
former is a more general way to report errors in macros:
(SYNTACTIC-ERROR <message> <form> ...)
will cause an error to be reported when this form is expanded, ideally
involving the given message and subforms. Many loop iterators are
implemented in terms of internal utilities handling the general
patterns of several specialized cases, such as IN-VECTOR and IN-STRING,
which generate very similar code. In order to report specific syntax
errors that correspond with what was actually the input to the LOOP
macro, these iterators pass error context descriptors to one another.
Error context is of the form
(<name> (<variable> ...) <arguments> <message>),
where <name> is the name of the iterator, the <variable>s are the forms
passed in the FOR clause before the iterator, <arguments> is the tail
of the iterator subform inside the FOR clause, and <message> is a
message explaining the error (which cannot be constructed from <name>
due to liminations in SYNTAX-RULES). The macro LOOP-CLAUSE-ERROR
invokes SYNTACTIC-ERROR with a FOR form that corresponds with the
original FOR form seen by LOOP. That is,
(LOOP-CLAUSE-ERROR (<name> (<variable> ...) <arguments> <message>))
expands to
(SYNTACTIC-ERROR <message>
(FOR <variable> ...
(<name> . <arguments>))).
Using LOOP-CLAUSE-ERROR for more robust syntax error notification, we
can write the simplified IN-LIST by adding a fall-through syntax rule
that matches any possible invocation of the iterator from LOOP:
(define-syntax in-list
(syntax-rules ()
((IN-LIST (element-variable) (list-expression) next . rest)
(next (((LIST) list-expression)) ;Outer bindings
((PAIR LIST (CDR PAIR))) ;Loop variables
() ;Entry bindings
((NOT (PAIR? PAIR))) ;Termination conditions
(((element-variable) ;Body bindings
(CAR PAIR)))
() ;Final bindings
. rest))
((IN-LIST variables arguments next . rest)
;; Ignore NEXT and REST; we only report the syntax error.
(LOOP-CLAUSE-ERROR
(IN-LIST variables arguments
"Malformed IN-LIST form in LOOP:")))))
With the appropriate definition of SYNTACTIC-ERROR shown below, if we
misuse IN-LIST, we get a helpful error message in most Schemes, along
these lines:
(loop ((for x y z (in-list)))
(body x y z))
Syntax error: Malformed IN-LIST clause in LOOP:
(for x y z (in-list))
**** Pitfalls of Reporting Syntax Errors
It is important to remember the principle behind passing error context
to any macros in terms of which iterators are implemented: to present
coherent syntax errors reporting the original iterator form. In an
ideal world, this would always be easy, but because we are limited to
te tools of SYNTAX-RULES and SYNTAX-RULES alone, this is not as easy as
we should like. There are cases where it is convenient, for example,
to supply default values for optional parameters by recursively calling
the iterator macro. This means, though, that if there is a syntax
error for some other reason -- for example, if some of the arguments
are named, or there are optional variables *and* optional arguments --,
which goes undetected until some default values have been supplied, the
wrong form will be reported to the user.
To avoid this, it is better to rewrite iterators into invocations of
internal macros that take error context as an extra parameter. This
precludes writing one iterator in terms of another one, however, which
is an unfortunate consequence of the way that foof-loop is designed,
which is itself an unfortunate consequence of the tools available. We
should like to be able to have a macro debugger that can examine the
expansion history of the forms leading up to a syntax error, but this
is not a reliably portable concept; few Schemes even record the
expansion history (only one that I know of, MIT Scheme).
Unfortunately, at the moment, foof-loop does not provide facilities for
anything more than reporting syntax errors based on SYNTAX-RULES
patterns alone. For example, there is no way to verify that a variable
to be bound is, in fact, a variable; if the user writes anything other
than a variable, there will probably be a very strange error much later
on involving invalid definitions with strange lambda lists. It is not
difficult to conceive of the interface to such checks; however, they
are considerably more trouble to implement than they would be worth in
program development at the moment.
** Porting to New Schemes
The implementation of foof-loop is divided into three files, one of
which is optional but useful:
- foof-loop.scm implements the main LOOP macro and the built-in
iterators.
- syn-param.scm implements a general macro for producing local macros
that accept named parameters with the form `(=> <name> <value>)'.
- let-values.scm is optional, but implements a portable LET-VALUES form
(see SRFI 11) that gracefully handles single-value cases in order to
avoid superfluous uses of CALL-WITH-VALUES. This definition of
LET-VALUES is more likely to work in Scheme systems that have
incorrect implementations of multiple return values, such as MIT
Scheme, although problems may still arise with (VALUES X), for
example.
Ideally, these files should be loaded into their own modules, or a
single module for foof-loop, if your Scheme system supports a module
system or any form of namespace separation. If this is not the case,
you may wish to prefix some of the identifiers in the files that name
internal operations in order to avoid namespace clashes.
The implementation of foof-loop is strictly limited to R5RS Scheme
except for one element, which is a utility to signal syntax errors
during macro expansion. (SYNTACTIC-ERROR message irritants ...) is
a *macro* that should signal an error, ideally involving the message
and irritants supplied. A legal, though not ideal, definition with
R5RS is:
(define-syntax syntactic-error (syntax-rules ()))
Here is a better definition, for Scheme48, if we open the SIGNALS and
NODES structures in the for-syntax environment (but note that this is a
sequence of interactive commands, hence the prompt, and *not* code to
store in a file to load):
> ,for-syntax ,open signals nodes
> (define-syntax syntactic-error
(lambda (form rename compare)
(apply syntax-error (map schemify (cdr form))))
()) ;No auxiliary names
Note that if your Scheme does not support SYNTAX-RULES, it is not R5RS-
compliant, and it cannot load foof-loop. Also note that foof-loop uses
some very complicated patterns of SYNTAX-RULES, so expansion of LOOP
forms may be slow and taxing on your Scheme system's resources.
Unfortunately, there is no other way to do this portably. Finally,
note that Scheme systems that preserve the case of symbols in source
code are not R5RS-compliant and cannot load foof-loop. If your Scheme
system by default preserves case, but has an option for folding case,
you must set that option first.
** Design Notes and Rationale
For a comparison of (an older rendition of) foof-loop with other
contemporary looping macros, see the Usenet article with message-id
<1157562097.001179.11470@i42g2000cwa.googlegroups.com> posted on
comp.lang.scheme in September of 2006. Also, for extended comparative
examples, see <http://mumble.net/~campbell/scheme/loop-comparison.scm>.
These design notes pertain more specifically to choices made for Taylor
R. Campbell's variant of the macro.
*** Bounding Intervals
Always, in the built-in iterators' bounding intervals -- for iteration
over ranges of numbers, over segments of vectors, &c. --, the lower
bound is inclusive and the upper bound is exclusive. It doesn't matter
whether one starts from the upper bound and goes to the lower bound or
vice versa: the lower bound is inclusive and the upper bound is
exclusive.
My experience is that any inconsistency in this regard is a damning
path to confusing doom of program clarity. Off-by-one errors are still
possible, of course, especially if the looping macros are mixed with
other systems that do not follow this strict convention. However,
between R5RS and foof-loop, it is *much* easier to do the right thing
from the start, without having to meticulously adjust any upper bounds,
especially for backward iteration that starts from the upper bound. If
you find that you are performing such meticulous adjustment, ask
yourself whether you are doing the right thing, and if the answer is
`yes', I'd like to hear about it.
To reiterate, since this is very important:
The lower bound is always inclusive.
The upper bound is always exclusive.
It does not matter which bound you start from.
If you write any of your own custom iterators that involve any sort of
bounding intervals, *PLEASE* adhere to these rules unless you have an
extremely good reason not to, which I'd like to hear about if you do.