an introduction to ACL2 arrays

Below we begin a detailed presentation of ACL2 arrays. ACL2's single-threaded objects (see stobj) provide similar functionality that is generally more efficient but also more restrictive.

See arrays-example for a brief introduction illustrating the use of ACL2 arrays.

ACL2 provides relatively efficient 1- and 2-dimensional arrays. Arrays are awkward to provide efficiently in an applicative language because the programmer rightly expects to be able to ``modify'' an array object with the effect of changing the behavior of the element accessing function on that object. This, of course, does not make any sense in an applicative setting. The element accessing function is, after all, a function, and its behavior on a given object is immutable. To ``modify'' an array object in an applicative setting we must actually produce a new array object. Arranging for this to be done efficiently is a challenge to the implementors of the language. In addition, the programmer accustomed to the von Neumann view of arrays must learn how to use immutable applicative arrays efficiently.

In this note we explain 1-dimensional arrays. In particular, we explain briefly how to create, access, and ``modify'' them, how they are implemented, and how to program with them. 2-dimensional arrays are dealt with by analogy.

The Logical Description of ACL2 Arrays

An ACL2 1-dimensional array is an object that associates arbitrary objects with certain integers, called ``indices.'' Every array has a dimension, dim, which is a positive integer. The indices of an array are the consecutive integers from 0 through dim-1. To obtain the object associated with the index i in an array a, one uses (aref1 name a i). Name is a symbol that is irrelevant to the semantics of aref1 but affects the speed with which it computes. We will talk more about array ``names'' later. To produce a new array object that is like a but which associates val with index i, one uses (aset1 name a i val).

An ACL2 1-dimensional array is actually an alist. There is no special ACL2 function for creating arrays; they are generally built with the standard list processing functions list and cons. However, there is a special ACL2 function, called compress1, for speeding up access to the elements of such an alist. We discuss compress1 later.

One element of the alist must be the ``header'' of the array. The header of a 1-dimensional array with dimension dim is of the form:

(:HEADER :DIMENSIONS (dim)
         :MAXIMUM-LENGTH max
         :DEFAULT obj ; optional
         :NAME name   ; optional
         :ORDER order ; optional values are < (the default), >, or :none
         )
Obj may be any object and is called the ``default value'' of the array. Max must be an integer greater than dim. Name must be a symbol. The :default and :name entries are optional; if :default is omitted, the default value is nil. The function header, when given a name and a 1- or 2-dimensional array, returns the header of the array. The functions dimensions, maximum-length, and default are similar and return the corresponding fields of the header of the array. The role of the :dimensions field is obvious: it specifies the legal indices into the array. The roles played by the :maximum-length and :default fields are described below.
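
As a small illustration of these accessors, consider an alist whose header declares dimension 5. The array name demo and the values shown are chosen only for this sketch.

ACL2 !>(dimensions 'demo
                   '((:header :dimensions (5)
                              :maximum-length 15
                              :default uninitialized
                              :name demo)
                     (0 . zero)))
This should return (5). Applied to the same two arguments, maximum-length should return 15 and default should return uninitialized.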

Aside from the header, the other elements of the alist must each be of the form (i . val), where i is an integer and 0 <= i < dim, and val is an arbitrary object.

The :order field of the header is ignored for 2-dimensional arrays. For 1-dimensional arrays, it specifies the order of keys (i, above) when the array is compressed with compress1, as described below. An :order of :none specifies no reordering of the alist by compress1, and an :order of > specifies reordering by compress1 so that keys are in descending order. Otherwise, the alist is reordered by compress1 so that keys are in ascending order.

(Aref1 name a i) is guarded so that name must be a symbol, a must be an array and i must be an index into a. The value of (aref1 name a i) is either (cdr (assoc i a)) or else is the default value of a, depending on whether there is a pair in a whose car is i. Note that name is irrelevant to the value of an aref1 expression. You might :pe aref1 to see how simple the definition is.

(Aset1 name a i val) is guarded analogously to the aref1 expression. The value of the aset1 expression is essentially (cons (cons i val) a). Again, name is irrelevant. Note (aset1 name a i val) is an array, a', with the property that (aref1 name a' i) is val and, except for index i, all other indices into a' produce the same value as in a. Note also that if a is viewed as an alist (which it is) the pair ``binding'' i to its old value is in a' but ``covered up'' by the new pair. Thus, the length of an array grows by one when aset1 is done.

Because aset1 covers old values with new ones, an array produced by a sequence of aset1 calls may have many irrelevant pairs in it. The function compress1 can remove these irrelevant pairs. Thus, (compress1 name a) returns an array that is equivalent (vis-a-vis aref1) to a but which may be shorter. For technical reasons, the alist returned by compress1 may also list the pairs in a different order than listed in a.
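
For instance, here is a sketch in which index 1 has been ``assigned'' twice, so that the pair (1 . old) is covered by (1 . new). The name demo and the values are illustrative only.

ACL2 !>(compress1 'demo
                  '((1 . new)   ; current binding of index 1
                    (:header :dimensions (5)
                             :maximum-length 15
                             :default uninitialized
                             :name demo)
                    (1 . old)   ; covered by the pair above
                    (0 . zero)))
Since this header specifies no :order, the default of < applies, and we would expect a result equivalent to the header followed by (0 . zero) and (1 . new): the covered pair is gone and the keys are in ascending order.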

To prevent arrays from growing excessively long due to repeated aset1 operations, aset1 actually calls compress1 on the new alist whenever the length of the new alist exceeds the :maximum-length entry, max, in the header of the array. See the definition of aset1 (for example by using :pe). This is primarily just a mechanism for freeing up cons space consumed while doing aset1 operations. Note however that this compress1 call is replaced by a hard error if the header specifies an :order of :none.

This completes the logical description of 1-dimensional arrays. 2-dimensional arrays are analogous. The :dimensions entry of the header of a 2-dimensional array should be (dim1 dim2). A pair of indices, i and j, is legal iff 0 <= i < dim1 and 0 <= j < dim2. The :maximum-length must be greater than dim1*dim2. Aref2, aset2, and compress2 are like their counterparts but take an additional index argument. Finally, the pairs in a 2-dimensional array are of the form ((i . j) . val).
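
A minimal 2-dimensional sketch follows, by analogy with the 1-dimensional case. The array name grid, the state global variables m and m2, and the values are hypothetical, chosen only for illustration. Note that the :maximum-length, 7, exceeds 2*3.

ACL2 !>(assign m (compress2 'grid
                            '((:header :dimensions (2 3)
                                       :maximum-length 7
                                       :default 0
                                       :name grid)
                              ((0 . 1) . 5))))
Then (aref2 'grid (@ m) 0 1) should be 5 and (aref2 'grid (@ m) 1 2) should be the default, 0. Now execute

ACL2 !>(assign m2 (aset2 'grid (@ m) 1 2 9))
Then (aref2 'grid (@ m2) 1 2) should be 9, while the logical value of (aref2 'grid (@ m) 1 2) is still 0.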

The Implementation of ACL2 Arrays

Very informally speaking, the function compress1 ``creates'' an ACL2 array that provides fast access, while the function aset1 ``maintains'' fast access. We now describe this informal idea more carefully.

Aref1 is essentially assoc. If aref1 were implemented naively the time taken to access an array element would be linear in the dimension of the array and the number of ``assignments'' to it (the number of aset1 calls done to create the array from the initial alist). This is intolerable; arrays are ``supposed'' to provide constant-time access and change.

The apparently irrelevant names associated with ACL2 arrays allow us to provide constant-time access and change when arrays are used in ``conventional'' ways. The implementation of arrays makes it clear what we mean by ``conventional.''

Recall that array names are symbols. Behind the scenes, ACL2 associates two objects with each ACL2 array name. The first object is called the ``semantic value'' of the name and is an alist. The second object is called the ``raw lisp array'' and is a Common Lisp array.

When (compress1 name alist) builds a new alist, a', it sets the semantic value of name to that new alist. Furthermore, it creates a Common Lisp array and writes into it all of the index/value pairs of a', initializing unassigned indices with the default value. This array becomes the raw lisp array of name. Compress1 then returns a', the semantic value, as its result, as required by the definition of compress1.

When (aref1 name a i) is invoked, aref1 first determines whether the semantic value of name is a (i.e., is eq to the alist a). If so, aref1 can determine the ith element of a by invoking Common Lisp's aref function on the raw lisp array associated with name. Note that no linear search of the alist a is required; the operation is done in constant time and involves retrieval of two global variables, an eq test and jump, and a raw lisp array access. In fact, an ACL2 array access of this sort is about 5 times slower than a C array access. On the other hand, if name has no semantic value or if it is different from a, then aref1 determines the answer by linear search of a as suggested by the assoc-like definition of aref1. Thus, aref1 always returns the axiomatically specified result. It returns in constant time if the array being accessed is the current semantic value of the name used. The ramifications of this are discussed after we deal with aset1.

When (aset1 name a i val) is invoked, aset1 does two conses to create the new array. Call that array a'. It will be returned as the answer. (In this discussion we ignore the case in which aset1 does a compress1.) However, before returning, aset1 determines if name's semantic value is a. If so, it makes the new semantic value of name be a' and it smashes the raw lisp array of name with val at index i, before returning a' as the result. Thus, after doing an aset1 and obtaining a new semantic value a', all aref1s on that new array will be fast. Any aref1s on the old semantic value, a, will be slow.

To understand the performance implications of this design, consider the chronological sequence in which ACL2 (Common Lisp) evaluates expressions: basically inner-most first, left-to-right, call-by-value. An array use, such as (aref1 name a i), is ``fast'' (constant-time) if the alist supplied, a, is the value returned by the most recently executed compress1 or aset1 on the name supplied. In the functional expression of ``conventional'' array processing, all uses of an array are fast.
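
The following sketch shows one ``conventional'' pattern: the array is threaded through a recursion so that each aset1 is applied to the value returned by the previous one. The function fill-below and the array name buf are hypothetical, introduced only for this illustration. The function is put in :program mode so that no termination or guard proof is attempted.

ACL2 !>(defun fill-below (name a i val)
         ; Hypothetical helper: store val at indices 0 through i-1 of a.
         (declare (xargs :mode :program))
         (cond ((zp i) a) ; nothing left to assign; return the array
               (t (fill-below name
                              (aset1 name a (1- i) val) ; newest array
                              (1- i)
                              val))))
ACL2 !>(fill-below 'buf
                   (compress1 'buf
                              '((:header :dimensions (5)
                                         :maximum-length 15
                                         :default uninitialized
                                         :name buf)))
                   5
                   'x)
Here compress1 makes its result the semantic value of buf, and each aset1 then receives exactly the alist returned by the most recent compress1 or aset1 on buf. We would therefore expect every assignment to be fast. The result associates x with each of the indices 0 through 4.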

The :name field of the header of an array is completely irrelevant. Our convention is to store in that field the symbol we mean to use as the name of the raw lisp array. But no ACL2 function inspects :name and its primary value is that it allows the user, by inspecting the semantic value of the array -- the alist -- to recall the name of the raw array that probably holds that value. We say ``probably'' since there is no enforcement that the alist was compressed under the name in the header or that all asets used that name. Such enforcement would be inefficient.

Some Programming Examples

In the following examples we will use ACL2 ``global variables'' to hold several arrays. See @, and see assign.

Let the state global variable a be the 1-dimensional compressed array of dimension 5 constructed below.

ACL2 !>(assign a (compress1 'demo 
                            '((:header :dimensions (5)
                                       :maximum-length 15
                                       :default uninitialized
                                       :name demo)
                              (0 . zero))))
Then (aref1 'demo (@ a) 0) is zero and (aref1 'demo (@ a) 1) is uninitialized.

Now execute

ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
Then (aref1 'demo (@ b) 0) is zero and (aref1 'demo (@ b) 1) is one.

All of the aref1s done so far have been ``fast.''

Note that we now have two array objects, one in the global variable a and one in the global variable b. B was obtained by assigning to a. That assignment does not affect the alist a because this is an applicative language. Thus, (aref1 'demo (@ a) 1) must still be uninitialized. And if you execute that expression in ACL2 you will see that indeed it is. However, a rather ugly comment is printed, namely that this array access is ``slow.'' The reason it is slow is that the raw lisp array associated with the name demo is the array we are calling b. To access the elements of a, aref1 must now do a linear search. Any reference to a as an array is now ``unconventional;'' in a conventional language like Ada or Common Lisp it would simply be impossible to refer to the value of the array before the assignment that produced our b.

Now let us define a function that counts how many times a given object, x, occurs in an array. For simplicity, we will pass in the name and highest index of the array:

ACL2 !>(defun cnt (name a i x)
         (declare (xargs :guard
                         (and (array1p name a)
                              (integerp i)
                              (>= i -1)
                              (< i (car (dimensions name a))))
                         :mode :logic
                         :measure (nfix (+ 1 i))))
         (cond ((zp (1+ i)) 0) ; return 0 if i is at most -1
               ((equal x (aref1 name a i))
                (1+ (cnt name a (1- i) x)))
               (t (cnt name a (1- i) x))))
To determine how many times zero appears in (@ b) we can execute:
ACL2 !>(cnt 'demo (@ b) 4 'zero)
The answer is 1. How many times does uninitialized appear in (@ b)?
ACL2 !>(cnt 'demo (@ b) 4 'uninitialized)
The answer is 3, because positions 2, 3 and 4 of the array contain that default value.

Now imagine that we want to assign 'two to index 2 and then count how many times the 2nd element of the array occurs in the array. This specification is actually ambiguous. In assigning to b we produce a new array, which we might call c. Do we mean to count the occurrences in c of the 2nd element of b or the 2nd element of c? That is, do we count the occurrences of uninitialized or the occurrences of two? If we mean the former the correct answer is 2 (positions 3 and 4 are uninitialized in c); if we mean the latter, the correct answer is 1 (there is only one occurrence of two in c).

Below are ACL2 renderings of the two meanings, which we call [former] and [latter]. (Warning: Our description of these examples, and of an example [fast former] that follows, assumes that only one of these three examples is actually executed; for example, they are not executed in sequence. See ``A Word of Warning'' below for more about this issue.)

(cnt 'demo (aset1 'demo (@ b) 2 'two) 4 (aref1 'demo (@ b) 2))  ; [former]

(let ((c (aset1 'demo (@ b) 2 'two)))                      ; [latter]
  (cnt 'demo c 4 (aref1 'demo c 2)))

Note that in [former] we create c in the second argument of the call to cnt (although we do not give it a name) and then refer to b in the fourth argument. This is unconventional because the second reference to b in [former] is no longer the semantic value of demo. While ACL2 computes the correct answer, namely 2, the execution of the aref1 expression in [former] is done slowly.

A conventional rendering with the same meaning is

(let ((x (aref1 'demo (@ b) 2)))                           ; [fast former]
  (cnt 'demo (aset1 'demo (@ b) 2 'two) 4 x))
which fetches the 2nd element of b before creating c by assignment. It is important to understand that [former] and [fast former] mean exactly the same thing: both count the number of occurrences of uninitialized in c. Both are legal ACL2 and both compute the same answer, 2. Indeed, we can symbolically transform [fast former] into [former] merely by substituting the binding of x for x in the body of the let. But [fast former] can be evaluated faster than [former] because all of the references to demo use the then-current semantic value of demo, which is b in the first line and c throughout the execution of the cnt in the second line. [Fast former] is the preferred form, both because of its execution speed and its clarity. If you were writing in a conventional language you would have to write something like [fast former] because there is no way to refer to the 2nd element of the old value of b after smashing b unless it had been saved first.

We turn now to [latter]. It is both clear and efficient. It creates c by assignment to b and then it fetches the 2nd element of c, two, and proceeds to count the number of occurrences in c. The answer is 1. [Latter] is a good example of typical ACL2 array manipulation: after the assignment to b that creates c, c is used throughout.

It takes a while to get used to this because most of us have grown accustomed to the peculiar semantics of arrays in conventional languages. For example, in raw lisp we might have written something like the following, treating b as a ``global variable'':

(cnt 'demo (aset 'demo b 2 'two) 4 (aref 'demo b 2))
which sort of resembles [former] but actually has the semantics of [latter] because the b from which aref fetches the 2nd element is not the same b used in the aset! The array b is destroyed by the aset and b henceforth refers to the array produced by the aset, as written more clearly in [latter].

A Word of Warning: Users must exercise care when experimenting with [former], [latter] and [fast former]. Suppose you have just created b with the assignment shown above,

ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
If you then evaluate [former] in ACL2 it will complain that the aref1 is slow and compute the answer, as discussed. Then suppose you evaluate [latter] in ACL2. From our discussion you might expect it to execute fast -- i.e., issue no complaint. But in fact you will find that it complains repeatedly. The problem is that the evaluation of [former] changed the semantic value of demo so that it is no longer b. To try the experiment correctly you must make b be the semantic value of demo again before the next example is evaluated. One way to do that is to execute
ACL2 !>(assign b (compress1 'demo (@ b)))
before each expression. Because of issues like this it is often hard to experiment with ACL2 arrays at the top-level. We find it easier to write functions that use arrays correctly and efficiently than to use them that way interactively.

This last assignment also illustrates a very common use of compress1. While it was introduced as a means of removing irrelevant pairs from an array built up by repeated assignments, it is actually most useful as a way of ensuring fast access to the elements of an array.

Many array processing tasks can be divided into two parts. During the first part the array is built. During the second part the array is used extensively but not modified. If your programming task can be so divided, it might be appropriate to construct the array entirely with list processing, thereby saving the cost of maintaining the semantic value of the name while few references are being made. Once the alist has stabilized, it might be worthwhile to treat it as an array by calling compress1, thereby gaining constant time access to it.
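
A sketch of this two-phase style is shown below. The alist is built with ordinary list processing (here cons and pairlis$) and only then handed to compress1. The array name squares and the state global variable sq are hypothetical, chosen only for illustration.

ACL2 !>(assign sq
               (compress1 'squares
                          (cons '(:header :dimensions (4)
                                          :maximum-length 5
                                          :name squares)
                                ; pairs (0 . 0) (1 . 1) (2 . 4) (3 . 9)
                                (pairlis$ '(0 1 2 3) '(0 1 4 9)))))
Then (aref1 'squares (@ sq) 3) should be 9, and the access should be fast because (@ sq) is the value returned by the most recent compress1 on squares.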

ACL2's theorem prover uses this technique in connection with its implementation of the notion of whether a rune is disabled or not. Associated with every rune is a unique integer index, called its ``nume.'' When each rule is stored, the corresponding nume is stored as a component of the rule. Theories are lists of runes and membership in the ``current theory'' indicates that the corresponding rule is enabled. But these lists are very long and membership is a linear-time operation. So just before a proof begins we map the list of runes in the current theory into an alist that pairs the corresponding numes with t. Then we compress this alist into an array. Thus, given a rule we can obtain its nume (because it is a component) and then determine in constant time whether it is enabled. The array is never modified during the proof, i.e., aset1 is never used in this example. From the logical perspective this code looks quite odd: we have replaced a linear-time membership test with an apparently linear-time assoc after going to the trouble of mapping from a list of runes to an alist of numes. But because the alist of numes is an array, the ``apparently linear-time assoc'' is more apparent than real; the operation is constant-time.