notes.txt

Galore Class-Based Implementation

Galore Class-Based Implementation

Last updated: April 21, 2006

Purpose:

The purpose of Galore is to provide immutable collection datastructures for PLT
Scheme.  This new version of the package has two goals: to expand the
functionality of Galore (most notably, finishing the implementation of lookup
tables a.k.a. finite mappings) and to implement Galore using classes and objects
in an Interface-Oriented style.

Status:

Short interfaces for both sets and tables have been written.  Three
implementations for sets are in place: sets with unordered keys (based on an
equality predicate), sets with ordered keys (based on a total ordering), and
sets with hashed keys (based on a hash function and equality predicate).  Note
that "hashed" keys do not imply an actual hash table is used (all data
structures are immutable), but they do provide a more efficient lookup than
unordered keys.  There are three table implementations are as well, based on
sets: tables with unordered keys, ordered keys, and hashed keys.  All existing
functions have at least minimal tests.

What remains: expanding the set of operations on sets and tables; implementing
the other Galore data structures (bags, heaps, etc.).

Design:

The new Galore has two interfaces: class-based and module-based.  The
class-based interface is the back-end.  Each new kind of datastructure is given
an interface and (implicit) contracts.  Implementations of these structures are
provided as classes.  Object construction is provided by separate factory
functions; this has proved easier to abstract over than exposing the class's
constructor arguments.  Each operation on the datastructure is a method of the
class/object.  In general, each class (and its constructor(s)) is provided by a
separate module (e.g. set-ordered.ss, set-unordered.ss, set-hashed.ss).

The module-based interface provides a functional interface to the class-based
system.  Each datastructure has a single module exporting its functionality
(e.g. set.ss, table.ss).  This module exports each constructor and operation as
a separate function, calling the object constructor or method as appropriate.
This module also imposes explicit contracts on these functions.

The class-based layer (hopefully) allows easy implementation and extension of
new datastructures.  If the classes obey proper Interface-Oriented Programming
practices, this layer should also provide interaction between different
implementations of the same datastructure "for free".

The module-based layer allows Scheme programmers to use Galore without knowledge
of the class-based layer.  It is also invariant with respect to new
datastructure implementations; that is, if a user implements a new kind of set
(say, set-of-integers.ss) as a class conforming to the proper interface, the
operations in set.ss will continue to work for their new objects.

Testing:

There are SchemeUnit tests for all functions exported by set.ss and table.ss so
far, and the goal is to maintain a "test first" policy for Galore: as new
interfaces and operations are added, test cases should be added before their
implementations are written.  All tests are performed on the module-based
interface, to allow flexibility in reconfiguring the class-based layer.  So far,
there is one test suite for each datastructure, parameterized over a constructor
for each different implementation.  This makes it easier to add new
datastructure implementations without duplicating the entire test suite and
guarantees test suite consistency.

Challenges:

Finding the right Interface-Oriented approach to immutable datastructures has
presented a few difficulties.  One inherent problem is that class inheritance
(statically or by mixin) and functional update are incompatible, or at least
very hard to reconcile.  I will explain this with an example.

(define my-simple-collection%
  (class* object% (simple-collection<%>)
    ...
    (define/public (the-method new-data)
      (new my-simple-collection%
           [copy-this-field copy-this-field]
           [update-this-field new-data]))
    ...))

Above I have an implementation of a simple collection, and I show a single
method.  This method produces a new collection by functional update: keeping
some fields of the current object, changing others.  Now let's try to inherit
from my-simple-collection%.

(define my-complex-collection%
  (class* my-simple-collection% (simple-collection<%> complex-collection<%>)
    ...
    (define/override (the-method new-data)
      (new my-complex-collection%
           [copy-this-field copy-this-field]
           [update-this-field new-data]))
    ...))

Our new complex collection would like the-method to produce a complex collection
as opposed to just a simple collection (this makes sense; we would like each
operation to preserve the added functionality).  Unfortunately, there are some
large problems with this new method implementation.

First of all, this assumes all the init arguments and/or field names of
my-simple-collection% are exposed to my-complex-collection%.  This is often
undesirable; if my-complex-collection% were a mixin it would be impossible.

Second of all, there is no actual code inheritance in this new method.  If we
try to call (super the-method new-data), we get back an instance of
my-simple-collection% instead of my-complex-collection% (from an IOP
perspective, we get a simple-collection<%> instead of a complex-collection<%>).

So if we construct objects in the subclass, we get no inheritance at all, and we
have to disallow hidden fields in the superclass.  If we construct objects in
the superclass, we get instances of the wrong type.  Or we can have the
superclass dynamically extract its own runtime class and construct that - but
that only works if we disallow the addition of new fields and/or initialization
arguments.

Currently this problem is "solved" by not using inheritance at all.  Tables are
implemented as separate classes which contain sets in fields; the methods of
tables explicitly call the methods of the contained sets.