---------------------------------------------------------------------- _DATA STRUCTURES GALORE_ Jens Axel Søgaard ---------------------------------------------------------------------- ---------------------------------------------------------------------- INTRODUCTION ---------------------------------------------------------------------- This library provides functional implementations of some commonly used data structures. The data structures can be divided into: - collections (_sets_, _bags_, _heaps_) - associative collections (_finite maps_, _priority queues_) - sequences (_stacks_, _queues_, _deques_) [Although associative collections and sequences are included in the PLanet-distribution, their implementations, though in working order, should not be considered finished.] The data structures are implemented in a purely functional way. Inserting an element in, say, a collection will return a new data structure containing both the element and all elements in the old collection *without* destroying the old collection. The collections can thus be used *persistently* opposed to the normal *ephemeral* way, where an old collection is destroyed. This implies that the collection is safe to use in multi threaded programs and web servlets. Data structures can have multiple implementations, and each implementation has its own module. A "no-fuss" module exists for each data strucuture containing a reasonable default implemention for each data structure is given - for those cases, where it is unimportant to have specific time complexities. The names of the individual operations have been carefully chosen to have meaning for as many data structures as possible. Element insertion is for example named "insert" in all data structures (some data structures also provide alternative, more traditional names). This choice makes it easy to replace an unfortunate choice of data structure. Use (prefix ...) to avoid name conflicts working with more than one data structure. Many of the data structure implementations require an ordering of the elements. The order is represented using compare function from srfi-67. All exported functions are checked using contracts, the error messages are therefore considerably better than in the previous version of this library. Most algorithms are carefully explained by Chris Okasaki in the delightful book "Purely Functional Data Structures". [Terminology: bags are also called multi-sets; a heap is a priority queue, where the priority is the element; in a priority queue the element and priority are distinct; a deque is a double ended queue] ---------------------------------------------------------------------- INITIAL EXAMPLES ---------------------------------------------------------------------- EXAMPLE (Simple) ---------------- The operation empty makes a new heap with no elements. ;; A heap of integers > (require (planet "heap.scm" ("soegaard" "galore.plt"))) > (find-min (insert 42 (insert 3 (heap 2 8 1)))) 1 ;; A leftist heap of strings > (require (planet "leftist-heap.scm" ("soegaard" "galore.plt"))) > (find-min (insert "foo" (insert "bar" (empty)))) "bar" ;; Bags of integers > (require (planet "bag.scm" ("soegaard" "galore.plt"))) > (elements (difference (union (bag 1 3 2) (bag 1 2 4)) (bag 3))) (4 2 2 1 1) ;; Sets of integers (require (planet "set.scm" ("soegaard" "galore.plt"))) > (elements (difference (union (set 1 3 2) (set 1 2 4)) (set 3))) (1 2 4) EXAMPLE (Heap Sort) ------------------- A simple heap sort: (require (planet "heap.scm" ("soegaard" "galore.plt"))) (define (heap-sort l) (heap->list (list->heap l))) (define (list->heap l) ; a more efficient list->heap is provided by the library (foldl insert (empty) l)) (define (heap->list H) (define (loop H l) (if (empty? H) (reverse l) (loop (delete-min H) (cons (find-min H) l)))) (loop H '())) (heap-sort '(3 1 4 1 5 9)) Eager comprehensions from srfi-42 are supported, so an alternative to the above is: (require (planet "heap.scm" ("soegaard" "galore.plt")) (lib "42.ss" "srfi") (lib "67.ss" "srfi")) (define (heap-sort l) (heap->list (list->heap l))) (define (list->heap l) (heap-ec default-compare (: x l) x)) (define (heap->list H) (list-ec (: x H) x)) (heap-sort '(3 1 4 1 5 9)) EXAMPLE (User defined element type) ----------------------------------- The above heaps used the compare function given by the parameter current-compare to order the elements (because empty was given no arguments). The start value of current-compare is default-compare, which handles lists, booleans, characters, strings, symbols, numbers and vectors. To create a heap of user defined structs, we need to pass empty a custom compare function. (require (planet "heap.scm" ("soegaard" "galore.plt")) (lib "67.ss" "srfi")) (define-struct hiscore (name score) (make-inspector)) (define (hiscore-compare s1 s2) (refine-compare ; highest scores first (number-compare (hiscore-score s2) (hiscore-score s1)) ; ties are sorted alphabetically after name (string-compare (hiscore-name s1) (hiscore-name s2)))) (find-min (insert* (list (make-hiscore "Foo" 200) (make-hiscore "Bar" 100) (make-hiscore "Qux" 300)) (empty hiscore-compare))) ; => #(struct:hiscore "Qux" 300) As an alternative one could extend the compare function given by the current-compare parameter: (let ([cmp (current-compare)]) (current-compare (lambda (x1 x2) (select-compare x1 x2 [hiscore? (hiscore-compare x1 x2)] [else (cmp x1 x2)])))) (find-min (insert* (list (make-hiscore "Foo" 200) (make-hiscore "Bar" 100) (make-hiscore "Qux" 300)) (empty))) ; => #(struct:hiscore "Qux" 300) ---------------------------------------------------------------------- COLLECTIONS ---------------------------------------------------------------------- Collections include sets, bags, and heaps. The following is a summary of the operations collections have in common: CONSTRUCTORS ------------ > empty : [cmp] -> col return an object representing an empty X of elements order by the compare function cmp, or the value of the parameter current-compare. > insert : elm col -> col (insert x C) = {x} U C > insert* : (list elm ...) col -> col (insert xs c) = C U {x1, ...} , where xs = (list x1 ...) > list->set : [cmp] (list elm) -> col > list->bag : [cmp] (list elm) -> col > list->heap : [cmp] (list elm) -> col (list->set xs) = (insert* xs (empty)) (list->set cmp xs) = (insert* xs (empty cmp)) and similar for the others > singleton : [cmp] elm -> col (singleton x) = (insert x (empty)) (singleton cmp x) = (insert x (empty cmp)) > union : col col -> col x in (union A B) <=> x in A or x in B Futhermore shorthand for (insert* (list x1 ...) (empty)) is provided in terms of: > bag : (list elm) -> bag > heap : (list elm) -> heap > set : (list elm) -> set DESTRUCTORS ----------- > elements : col -> (list elm) Returns a list of all occurrences of all elements in the collection. > find-min : col -> elm The call (find-min C) returns an element x such that x <= y for all y in C, where <= is determined by C's compare function. It is an error to use find-min on an empty collection. > get : elm col -> elm return an element from col equivalent to the given element, or #f if no such element is found > select : col -> elm (select C) = x <=> x in C Selects an element in the collection. It is an error to pass an empty collecion to select. DELETIONS --------- > delete : elm col -> col delete one occurence of the element from the collection > delete-all : elm col -> col delete all occurence of the element from the collection > delete* : (list elm) col -> col for each element in the list, delete one occurence of the element from the collction > delete-min : col -> col delete one occurence of the minimal element from the collection OBSERVERS --------- > count : elm col -> natural-number counts the number of occurences of the element in the collection > empty? : col -> boolean returns #t if the collection is empty, otherwise #f is returned > member? : elm col -> boolean returns #t if the collection contains an occurrence of the element > size : col -> natural-number returns the number of elements in the collection FOLDS ----- > fold : (elm alpha -> alpha) alpha col -> alpha Let the collection c contain the elements x1, x2, ..., xn, then (fold kons knil c) = (kons xn ... (kons x2 (kons x1 knil)) ... ) TYPE PREDICATES --------------- > set? : object -> boolean > bag? : object -> boolean > heap? : object -> boolean Returns #t, if the object is a collection of the proper type. EAGER COMPREHENSIONS -------------------- Comprehensions - - - - - - - > (set-ec cmp <qualifier>* <expression>) > (bag-ec cmp <qualifier>* <expression>) > (heap-ec cmp <qualifier>* <expression>) The collection of values obtained by evaluating <expression> once for each binding in the sequence defined by the qualifiers. If there are no qualifiers the result is the list with the value of <expression>. The first argument cmp is the compare function used to order the elements of the collection. One has the following identity for set-ec (the actual implementation is more efficent): (set-ec cmp <qualifier>* <expression>) = (list->set cmp (list-ec <qualifier>* <expression>)) Similar identities hold for bag-ec and heap-ec. Generators - - - - - The generators :set, :bag, and :heap are defined. Example - - - - > (elements (set-ec default-compare (:range i -5 5) (* i i))) (0 1 4 9 16 25) > (list-ec (: i (set 1 2 2 3)) i) (1 2 3) ---------------------------------------------------------------------- HEAPS ---------------------------------------------------------------------- Heaps provide efficient access to the minimum element. Currently one heap implementation is provided namely leftist heaps, which is therefore also is the default heap implementation. (require (planet "heap.scm" ("soegaard" "galore.plt"))) (require (planet "leftist-heap.scm" ("soegaard" "galore.plt"))) ---------------------------------------------------------------------- SETS ---------------------------------------------------------------------- Besides the common collection operations sets also provide the customary set operations. Most set implementations provide efficient access to all elements. There are two set implementations. One represents sets using red/black trees, the uses sorted lists. The default set implementation is based on red/black trees. (require (planet "set.scm" ("soegaard" "galore.plt"))) (require (planet "red-black-tree-set.scm" ("soegaard" "galore.plt"))) (require (planet "list-set.scm" ("soegaard" "galore.plt"))) Set operations - - - - - - - - Besides the common operations sets also support the following operations: > difference : set set -> set (difference A B) = {x in A | x not-in B} > equal=? : set set -> boolean The call (equal=? A B) returns true, if all elements of A are members of B, and all elements of B are members of A. > intersection : set set -> set The call (intersection A B) returns the set of elements that are members of both A and B. If x in A and y in B represents the same element with respect to the compare function in question, it is unspecified whether x or y is returned. (Use intersection/combiner if you need a specific choice). > set : elm ... -> set (set x1 x2 ... xn) = (insert* (list x1 x2 ... xn) (empty)) > subset? : set set -> boolean The call (subset? A B) returns #t if all elements of A are members of B, otherwise #f is returned. Sets support variations of the normal operations, that allows you to control which element to keep in the case of duplicates. A combiner is function of two arguments that receive two equivalent (with respect to the compare function in question) and return the element that should be kept. The supported combiner operations are: > insert/combiner (insert with combiner), > insert*/combiner > intersection/combiner > list->set/combiner > union/combiner E.g. (insert*/combiner (list 1 2 3 4) (empty compare-mod2) max) will result in the set {3, 4}. The combiner is called on the pairs 1 and 3, and on 2 and 4. > intersection/combiner : set set combiner -> set Return the intersection of two sets. If x in A and y in B represent the same element, then the element returned by the call (c x y), where c is the combiner, is used as the representative. ---------------------------------------------------------------------- BAGS ---------------------------------------------------------------------- Bags (or multi-sets) keep a count of the number of occurences of each element. Some bag implementations hold each individual element inserted in the bag - the implementation provided by this library keeps only one representative together with a count of how many times it were inserted. There are two bag implementations. One represents bags using red/black trees, the uses sorted lists. The default bag implementation is based on red/black trees. (require (planet "bag.scm" ("soegaard" "galore.plt"))) (require (planet "red-black-tree-bag.scm" ("soegaard" "galore.plt"))) (require (planet "list-bag.scm" ("soegaard" "galore.plt"))) Bag Operations - - - - - - - - Besides the common operations bags also support the following operations: > bag : elm ... -> bag (bag x1 x2 ... xn) = (insert* (list x1 x2 ... xn) (empty)) > difference : bag bag -> bag (difference A B) = (fold delete A B) > equal=? : bag bag -> boolean The call (equal=? A B) returns true, if A is a subbag of B and B is a subbag of A (see subbag?) > fold/no : (elm number alpha -> alpha) alpha bag -> alpha Like fold, but the combining function takes 3 arguments: an element, the number of occurences of the element, and the accumulator. > intersection : bag bag -> bag The call (intersection A B) returns a bag of elements that are members of both A and B. The number of occurences of an element is the least of the number of occurences in A and B. If x in A and y in B represents the same element with respect to the compare function in question, it is unspecified whether x or y is returned. (Use intersection/combiner if you need a specific choice). > subbag? : bag bag -> boolean The call (subbag? A B) returns #t if all elements of A are members of B and the number of occurences of an element in A is less or equal to the number of occurences of the element in B, otherwise #f is returned. Bags support the following combiner operations: > insert/combiner > insert*/combiner > intersection/combiner > list->set/combiner > union/combiner ---------------------------------------------------------------------- TIME COMPLEXITIES ---------------------------------------------------------------------- The following table shows the worst case time complixity. The O( ) has been omitted due to space considerations. RB-Set RB-Bag Leftist-Heaps List-Set List-Bag find-min 1 1 1 1 1 delete-min 1 1 1 1 1 get, member? log n log n n n n insert log n log n log n n n union n+m n+m log(n+m) n+m n+m elements n n n 1 n delete log n log n log n n n delete-all log n log n log n n n size n n n n n bag, list->set n log(n) n log(n) heap, list->bag n log(n) n log(n) set, list->heap n The operations: empty?, empty, singleton, and select are all O(1). ---------------------------------------------------------------------- EXAMPLES ---------------------------------------------------------------------- In the folder 'examples' you will find the following examples: heap-sort.scm -- Heap example queens.scm -- Heap example knights-tour.scm -- Priority queue example primes.scm -- Priority queue example greedy-set-cover.scm -- Set example matching-parenthesis.scm -- Stack Example ---------------------------------------------------------------------- DESIGN RATIONALE ---------------------------------------------------------------------- Argument order -------------- The argument order was chosen to resemble the order used for list operations. (cons x xs) (insert x xs) (car xs) (select xs) (cdr xs) (delete-min xs) (append xs ys) (insert* xs ys) or (union xs ys) (delete x xs) (delete x xs) ; see srfi-1 The above choice makes it easy to use e.g. insert and delete in folds. ---------------------------------------------------------------------- ACKNOWLEDGEMENTS ---------------------------------------------------------------------- - The red-black trees implementation started as a port of Jean-Christophe Filliatre's Ocaml implementation. - Guillaume Marceau ---------------------------------------------------------------------- REFERENCES ---------------------------------------------------------------------- [Ada] Adams, Stepehn, "Implementing Sets Efficiently in a Functional Language" [Mar] Martinez, Conrado and Roura, Salvador: "Randomized Binary Search Trees" [Oka] Chris Okasaki, "Purely Functional Data Structures"