Version: 4.2.0.3
Functional Relational Algebra
Jay McCarthy <jay at plt-scheme dot org>
This package provides a purely functional implementation of Relational Algebra in PLT Scheme.
1 Examples
In this example, we will build a relational database for a university grade book. First, we define a database with a few relations:
(define GradebookDB |
(Database |
[Students |
[StudentId Name Course] |
(0 "Jonah" 'CS330) |
(1 "Aidan" 'CS142) |
(2 "Sarah" 'CS630)] |
[Projects |
[ProjectId Course Title] |
(0 'CS330 "Garbage Collectors") |
(1 'CS330 "Prolog") |
(2 'CS142 "UFO") |
(3 'CS630 "Theorem Prover")] |
[Grades |
[StudentId ProjectId Grade] |
[0 0 2/3] |
[1 2 99] |
[2 3 -inf.0]])) |
At this point GradebookDB is bound to a value obeying database/c that contains three relations: 'Students, 'Projects, and 'Grades. The first S-expression after each relation identifier is the schema/c for that relation. Each S-expression after that is a Tuple that is in the relation. As you can see, any Scheme value can be included in a tuple.
Let’s do some queries!
> (with-database GradebookDB | (print-relation | (execute-query | (query-relation 'Students)))) |
|
StudentId Name Course | 2 Sarah CS630 | 1 Aidan CS142 | 0 Jonah CS330 |
|
As you can see from this interaction, a relation is just a set of tuples, which are a vector-like abstraction. Now for some more interesting queries:
> (with-database GradebookDB | (print-relation | (execute-query | (query-selection | (Proposition (>30 Grade)) | (query-relation 'Grades))))) |
|
StudentId ProjectId Grade | 0 0 2/3 | 2 3 -inf.0 |
|
Proposition can be any Scheme value that may be applied.
Suppose that we attempted to use that proposition on a relation that did not have 'Grade in its schema?
> (with-database GradebookDB | (query-selection | (Proposition (>30 Grade)) | (query-relation 'Students))) |
|
query-selection: Proposition must refer to subset of |
query's attributes: >30((Grade)) |
As you can see, the error is detected before the query is ever run.
Now, let’s have some joins:
> (with-database GradebookDB | (print-relation | (execute-query | (query-rename | 'Title 'Project | (query-projection | '(Name Course Title Grade) | (query-natural-join | (query-relation 'Projects) | (query-natural-join | (query-relation 'Grades) | (query-relation 'Students)))))))) |
|
Name Course Project Grade | Jonah CS330 Garbage Collectors 2/3 | Aidan CS142 UFO 99 | Sarah CS630 Theorem Prover -inf.0 |
|
Finally, some functional database modification:
> (with-database GradebookDB | (print-relation | (execute-query (query-relation 'Students)))) |
|
StudentId Name Course | 2 Sarah CS630 | 1 Aidan CS142 | 0 Jonah CS330 |
|
> (define GradebookDB+1 | (database-insert | GradebookDB 'Students | (Tuple 3 "Omega" (lambda () ((lambda (x) (x x)) (lambda (x) (x x))))))) |
|
> (with-database GradebookDB+1 | (print-relation | (execute-query (query-relation 'Students)))) |
|
StudentId Name Course | 3 Omega #<procedure> | 2 Sarah CS630 | 1 Aidan CS142 | 0 Jonah CS330 |
|
> (define GradebookDB+2 | (database-delete | GradebookDB+1 'Students | (Tuple 0 "Jonah" 'CS330))) |
|
> (with-database GradebookDB+2 | (print-relation | (execute-query (query-relation 'Students)))) |
|
StudentId Name Course | 1 Aidan CS142 | 2 Sarah CS630 | 3 Omega #<procedure> |
|
2 API
This section documents the APIs of the package.
2.1 Schemas
Schemas are defined as lists of symbols.
Equivalent to (listof symbol?).
2.2 Propositions
Propositions are used by query-selection to compute sub-relations.
Returns #t if v is a proposition, #f otherwise.
(Proposition p) |
|
p | | = | | (not p) | | | | | | (and p p) | | | | | | (or p p) | | | | | | (proc attribute ...) |
|
|
|
Returns a proposition. The interpretation of not, and, and or is standard.
When a procedure is included in a proposition, the values of the named attributes are extracted from the tuple
and applied to the procedure value; if it returns #t, then the tuple is selected, otherwise it is rejected.
2.3 Queries
Queries are used by execute-query to run relational queries.
Returns #t if v is a query, #f otherwise.
Equivalent to (-> symbol? schema/c).
current-database-schema : (parameter/c database-schema/c) |
Used by query-schema to determine the schema of query-relation queries.
(query-schema q) → schema/c |
q : query? |
Returns the schema of the relation q computes.
(query-relation rid) → query? |
rid : symbol? |
Query of the relation rid.
(query-union q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-difference q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-intersection q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-product q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-projection s q) → query? | s : schema/c | q : query? |
|
(query-selection p q) → query? | p : prop? | q : query? |
|
(query-rename old-attr new-attr q) → query? | old-attr : symbol? | new-attr : symbol? | q : query? |
|
(query-rename* renaming q) → query? | renaming : (dict/c symbol? symbol?) | q : query? |
|
(query-natural-join q1 q2 [equal?]) → query? | q1 : query? | q2 : query? | equal? : (any/c any/c . -> . boolean?) = equal? |
|
(query-theta-join p q1 q2) → query? | p : prop? | q1 : query? | q2 : query? |
|
(query-semi-join q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-anti-join q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-division q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-left-outer-join q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-right-outer-join q1 q2) → query? | q1 : query? | q2 : query? |
|
(query-outer-join q1 q2) → query? | q1 : query? | q2 : query? |
|
These construct queries represent the basic operations of relational algebra.
query-rename* applies multiple renamings at once using the dictionary to map old names to new names.
query-natural-join takes an optional equal? argument to compare attribute values for equality.
2.4 Tuples
Tuples are the contents of relations.
Returns #t if v is a tuple, #f otherwise.
Returns the length of t.
Returns the ith element of t.
Returns a tuple that contains all the elems.
2.5 Relations
Relations are the contents of databases and the results of queries.
Returns #t if v is a relation, #f otherwise.
(relation-schema r) → schema/c |
r : relation? |
Returns r’s schema.
(relation-tuples r) → (set? tuple?) |
r : relation? |
Returns the set of tuples comprising the relation r.
(Relation [attribute ...] | (elem ...) | ...) |
|
|
|
Returns a relation with '(attribute ...) as its schema that contains each (Tuple elem ...) as its tuples.
2.6 Database
Equivalent to (and/c immutable? (hash-eq? symbol? relation?)).
(database-insert db rid t) → database/c |
db : database/c |
rid : symbol? |
t : tuple? |
Returns a database that is identical to db, except t is in the relation rid.
(database-delete db rid t) → database/c |
db : database/c |
rid : symbol? |
t : tuple? |
Returns a database that is identical to db, except t is not in the relation rid.
Executes (begin e ...) with db as the current database.
(execute-query q) → relation? |
q : query? |
Computes the relation specified by q using the current database (chosen by with-database).
The NULL value that is inserted by the evaluation of query-left-outer-join, query-right-outer-join, or query-outer-join.
(Database | [relation | [attribute ...] | (elem ...) | ...] | ...) |
|
|
|
Returns a database with each 'relation specified as
3 Implementation Notes
The current implementation uses immutable hashes as relations, vectors as tuples (except that they can be efficient appended), and lists as schemas.
Propositions are structures, but are compiled to procedures (with attribute identifiers resolved to tuple positions) prior to query execution.
execute-query uses a simple query optimizer.
It has two passes: first it tries to push selection operations to the leaves of the query to reduce relation sizes prior to products, then it pulls selection operations towards the root (but not passed products) to reduce the number of iterations over all the elements of a tuple. During both passes, some simplifications are performed, such as merging adjacent renamings, projections, and identical (or trivial) selections. This optimization happens independent of any statistics about relation sizes, etc.