Classic Java

_Classic Java_
_classic-java_

Richard Cobbe
<cobbe at ccs dot neu dot edu>

This collection is an implementation of (most of) ClassicJava, as defined
in "A Programmer's Reduction Semantics for Classes and Mixins," by Matthew
Flatt, Shriram Krishnamurthi, and Matthias Felleisen; in _Formal Syntax and
Semantics of Java_, Springer-Verlag LNCS 1523, pp. 241-269, 1999.  A
tech-report version of the paper is also available at
<http://www.ccs.neu.edu/scheme/pubs/#tr97-293>.  The implementation is
written in PLT Redex, also available through PLaneT.  Please consult that
package's documentation for further details.

The language implemented here differs from the language specified in the
paper in three ways:

    1) This implementation does not support interfaces or abstract
       methods.  (Contributions welcome!)

    2) I have added boolean and integer types, with associated primitives.
       These allow you to write interesting programs and test cases without
       having to Church-encode everything.

       I have preserved Java's distinction between class types and ground
       types.  Therefore, neither int nor bool is a subtype of Object.

    3) In the paper, the expression
            view t null
       produces a run-time error that complains of a dereferenced null.
       The equivalent expression in this implementation
            (cast t null)
       simply evaluates to null.  This latter behavior is consistent with
       Java.

This planet library serves two purposes.  First, it is a servicable (though
slow) interpreter for ClassicJava; the interface for this purpose is
defined below.  Second, it is also an extended example of how to define a
sizeable language in PLT Redex.  As such, this file also describes the
implementation in more detail than is common with PLaneT packages.

**********************************************************************
**
** Running the Interpreter
**
**********************************************************************

The main module for the executable interpreter is _classic-java.ss_.  This
module provides three functions:

> (cj-eval program)

Evaluates the given ClassicJava program and returns its result.  The next
section describes the expected syntax of the program.

The result is a human-readable s-expression of the following form:
    (((addr inst) ... ) term)
where the first list represents the final store of the program and term is
the program's final term.  Within the store, addr is a numeric address, and
inst is the object instance at that location.  Instances are represented as
follows:
    (class (class field val) ...)
where the first class is the object's class and the remaining forms list
the object's field values.  Within a given field, the class name is the
name of the class in which the field was defined; this disambiguates the
field list in the case of field shadowing.

If the evaluation is ever non-deterministic (which should not happen, if
I've defined the language correctly), then this function will signal an
error.  In that case, use cj-trace, defined below, to find the branch point(s).

> (cj-step program)

Evaluates the given ClassicJava program (in the syntax described in the
next section) and returns a list of all intermediate terms in the
evaluation, in the same form as cj-eval.  As with cj-eval, if the
evaluation is ever non-deterministic, this function signals an error.

> (cj-trace program)

Runs the supplied ClassicJava program and displays a GUI window containing
a trace of the execution(s).  Unlike the two functions above, this will
display all possible executions, even if the reduction relation is not a
function.  (Again, this shouldn't happen if I've defined the language
correctly.  If you encounter a nondeterministic reduction, please let me
know!)  Each configuration in the execution is displayed as with cj-eval
above.

**********************************************************************
**
** Input Grammar
**
**********************************************************************

The grammar in this section describes the language expected by the three
functions described above.  This implementation of ClassicJava is
case-sensitive.

ClassicJava uses the following symbols to denote reserved words in the
grammar:

    class null true false this new ref set send cast let if
    + - * == and or zero? null? not int bool

Symbols and integer literals are non-terminals in this grammar.

program     ::= (defn ... e)

defn        ::= (class cname1 cname2 (field ...) method ...)
                    ;; class cname1 extends cname2 { field ... method ... }

field       ::= (type fname)

method      ::= (type mdname (arg ...) expr)

arg         ::= (type argname)

expr        ::= null
              | integer literal
              | true
              | false
              | argname                       ;; local binding reference
              | this
              | (new cname)                   ;; new cname()
              | (ref expr fname)              ;; expr.fname
              | (set expr1 fname expr2)       ;; expr1.fname = expr2
              | (send expr1 mdname expr*)     ;; expr1.mdname(expr*)
              | (cast cname expr)             ;; (cname) expr
              | (let argname expr expr)       ;; local variable declaration
              | (bprim expr expr)             ;; binary primitive operators
              | (uprim expr)                  ;; unary primitive operators
              | (if expr1 expr2 expr3)        ;; conditional operator:
                                              ;;    expr1 ? expr2 : expr3

bprim       ::= + | - | * | == | and | or
uprim       ::= zero? | null? | not

type        ::= int | bool | cname
cname       ::= id                            ;; class name
fname       ::= id                            ;; field name
mdname      ::= id                            ;; method name
argname     ::= id                            ;; method arg/local var name

id          ::= any symbol except one of the terminals above.


Equivalence of ClassicJava and real Java forms:
===============================================

The following two classes, one in Java and one in ClassicJava, are
equivalent.

    // Java
    class Foo extends Bar {
        int x;
        bool y;
        int method(bool arg) { ... }
    }

    ;; classic Java.
    (class Foo Bar 
      ((int x) 
       (bool y))
      (int method ((bool arg)) ...))

Expressions
----------------------------------------------------------------------
  -  new Foo()                          (new Foo)
  -  obj.field                          (ref obj field)
  -  obj.field = rhs                    (set obj field rhs)
  -  obj.method(arg1, arg2)             (send obj method arg1 arg2)
  -  (Foo) obj                          (cast Foo obj)
  -  { int local_var = 3; body; }       (let local_var 3 body)
  -  text_exp ? then_exp : else_exp     (if test_exp then_exp else_exp)

And built-in binary and unary operators as listed above are expressed in
prefix notation: (+ 3 (* 4 5)) .

**********************************************************************
**
** Reduction Grammar
**
**********************************************************************

This section defines the language on which the actual reduction semantics
operates; this is also the language in which the three functions above
display their results.

id          ::= any symbol except
                    class new ref set send super this cast let
                    if + - * == and or not null? zero? int bool
                    null Object true false

class-name  ::= Object | id
binop       ::= + | - | * | ==
unop        ::= not | null? | zero?

value       ::= null | true | false
              | any numeric literal

expr        ::= value | id | this
              | (new class-name)
              | (ref expr id id)        ;; object type field
              | (set expr id id expr)   ;; object type field rhs
              | (send expr id expr ...) ;; object type args ...
              | (super expr class-name id expr ...)
              | (cast class-name expr)
              | (let id expr expr)
              | (binop expr expr)
              | (unop expr)
              | (and expr expr)
              | (or expr expr)
              | (if expr expr expr)

For more details on the actual reduction relation, please see below.

Note that the ref, set, and super expressions contain annotations as
indicated in the paper.

**********************************************************************
**
** Implementation Notes
**
**********************************************************************

This implementation of ClassicJava contains several files:

  * _ast.ss_ contains structure definitions for the abstract syntax tree.
  * _store.ss_ implements a store ADT.
  * _parser.ss_ defines the parser from the input grammar above to the AST.
  * _program.ss_ defines general operations on the program data structure.
  * _elaboration.ss_ defines the type checker and elaboration system.
  * _reduction.ss_ defines the reduction system.

Of the remaining files, test.ss and utils.ss define various utility and
helper functions; for documentation, see the files themselves.  The rest of
the files (*-tests.ss) are the SchemeUnit test suite for the ClassicJava
implementation.  Require either run-graphical-tests.ss or run-text-tests.ss
to run the test cases.

Note that this isn't really the cleanest implementation in the world; there
are several things I'd like to go back and touch up.  In the interests of
getting this out to people, though, I've deferred that effort indefinitely.

The Abstract Syntax Tree
========================
_ast.ss_ defines the structures that make up the abstract syntax tree used
during the static analysis portion of the implementation.  The only
non-straightforward part is the distinction between Src-Exprs and
Tagged-Exprs.  The former do not contain the annotations added by
elaboration (corresponding to the underlined parts of the syntax in the
paper), the latter do.

The Store ADT
=============
_store.ss_ defines a store ADT.  The implementation is straightforward,
although note that store-update takes time linear in the size of the store,
as it (functionally) replaces the address's contents with the new value
rather than simply adding the new (addr, value) pair to the front.  This
makes it much easier to produce human-readable representations of the store
for testing and debugging.

The Parser
==========
_parser.ss_ defines the parser that converts s-expressions in the input
grammar above to an AST as defined in ast.ss.  The program is represented
as a tree of classes, each of which has a reference to its superclass;
Object's superclass is #f.  To speed lookups, the classes are also in a
hash table, indexed by class name.  

Constructing this tree is made somewhat more challenging by the fact that
the language does not restrict the order of the class definitions in the
source program.  In particular, a subclass may appear before its
superclasses.  To keep the side-effects to a minimum, the initial parse
creates a temp program, in which each class contains the name of its
superclass, or #f for Object.  Then, a second pass constructs the final
tree with the desired structure.

This program does rely on the invariant that (after parsing, at least),
there is exactly one class instance in the program structure for each class
defined in the program.  That is, if A's superclass is B in program P, then
the following expression must evaluate to #t:
 
    (eq? (find-class P (class-name B)) (class-superclass A))

If this does not hold for all classes A and B, then subclass? and type<=?
will not work as expected.  (It'd be nice to relax this restriction, since
it makes the parser and elaborator more subtle without actually making
anybody else's job easier.)

The Program Operations
======================
_program.ss_ defines various operations on the entire program.  These
generally involve class, field, or method lookup and subtyping.

We organize the types into a partial order, defined by type<=? as follows:

    int <: int
    bool <: bool
    C1 <: C2    iff C1 is a subclass of C2
    any <: C    iff C <: Object

We use the "any" type (represented by instances of the any-type struct) to
indicate the type of a "null" literal during type-checking.

The Static Semantics
====================
_elaboration.ss_ defines ClassicJava's static semantics, including the
elaboration.  Because of the elaboration, this phase must construct a new
program structure, which is the same as the input, except that it contains
tagged-exprs instead of src-exprs.  As noted above, we have to be careful
in elab-class to ensure that we preserve the graph structure and thus the
invariant given above during elaboration!

The only interesting departure from the paper is the type rule for IF:

    P, G |-e e1 => e1' : bool    P, G, |-e e2 => e2' : t2
    P, G |-e e3 => e3' : t3      t = lub(t2, t3)
                    lub(t2, t3) exists
   -------------------------------------------------------
          P, G |-e if e1 e2 e3 => if e1' e2' e3' : t

where the least upper bound of two types t2 and t3 is, of course, the
smallest type t such that t2 <: t and t3 <: t.

The Dynamic Semantics
=====================
_reduction.ss_ defines ClassicJava's dynamic semantics in PLT Redex.
Reductions are of the form
    (P, S, e) -> (P, S', e')
where P is the program, S and S' are stores, and e and e' are expressions.
None of the reductions change the program, but some do need to inspect the
class definitions, as for field and method lookups.  Passing it through
each reduction seemed cleaner than making it available as a global
variable.

Because we have defined the semantics as a machine rather than a
traditional small-step semantics, we cannot use the reduction/context
convenience macro but instead must manage the contexts directly.

In theory, it should be possible to define contexts in a
slightly different way and use reduction/context:

    context ::= (program store term-context)
    term-context ::= hole | (ref term-context id id) |
                   | (set context id id expr) | ...

However, this would work only for those reductions that do not manipulate
the store at all: call, let, cast, uprim, bprim, and the boolean
operators.  So it's not clear that there's any real benefit to doing that.

======================================================================

ClassicJava: an implementation of the ClassicJava programming language
Copyright (C) 2005  Richard Cobbe

This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this library; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA