#lang scribble/doc
@(require 
   scribble/manual
   scribble/eval
   scheme/sandbox)

@(define scat-eval (make-base-eval))
@(define-syntax ex
   (syntax-rules ()
     ((_ () . args)
      (interaction #:eval scat-eval . args))
     ((_ . args)
      (defs+int #:eval scat-eval . args))))

@title{Staapl reference guide}

This document serves as a reference for core Staapl forms, but can be
read as a tutorial.  We start with Scat, a concatenative and
compositional language with threaded state, and move onwards to Coma
and its composition and primitive definition mechanisms.  Then we work
our way up through Forth control words, prefix parsing words and the
PIC18 backend.  

@section{Representation Language}

All concatenative code is built on top of a concatenative and
compositional language called Scat with extensible syntax and
semantics.  The term @emph{concatenative} refers to program syntax
being build from concatenation of subprogram names.  The term
@emph{compositional} refers to program semantics where concatenation
is interpreted as composition of named functions.

Like the Joy language, the meaning function is a homomorphism from the
syntactic monoid onto the semantic monoid. That is, the syntactic
relation of concatenation of identifiers maps directly onto the semantic
relation of composition of functions. It is a homomorphism instead of
an isomorphism because it is onto but not one-to-one, that is, some
sequences of identifiers have the same meaning (i.e. @scheme[dup +] and
@scheme[2 *]) but no identifier has more than one meaning.


To load Scat use:

@ex[()(require (planet zwizwa/staapl/scat))]

A @scheme[scat:] expression composes Scats functions.

@specform[(scat: word ...)]{}

The @scheme[_word]s refer to Scat functions in the @scheme[(scat)]
namespace.  If there are no arguments, the identity function is
returned.  

A @scheme[scat>] expression can be used to test Scat compositions
interactively.

@specform[(scat> word ...)]{}

This constructs a function by passing the @scheme[_word]s to
@scheme[scat:], and prints the result of applying this function to an
empty parameter stack, top element rightmost.

@ex[() 
(scat> 1 2)
(scat> 1 2 +)
]

These forms support the @scheme[unquote] operation to introduce scheme
expressions into Scat code.  This can be done in two ways.  Non-quoted
unquote is interpreted as Scat functions, while quoted unquotes are
interpreted as literal values.  Note that otherwize identifiers in
Scat code @emph{always} represent functions.  Literal values occuring
in code are a notational shorthand for functions that push a value to
the parameter stack.

@ex[()
(let ([fn (scat: 1 +)]
      [val 123])
  (scat> 1 ,fn)
  (scat> ',val))
]


New functions can be defined using the @scheme[define-ns] form.

@specform[(define-ns namespace name value)]{}

The @scheme[_namespace] refers to a list of prefixes for Scat
identifiers when represented as a Scheme identifier, @scheme[_name] is
the base identifier, and @scheme[_value] is the function value. I.e.:

@ex[()
(define-ns (scat) foo (scat: 1 2 3))
(define-ns (scat) bar (scat: + +))
(scat> foo)
(scat> foo bar)
]

There is a shorthand @scheme[compositions] for defining multiple
words.

@specform[(compositions namespace composer (name word ...) ...)]

The @scheme[_namespace] parameter defines the @emph{destination}
namespace of new definitions.  The @emph{source} namespace is encoded
in the @scheme[_composer]. I.e. @scheme[scat:] will pick from the
@scheme[(scat)] namespace.  This allows the definition of different
primitive Scat semantics by taking functionality from one namespace to
define primitives of another.  The equivalent of the previous example
is:

@ex[()
(compositions (scat) scat:
  (foo  1 2 3)
  (bar  + +))
]


The primitives available in the Scat namespace are currently not
documented.  They are largely compatible with Joy or directly snarfed
from Scheme.  These primitives are not so important for general use,
since we mostly employ a different semantics for Scat called Coma,
a concatenative macro language.


@section{Concatenative Macros}

@ex[() (require (planet zwizwa/staapl/coma))]

The Coma language is a Scat-like language residing in the
@scheme[(macro)] namespace.  It can be accessed using the
@scheme[macro:] and @scheme[macro>] forms analogous to the
@scheme[scat:] and @scheme[scat>] forms in the previous section.

@specform[(macro: word ...)]{}
@specform[(macro> word ...)]{}

Coma's values are lists of @emph{stack machine} instructions,
transformed by its operations.  The transformations are always local
to the end of the list, so the list of instructions behaves as a
stack.  Coma is thus a stack language serving as @emph{meta language}
for another stack language.

In its most basic form, the stack machine has two instructions. The
@scheme[qw] (Quote Word) instruction loads a literal (number) from the
instruction stream onto its parameter stack, while the @scheme[cw]
(Call Word) instruction performs a procedure call to a named function.
The Coma language performs @emph{partial evaluation} on the
@scheme[qw] instructions, but leaves the abstract @scheme[cw]
instructions alone.

@ex[()
(macro> 1 2)
(macro> 1 2 +)
(macro> 123 345 10 3 -)
]

The Coma composition mechanism is the same as Scat's. The difference
of the two languages lies in their primitives.  When writing a
compiler for a particular architecture, the task is to define the
semantics of the primitives in terms of target code transformers.
This can done using the form @scheme[patterns], a pattern matching
mechanism for stack machine code.

@defform/subs[(patterns namespace (lhs rhs) ...)
              ([lhs ([opcode operand ...] ... name)]
               [rhs ([opcode operand ...] ...)
                    expr])]

Each clause @scheme[(_lhs _rhs)] defines how the Coma macro
@scheme[_name] transforms a particular pattern of instructions
@scheme[([opcode operand ...]  ...)] on the top of the code stack.
For each unique name, the corresponding patterns are grouped and tried
from top to bottom until there's a match.  The @scheme[_opcode]s in
@scheme[_lhs] are matched literally as symbols, while the
@scheme[operand]s bind variables in the tagged lists representing
instructions.  The @scheme[_namespace] is most likely
@scheme[(macro)].

If @scheme[_rhs] is a list of lists, the macro will replace the
matched instructions with a list of template instructions with the
@scheme[_opcode]s quoted literally and the @scheme[_operand]s
evaluated. If @scheme[_rhs] is not a list of lists, it is a Scheme
expression that evaluates to a macro which is subsequentially
evaluated on the code stack with the matched instructions removed.


For example, Let's redefine the Coma macro @scheme[+] to use the PIC18
instructions @scheme[addlw].

@ex[()
(patterns (macro)
  (([qw a] [qw b] +) ([qw (+ a b)]))
  (([qw a] +)        ([addlw a])))
(macro> 1 2 +)
(macro> 123 +)
]

On top of the @scheme[+] just defined, we define a macro @scheme[-]
using macro delegation.  The second pattern doesn't produce a macro
but throws an exception.

@ex[()
(patterns (macro)
  (([qw a] -)        (macro: ',(* -1 a) +))
  ((-)               (error 'not-implemented)))
(macro> 10 3 -)
(macro> -)
]

Observe that @scheme[(macro> 1 2 +)] is now different than before.  We
used direct evaluation in the expression @scheme[(+ a b)].  However,
it is best to use the @scheme[tscat:] form for this.

@specform[(tscat: word ...)]

This is like @scheme[scat:] in that the @scheme[_word]s come from the
@scheme[scat] namespace, but it is different in two ways.  If
@scheme[_word] is present in the lexical context, it is interpolated
as a quoted literal value. In the case above, the expression
@scheme[(tscat: a b +)] is equivalent to @scheme[(scat: ',a ',b +)].
Additionally, the computation is wrapped to be evaluated at assembly
time.  This allows code addresses that are only available at assembly
time to be used in partial evaluation.  The previous example can now
be adapted to:

@ex[()
(patterns (macro)
  (([qw a] [qw b] +) ([qw (tscat: a b +)]))
  (([qw a] +)        ([addlw a])))
(macro> 1 2 +)
(macro> 123 +)
]

Beyond the first level of deconstruction that @scheme[patterns]
performs, enforcing primitives to process lists of opcode . operands
pairs locally as a stack, the pattern matching syntax is PLT Scheme's
@scheme[match] from @scheme[scheme/match] and can be used to further
deconstructs operands. This allows the processing of arbitrary data
structures in intermediate code and is the substrate of compile time
computation in Staapl.  One creates primitive code generators and
processors that can be used as components in low-level programs
written in a Forth-like compositional language.

@ex[()
(define-struct vec (x y))
(patterns (macro)
  (([qw x] [qw y] vpack)               ([qw (make-vec x y)]))
  (([qw (struct vec (x y))] vunpack)   ([qw x] [qw y]))
  (([qw (struct vec (x y))] vswap)     ([qw (make-vec y x)])))
(macro> 1 2 vpack)
(macro> 1 2 vpack vswap vunpack)
]

Opcodes are just symbolic tags and can be used inside a
@scheme[patterns] form without the need for declaration.  The only
restriction is that stack machine code produced by a composition of
macros can be processed by the stack machine assembler.  This freedom
allows the introduction of macros that serve as primitives for
arbitrary code generation @emph{idioms}, by allowing arbitrary data to
be passed on the code stack in the form of @emph{compile time data},
represented by the @scheme['qw] tag, or arbitrarily tagged
@emph{pseudo instructions}.

Instead of @scheme[[qw (struct vec (x y))]] we could have introduced
the opcode @scheme['vec] using something like @scheme[[vec v]] or
@scheme[[vec (struct vec (x y))]]. Doing so prevents manipulation of
such objects by generic stack manipulation words like @scheme[dup] and
@scheme[drop].  The general rule is: use @scheme['qw] for anything
that rougly behaves as a target machine value, and use anything else
to prevent the partial evaluator for @scheme['qw] tags to touch the
instruction.

For certain classes of operations one can abstract the possible
patterns over the instances of the class. To this end the macro
@scheme[patterns-class] can be used.

@specform[(patterns namespace formals (actuals ...) clause ...)]

The @scheme[_formals] parameter is a list containing identifiers that
parameterize the @scheme[_clause]s, which are pattern clauses passed
to the @scheme[patterns] form.  The @scheme[_actuals] parameters are
values to be filled in for the formal parameters.  

The following example defines pattern matching rules for unary
operators on the Microchip PIC18 architecture.  Note that the first
clause implements peephole optimization: it combines a real machine
instruction with the effect of a macro.

@ex[()
(patterns-class (macro)
    [word         opcode]
    [(1+          incf)
     (1-          decf)
     (rot<<c      rlcf)
     (rot>>c      rrcf)
     (rot<<       rlncf)
     (rot>>       rrncf)
     (swap-nibble swapf)]
 (([movf f 0 0] word) ([opcode f 0 0]))
 ((word)              ([opcode WREG 0 0])))
]

@section{Prefix Macros}

FIXME: Factor out instantiation in Scheme, then write the prefix Forth
syntax on top of that.

Coma is a non-reflective meta language: it processes stack machine code
only.  In order to be able to abstract over @emph{names} of macros, a
different mechanism is necessary.  In Staapl this mechanism is called
@emph{prefix macros}, and it is implemented as a preprocessing step
defined in terms of a number of primitive @emph{special forms} based
on Forth's @emph{parsing words}.  The primitive prefix forms are
@scheme[create : forth macro require load path].

@section{PIC18 Forth}

The Microchip PIC18 microcontroller architecture with 8-bit data word
size and 16-bit instruction word size maps quite straightforwardly to
a stack machine, making it an ideal primary architecture for Staapl.
The implementation in Staapl uses the WREG register to implement the
top of stack register, and INDF0 to point to the rest of the parameter
stack in SRAM.  Most arithmetic and conditional jump instructions are
used to implement basic forth data and control operations in a fairly
optimal way.  The compiler produces native code, eliminating the need
for an on-target interpreter.  Forth code can be used in interrupt
routines without trouble.  The PIC18 language includes a full RPN
assembler.  The PIC18F stores code in Flash ROM memory, but has the
ability to program itself.  This is used to provide run-time code
upload in the interaction mode.

@section{Interaction}

Coma is a cross compiled language. In order not to loose the ability
to interactively probe a running system, one of the virtues of Forth,
this functionality is @emph{simulated} in the form of an interactive
console requiring minimal on-target support.