#lang scribble/doc @(require scribble/manual scribble/eval scheme/sandbox) @(define scat-eval (make-base-eval)) @(define-syntax ex (syntax-rules () ((_ () . args) (interaction #:eval scat-eval . args)) ((_ . args) (defs+int #:eval scat-eval . args)))) @title{Staapl reference guide} This document serves as a reference for core Staapl forms, but can be read as a tutorial. We start with Scat, a concatenative and compositional language with threaded state, and move onwards to Coma and its composition and primitive definition mechanisms. Then we work our way up through Forth control words, prefix parsing words and the PIC18 backend. @section{Representation Language} All concatenative code is built on top of a concatenative and compositional language called Scat with extensible syntax and semantics. The term @emph{concatenative} refers to program syntax being build from concatenation of subprogram names. The term @emph{compositional} refers to program semantics where concatenation is interpreted as composition of named functions. Like the Joy language, the meaning function is a homomorphism from the syntactic monoid onto the semantic monoid. That is, the syntactic relation of concatenation of identifiers maps directly onto the semantic relation of composition of functions. It is a homomorphism instead of an isomorphism because it is onto but not one-to-one, that is, some sequences of identifiers have the same meaning (i.e. @scheme[dup +] and @scheme[2 *]) but no identifier has more than one meaning. To load Scat use: @ex[()(require (planet zwizwa/staapl/scat))] A @scheme[scat:] expression composes Scats functions. @specform[(scat: word ...)]{} The @scheme[_word]s refer to Scat functions in the @scheme[(scat)] namespace. If there are no arguments, the identity function is returned. A @scheme[scat>] expression can be used to test Scat compositions interactively. @specform[(scat> word ...)]{} This constructs a function by passing the @scheme[_word]s to @scheme[scat:], and prints the result of applying this function to an empty parameter stack, top element rightmost. @ex[() (scat> 1 2) (scat> 1 2 +) ] These forms support the @scheme[unquote] operation to introduce scheme expressions into Scat code. This can be done in two ways. Non-quoted unquote is interpreted as Scat functions, while quoted unquotes are interpreted as literal values. Note that otherwize identifiers in Scat code @emph{always} represent functions. Literal values occuring in code are a notational shorthand for functions that push a value to the parameter stack. @ex[() (let ([fn (scat: 1 +)] [val 123]) (scat> 1 ,fn) (scat> ',val)) ] New functions can be defined using the @scheme[define-ns] form. @specform[(define-ns namespace name value)]{} The @scheme[_namespace] refers to a list of prefixes for Scat identifiers when represented as a Scheme identifier, @scheme[_name] is the base identifier, and @scheme[_value] is the function value. I.e.: @ex[() (define-ns (scat) foo (scat: 1 2 3)) (define-ns (scat) bar (scat: + +)) (scat> foo) (scat> foo bar) ] There is a shorthand @scheme[compositions] for defining multiple words. @specform[(compositions namespace composer (name word ...) ...)] The @scheme[_namespace] parameter defines the @emph{destination} namespace of new definitions. The @emph{source} namespace is encoded in the @scheme[_composer]. I.e. @scheme[scat:] will pick from the @scheme[(scat)] namespace. This allows the definition of different primitive Scat semantics by taking functionality from one namespace to define primitives of another. The equivalent of the previous example is: @ex[() (compositions (scat) scat: (foo 1 2 3) (bar + +)) ] The primitives available in the Scat namespace are currently not documented. They are largely compatible with Joy or directly snarfed from Scheme. These primitives are not so important for general use, since we mostly employ a different semantics for Scat called Coma, a concatenative macro language. @section{Concatenative Macros} @ex[() (require (planet zwizwa/staapl/coma))] The Coma language is a Scat-like language residing in the @scheme[(macro)] namespace. It can be accessed using the @scheme[macro:] and @scheme[macro>] forms analogous to the @scheme[scat:] and @scheme[scat>] forms in the previous section. @specform[(macro: word ...)]{} @specform[(macro> word ...)]{} Coma's values are lists of @emph{stack machine} instructions, transformed by its operations. The transformations are always local to the end of the list, so the list of instructions behaves as a stack. Coma is thus a stack language serving as @emph{meta language} for another stack language. In its most basic form, the stack machine has two instructions. The @scheme[qw] (Quote Word) instruction loads a literal (number) from the instruction stream onto its parameter stack, while the @scheme[cw] (Call Word) instruction performs a procedure call to a named function. The Coma language performs @emph{partial evaluation} on the @scheme[qw] instructions, but leaves the abstract @scheme[cw] instructions alone. @ex[() (macro> 1 2) (macro> 1 2 +) (macro> 123 345 10 3 -) ] The Coma composition mechanism is the same as Scat's. The difference of the two languages lies in their primitives. When writing a compiler for a particular architecture, the task is to define the semantics of the primitives in terms of target code transformers. This can done using the form @scheme[patterns], a pattern matching mechanism for stack machine code. @defform/subs[(patterns namespace (lhs rhs) ...) ([lhs ([opcode operand ...] ... name)] [rhs ([opcode operand ...] ...) expr])] Each clause @scheme[(_lhs _rhs)] defines how the Coma macro @scheme[_name] transforms a particular pattern of instructions @scheme[([opcode operand ...] ...)] on the top of the code stack. For each unique name, the corresponding patterns are grouped and tried from top to bottom until there's a match. The @scheme[_opcode]s in @scheme[_lhs] are matched literally as symbols, while the @scheme[operand]s bind variables in the tagged lists representing instructions. The @scheme[_namespace] is most likely @scheme[(macro)]. If @scheme[_rhs] is a list of lists, the macro will replace the matched instructions with a list of template instructions with the @scheme[_opcode]s quoted literally and the @scheme[_operand]s evaluated. If @scheme[_rhs] is not a list of lists, it is a Scheme expression that evaluates to a macro which is subsequentially evaluated on the code stack with the matched instructions removed. For example, Let's redefine the Coma macro @scheme[+] to use the PIC18 instructions @scheme[addlw]. @ex[() (patterns (macro) (([qw a] [qw b] +) ([qw (+ a b)])) (([qw a] +) ([addlw a]))) (macro> 1 2 +) (macro> 123 +) ] On top of the @scheme[+] just defined, we define a macro @scheme[-] using macro delegation. The second pattern doesn't produce a macro but throws an exception. @ex[() (patterns (macro) (([qw a] -) (macro: ',(* -1 a) +)) ((-) (error 'not-implemented))) (macro> 10 3 -) (macro> -) ] Observe that @scheme[(macro> 1 2 +)] is now different than before. We used direct evaluation in the expression @scheme[(+ a b)]. However, it is best to use the @scheme[tscat:] form for this. @specform[(tscat: word ...)] This is like @scheme[scat:] in that the @scheme[_word]s come from the @scheme[scat] namespace, but it is different in two ways. If @scheme[_word] is present in the lexical context, it is interpolated as a quoted literal value. In the case above, the expression @scheme[(tscat: a b +)] is equivalent to @scheme[(scat: ',a ',b +)]. Additionally, the computation is wrapped to be evaluated at assembly time. This allows code addresses that are only available at assembly time to be used in partial evaluation. The previous example can now be adapted to: @ex[() (patterns (macro) (([qw a] [qw b] +) ([qw (tscat: a b +)])) (([qw a] +) ([addlw a]))) (macro> 1 2 +) (macro> 123 +) ] Beyond the first level of deconstruction that @scheme[patterns] performs, enforcing primitives to process lists of opcode . operands pairs locally as a stack, the pattern matching syntax is PLT Scheme's @scheme[match] from @scheme[scheme/match] and can be used to further deconstructs operands. This allows the processing of arbitrary data structures in intermediate code and is the substrate of compile time computation in Staapl. One creates primitive code generators and processors that can be used as components in low-level programs written in a Forth-like compositional language. @ex[() (define-struct vec (x y)) (patterns (macro) (([qw x] [qw y] vpack) ([qw (make-vec x y)])) (([qw (struct vec (x y))] vunpack) ([qw x] [qw y])) (([qw (struct vec (x y))] vswap) ([qw (make-vec y x)]))) (macro> 1 2 vpack) (macro> 1 2 vpack vswap vunpack) ] Opcodes are just symbolic tags and can be used inside a @scheme[patterns] form without the need for declaration. The only restriction is that stack machine code produced by a composition of macros can be processed by the stack machine assembler. This freedom allows the introduction of macros that serve as primitives for arbitrary code generation @emph{idioms}, by allowing arbitrary data to be passed on the code stack in the form of @emph{compile time data}, represented by the @scheme['qw] tag, or arbitrarily tagged @emph{pseudo instructions}. Instead of @scheme[[qw (struct vec (x y))]] we could have introduced the opcode @scheme['vec] using something like @scheme[[vec v]] or @scheme[[vec (struct vec (x y))]]. Doing so prevents manipulation of such objects by generic stack manipulation words like @scheme[dup] and @scheme[drop]. The general rule is: use @scheme['qw] for anything that rougly behaves as a target machine value, and use anything else to prevent the partial evaluator for @scheme['qw] tags to touch the instruction. For certain classes of operations one can abstract the possible patterns over the instances of the class. To this end the macro @scheme[patterns-class] can be used. @specform[(patterns namespace formals (actuals ...) clause ...)] The @scheme[_formals] parameter is a list containing identifiers that parameterize the @scheme[_clause]s, which are pattern clauses passed to the @scheme[patterns] form. The @scheme[_actuals] parameters are values to be filled in for the formal parameters. The following example defines pattern matching rules for unary operators on the Microchip PIC18 architecture. Note that the first clause implements peephole optimization: it combines a real machine instruction with the effect of a macro. @ex[() (patterns-class (macro) [word opcode] [(1+ incf) (1- decf) (rot<>c rrcf) (rot<< rlncf) (rot>> rrncf) (swap-nibble swapf)] (([movf f 0 0] word) ([opcode f 0 0])) ((word) ([opcode WREG 0 0]))) ] @section{Prefix Macros} FIXME: Factor out instantiation in Scheme, then write the prefix Forth syntax on top of that. Coma is a non-reflective meta language: it processes stack machine code only. In order to be able to abstract over @emph{names} of macros, a different mechanism is necessary. In Staapl this mechanism is called @emph{prefix macros}, and it is implemented as a preprocessing step defined in terms of a number of primitive @emph{special forms} based on Forth's @emph{parsing words}. The primitive prefix forms are @scheme[create : forth macro require load path]. @section{PIC18 Forth} The Microchip PIC18 microcontroller architecture with 8-bit data word size and 16-bit instruction word size maps quite straightforwardly to a stack machine, making it an ideal primary architecture for Staapl. The implementation in Staapl uses the WREG register to implement the top of stack register, and INDF0 to point to the rest of the parameter stack in SRAM. Most arithmetic and conditional jump instructions are used to implement basic forth data and control operations in a fairly optimal way. The compiler produces native code, eliminating the need for an on-target interpreter. Forth code can be used in interrupt routines without trouble. The PIC18 language includes a full RPN assembler. The PIC18F stores code in Flash ROM memory, but has the ability to program itself. This is used to provide run-time code upload in the interaction mode. @section{Interaction} Coma is a cross compiled language. In order not to loose the ability to interactively probe a running system, one of the virtues of Forth, this functionality is @emph{simulated} in the form of an interactive console requiring minimal on-target support.