F*dging up a Racket
Danny Yoo <dyoo@cs.wpi.edu>
1 Introduction
If people say that Racket is just a Scheme, they are short-selling Racket a little. It’s more accurate to say that Racket is a language laboratory, with support for many different languages.
#lang racket (define-syntax-rule (while test body ...) (let loop () (when test body ... (loop)))) ;; From this point forward, we've got while loops. (while (not (string=? (read-line) "quit")) (printf "never going to give you up\n") (printf "never going to let you down\n"))
#lang racket
We can understand the situation better by looking at another environment on our desktop, namely the web browser. A web browser supports different kinds of HTML variants, since HTML is a moving target, and browsers have come up with crazy rules for figuring out how to take an arbitrary document and decide what HTML parsing rules to apply to it.
HTML 5 tries to make this determination somewhat more straightforward: we can define an HTML 5 document by putting a DOCTYPE element at the very top of the file which self-describes the document as being html.
<!DOCTYPE html> |
<html lang="en"> |
<head><title>Hello world</title></head> |
<body><p>Hello world!</p></body> |
</html> |
Going back to the world of Racket, we see by analogy that the #lang line in a Racket program is a self-description of how to treat the rest of the program. (Actually, the #lang line is quite bit more active than this, but we’ll get to this in a moment.)
#lang datalog ancestor(A, B) :- parent(A, B). ancestor(A, B) :- parent(A, C), D = C, ancestor(D, B). parent(john, douglas). parent(bob, john). ancestor(A, B)?
#lang planet dyoo/bf ++++++[>++++++++++++<-]>. >++++++++++[>++++++++++<-]>+. +++++++..+++.>++++[>+++++++++++<-]>. <+++[>----<-]>.<<<<<+++[>+++++<-]>. >>.+++.------.--------.>>+.
Ignoring the question of why?!! someone would do this, let’s ask another: how do we build this? This tutorial will cover how to build this language into Racket from scratch.
Let’s get started!
2 The view from high orbit
#lang planet dyoo/bf ,[.,]
As mentioned earlier, a #lang line is quite active: it tells the Racket runtime how to convert from the surface syntax to an meaningful program. Programs in Racket get digested in a few stages; the process looks something like this:
reader macro expansion |
surface syntax ---------> AST -----------------> core forms |
When Racket sees #lang planet dyoo/bf, it will look for a particular module that we call a reader; a reader consumes surface syntax and excretes ASTs, and these ASTs are then annotated so that Racket knows how to make sense out of them later on. At this point, the rest of the Racket infrastructure kicks in and macro-expands the ASTs out, ultimately, to a core language.
Capture the meaning of brainf*ck by writing a semantics module.
Go from the line noise of the surface syntax into a more structured form by writing a parser module.
Connect the pieces, the semantics and the surface syntax parser, by making a reader module.
Profit!
3 Flight preparations
$ mkdir bf |
Ultimately, we want to put the fruit of our labor onto PLaneT, since that’ll make it easier for others to use our work. Let’s set up a PLaneT development link so the Racket environment knows about our work directory. I already have an account on PLaneT with my username dyoo. You can get an account fairly easily.
$ planet link dyoo bf.plt 1 0 bf |
$ cd bf |
~/bf$ cat >hello.rkt |
#lang racket |
"hello world" |
~/bf$ racket |
Welcome to Racket v5.1.1. |
> (require (planet dyoo/bf/hello)) |
"hello world" |
> |
4 The brainf*ck language
a byte array of data, and
a pointer into that data array
Increment the data pointer (>)
Decrement the data pointer (<)
Increment the byte at the data pointer (+)
Decrement the byte at the data pointer (-)
Write a byte to standard output (.)
Read a byte from standard input (,)
Perform a loop until the byte at the data pointer is zero ([, ])
"semantics.rkt"
#lang racket (require rackunit) ;; for unit testing (provide (all-defined-out)) ;; Our state contains two pieces. (define-struct state (data ptr) #:mutable) ;; Creates a new state, with a byte array of 30000 zeros, and ;; the pointer at index 0. (define (new-state) (make-state (make-vector 30000 0) 0)) ;; increment the data pointer (define (increment-ptr a-state) (set-state-ptr! a-state (add1 (state-ptr a-state)))) ;; decrement the data pointer (define (decrement-ptr a-state) (set-state-ptr! a-state (sub1 (state-ptr a-state)))) ;; increment the byte at the data pointer (define (increment-byte a-state) (let ([v (state-data a-state)] [i (state-ptr a-state)]) (vector-set! v i (add1 (vector-ref v i))))) ;; decrement the byte at the data pointer (define (decrement-byte a-state) (let ([v (state-data a-state)] [i (state-ptr a-state)]) (vector-set! v i (sub1 (vector-ref v i))))) ;; print the byte at the data pointer (define (write-byte-to-stdout a-state) (let ([v (state-data a-state)] [i (state-ptr a-state)]) (write-byte (vector-ref v i) (current-output-port)))) ;; read a byte from stdin into the data pointer (define (read-byte-from-stdin a-state) (let ([v (state-data a-state)] [i (state-ptr a-state)]) (vector-set! v i (read-byte (current-input-port))))) ;; we know how to do loops! (define-syntax-rule (loop a-state body ...) (let loop () (unless (= (vector-ref (state-data a-state) (state-ptr a-state)) 0) body ... (loop))))
"semantics.rkt"
;; Simple exercises. (let ([s (new-state)]) (increment-byte s) (check-equal? 1 (vector-ref (state-data s) 0)) (increment-byte s) (check-equal? 2 (vector-ref (state-data s) 0)) (decrement-byte s) (check-equal? 1 (vector-ref (state-data s) 0))) ;; pointer movement (let ([s (new-state)]) (increment-ptr s) (increment-byte s) (check-equal? 0 (vector-ref (state-data s) 0)) (check-equal? 1 (vector-ref (state-data s) 1)) (decrement-ptr s) (increment-byte s) (check-equal? 1 (vector-ref (state-data s) 0)) (check-equal? 1 (vector-ref (state-data s) 1))) ;; make sure standard input is doing something (let ([s (new-state)]) (parameterize ([current-input-port (open-input-bytes (bytes 3 1 4))]) (read-byte-from-stdin s) (increment-ptr s) (read-byte-from-stdin s) (increment-ptr s) (read-byte-from-stdin s)) (check-equal? 3 (vector-ref (state-data s) 0)) (check-equal? 1 (vector-ref (state-data s) 1)) (check-equal? 4 (vector-ref (state-data s) 2))) ;; make sure standard output is doing something (let ([s (new-state)]) (set-state-data! s (vector 80 76 84)) (let ([simulated-stdout (open-output-string)]) (parameterize ([current-output-port simulated-stdout]) (write-byte-to-stdout s) (increment-ptr s) (write-byte-to-stdout s) (increment-ptr s) (write-byte-to-stdout s)) (check-equal? "PLT" (get-output-string simulated-stdout)))) ;; Let's see that we can clear. (let ([s (new-state)]) (set-state-data! s (vector 0 29 92 14 243 1 6 92)) (set-state-ptr! s 7) ;; [ [-] < ] (loop s (loop s (decrement-byte s)) (decrement-ptr s)) (check-equal? 0 (state-ptr s)) (check-equal? (vector 0 0 0 0 0 0 0 0) (state-data s)))
Good! Our tests, at the very least, let us know that our definitions are doing something reasonable, and they should all pass.
However, there are a few things that we may want to fix in the future, like the lack of error trapping if the input stream contains eof. And there’s no bounds-checking on the ptr or on the values in the data. Wow, there are quite a few things that we might want to fix. But at the very least, we now have a module that captures the semantics of brainf*ck.
5 Lisping a language
"language.rkt"
#lang racket (require "semantics.rkt") (provide greater-than less-than plus minus period comma brackets (rename-out [my-module-begin #%module-begin])) ;; The current-state is a parameter used by the ;; rest of this language. (define current-state (make-parameter (new-state))) ;; Every module in this language will make sure that it ;; uses a fresh state. (define-syntax-rule (my-module-begin body ...) (#%plain-module-begin (parameterize ([current-state (new-state)]) body ...))) (define-syntax-rule (greater-than) (increment-ptr (current-state))) (define-syntax-rule (less-than) (decrement-ptr (current-state))) (define-syntax-rule (plus) (increment-byte (current-state))) (define-syntax-rule (minus) (decrement-byte (current-state))) (define-syntax-rule (period) (write-byte-to-stdout (current-state))) (define-syntax-rule (comma) (read-byte-from-stdin (current-state))) (define-syntax-rule (brackets body ...) (loop (current-state) body ...))
This "language.rkt" presents brainf*ck as a s-expression-based language. It uses the semantics we’ve coded up, and defines rules for handling greater-than, less-than, etc... We have a parameter called current-state that holds the state of the brainf*ck machine that’s used through the language.
> (syntax->datum (expand '(module an-example-module '#%kernel "hello" "world"))) '(module an-example-module '#%kernel (#%module-begin '"hello" '"world"))
#lang s-exp (planet dyoo/bf/language) (plus)(plus)(plus)(plus)(plus) (plus)(plus)(plus)(plus)(plus) (brackets (greater-than) (plus)(plus)(plus)(plus)(plus) (plus)(plus) (greater-than) (plus)(plus)(plus)(plus)(plus) (plus)(plus) (plus)(plus)(plus) (greater-than) (plus)(plus)(plus) (greater-than) (plus) (less-than)(less-than)(less-than) (less-than) (minus)) (greater-than) (plus)(plus) (period) (greater-than) (plus) (period) (plus)(plus)(plus)(plus)(plus) (plus)(plus) (period) (period) (plus)(plus)(plus) (period) (greater-than) (plus)(plus) (period) (less-than)(less-than) (plus)(plus)(plus)(plus)(plus) (plus)(plus)(plus)(plus)(plus) (plus)(plus)(plus)(plus)(plus) (period) (greater-than) (period) (plus)(plus)(plus) (period) (minus)(minus)(minus)(minus)(minus)(minus)(period) (minus)(minus)(minus)(minus)(minus)(minus)(minus)(minus) (period)(greater-than) (plus) (period) (greater-than) (period)
The #lang line here is saying, essentially, that the following program is written with s-expressions, and should be treated with the module language "language.rkt" that we just wrote up. And if we run this program, we should see a familiar greeting. Hurrah!
... But wait! We can’t just declare victory here. We really do want to allow the throngs of brainfu*ck programmers to write brainf*ck in the surface syntax that they deserve. Keep "language.rkt" on hand, though. We will reuse it by having our parser transform the surface syntax into the forms we defined in "language.rkt".
Let’s get that parser working!
6 Parsing the surface syntax
The Racket toolchain includes a professional-strength lexer and parser in the parser-tools collection. For the sake of keeping this example terse, we’ll write a simple recursive-descent parser without using the parser-tools collection. (But if our surface syntax were any more complicated, we might reconsider this decision.)
The expected output of a successful parse should be some kind of abstract syntax tree. What representation should we use for the tree? Although we can use s-expressions, they’re pretty lossy: they don’t record where they came from in the original source text. For the case of brainf*ck, we might not care, but if we were to write a parser for a more professional, sophisticated language (like LOLCODE) we want source locations so we can give good error messages during parsing or run-time.
As an alternative to plain s-expressions, we’ll use a data structure built into Racket called a syntax object; syntax objects let us represent ASTs, just like s-expressions, and they also carry along auxiliary information, such as source locations. Plus, as we briefly saw in our play with expand, syntax objects are the native data structure that Racket itself uses during macro expansion, so we might as well use them ourselves.
> (define an-example-syntax-object (datum->syntax #f 'hello (list "hello.rkt" 1 20 32 5)))
> an-example-syntax-object #<syntax:1:20 hello>
> (syntax? an-example-syntax-object) #t
> (syntax->datum an-example-syntax-object) 'hello
> (symbol? (syntax->datum an-example-syntax-object)) #t
> (syntax-source an-example-syntax-object) "hello.rkt"
> (syntax-line an-example-syntax-object) 1
> (syntax-column an-example-syntax-object) 20
> (syntax-position an-example-syntax-object) 32
> (syntax-span an-example-syntax-object) 5
"parser.rkt"
#lang racket ;; The only visible export of this module will be parse-expr. (provide parse-expr) ;; While loops... (define-syntax-rule (while test body ...) (let loop () (when test body ... (loop)))) ;; ignorable-next-char?: input-port -> boolean ;; Produces true if the next character is something we should ignore. (define (ignorable-next-char? in) (let ([next-ch (peek-char in)]) (cond [(eof-object? next-ch) #f] [else (not (member next-ch '(#\< #\> #\+ #\- #\, #\. #\[ #\])))]))) ;; parse-expr: any input-port -> (U syntax eof) ;; Either produces a syntax object or the eof object. (define (parse-expr source-name in) (while (ignorable-next-char? in) (read-char in)) (let*-values ([(line column position) (port-next-location in)] [(next-char) (read-char in)]) ;; We'll use this function to generate the syntax objects by ;; default. ;; The only category this doesn't cover are brackets. (define (default-make-syntax type) (datum->syntax #f (list type) (list source-name line column position 1))) (cond [(eof-object? next-char) eof] [else (case next-char [(#\<) (default-make-syntax 'less-than)] [(#\>) (default-make-syntax 'greater-than)] [(#\+) (default-make-syntax 'plus)] [(#\-) (default-make-syntax 'minus)] [(#\,) (default-make-syntax 'comma)] [(#\.) (default-make-syntax 'period)] [(#\[) ;; The slightly messy case is bracket. We keep reading ;; a list of exprs, and then construct a wrapping bracket ;; around the whole thing. (let*-values ([(elements) (parse-exprs source-name in)] [(following-line following-column following-position) (port-next-location in)]) (datum->syntax #f `(brackets ,@elements) (list source-name line column position (- following-position position))))] [(#\]) eof])]))) ;; parse-exprs: input-port -> (listof syntax) ;; Parse a list of expressions. (define (parse-exprs source-name in) (let ([next-expr (parse-expr source-name in)]) (cond [(eof-object? next-expr) empty] [else (cons next-expr (parse-exprs source-name in))])))
> (define my-sample-input-port (open-input-string ",[.,]"))
> (define first-stx (parse-expr "my-sample-program.rkt" my-sample-input-port)) > first-stx #<syntax::1 (comma)>
> (define second-stx (parse-expr "my-sample-program.rkt" my-sample-input-port)) > second-stx #<syntax::2 (brackets (period) (comma))>
> (parse-expr "my-sample-program.rkt" my-sample-input-port) #<eof>
> (syntax->datum second-stx) '(brackets (period) (comma))
> (syntax-source second-stx) "my-sample-program.rkt"
> (syntax-position second-stx) 2
> (syntax-span second-stx) 4
We mentioned that the parser wasn’t too hard... but then again, we haven’t written good traps for error conditions. This parser is a baby parser. If we were more rigorous, we’d probably implement it with the parser-tools collection, write unit tests for the parser with rackunit, and make sure to produce good error messages when Bad Things happen (like having unbalanced brackets or parentheses.
Still, we’ve now got the language and a parser. How do we tie them together?
7 Crossing the wires
A parser in "parser.rkt" for the surface syntax that produces ASTs
A module language in "language.rkt" that provides the meaning for those ASTs.
#lang planet dyoo/bf
"lang/reader.rkt"
#lang s-exp syntax/module-reader (planet dyoo/bf/language) #:read my-read #:read-syntax my-read-syntax (require "../parser.rkt") (define (my-read in) (syntax->datum (my-read-syntax #f in))) (define (my-read-syntax src in) (parse-expr src in))
$ cat hello2.rkt |
#lang planet dyoo/bf |
++++++[>++++++++++++<-]>. |
>++++++++++[>++++++++++<-]>+. |
+++++++..+++.>++++[>+++++++++++<-]>. |
<+++[>----<-]>.<<<<<+++[>+++++<-]>. |
>>.+++.------.--------.>>+. |
|
$ racket hello2.rkt |
Hello, World! |
Sweet, sweet words.
8 Landing on PLaneT
Finally, we want to get this work onto PLaneT so that other people can share in the joy of writing brainf*ck in Racket. Let’s do it!
First, let’s go back to the parent of our work directory. Once we’re there, we’ll use the planet create command.
$ planet create bf |
planet create bf |
MzTarring ./... |
MzTarring ./lang... |
|
WARNING: |
Package has no info.rkt file. This means it will not have a description or documentation on the PLaneT web site. |
|
$ ls -l bf.plt |
-rw-rw-r-- 1 dyoo nogroup 3358 Jun 12 19:39 bf.plt |
"info.rkt"
#lang setup/infotab (define name "bf: a brainf*ck compiler for Racket") (define categories '(devtools)) (define can-be-loaded-with 'all) (define required-core-version "5.1.1") (define version "1.0") (define repositories '("4.x")) (define scribblings '()) (define primary-file "language.rkt") (define blurb '("Provides support for the brainf*ck language.")) (define release-notes '((p "First release")))
$ planet unlink dyoo bf.plt 1 0 |
$ racket hello2.rkt |
require: PLaneT could not find the requested package: Server had no matching package: No package matched the specified criteria |
$ planet fileinject dyoo bf.plt 1 0 |
planet fileinject dyoo bf.plt 1 0 |
|
============= Installing bf.plt on Sun, 12 Jun 2011 19:49:50 ============= |
raco setup: Unpacking archive from /home/dyoo/bf.plt |
raco setup: unpacking README in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: unpacking hello.rkt in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: unpacking hello2.rkt in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: making directory lang in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: unpacking reader.rkt in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./lang/ |
raco setup: unpacking language.rkt in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: unpacking parser.rkt in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: unpacking semantics.rkt in /home/dyoo/.racket/planet/300/5.1.1/cache/dyoo/bf.plt/1/0/./ |
raco setup: version: 5.1.1 [3m] |
... |
$ racket hello2.rkt |
Hello, World! |
Once we’re finally satisfied with the package’s contents, we can finally upload it onto PLaneT. If you log onto planet.racket-lang.org, the user interface will allow you to upload your "bf.plt" package.
9 Acknowledgements
Thanks to Shriram Krishnamurthi for being understanding
when I told him I had coded a brainf*ck compiler. Shoutouts to the PLT group at
Brown University —