1 Introduction
2 Exceptions
exn:fail:invalid-json?
exn:fail:invalid-json-location
3 Parse Fold
json-fold-lambda
make-json-fold
4 Conversion
json->sjson
json->sxml
write-json-as-xml
json->xml
5 History
6 Legal
Version: 0.2

json-parsing: JSON Parsing, Folding, and Conversion for Racket/Scheme

Neil Van Dyke

License: LGPL 3   Web: http://www.neilvandyke.org/racket-json-parsing/

 (require (planet neil/json-parsing:1:=1))

1 Introduction

The json-parsing package for Racket provides JSON parsing and format conversion using a streaming tree fold. This tree fold approach permits processing JSON input of arbitrary size in relatively small space for some applications, unlike the common approach of parsing the entire input to an AST before processing the AST.

The supported JSON format is as specified on http://json.org/, as viewed on 2010-12-25.

The format converters in this package include a converter to the SJSON s-expression format. SJSON is designed to be fully compatible with the jsexpr of Dave Herman’s PLaneT package dherman/json:3:=0.

The parser does not consume any characters not belonging to the JSON value, and can be used to read multiple JSON values or to be intermixed with other kinds of reading from the same input.
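As a rough sketch of the intermixed reading described above, the following reads two JSON values from one port and then reads the remainder of the line with Racket's ordinary read-line. It assumes the json->sjson signature documented in the Conversion section, with exhaust? passed as #f so that trailing input is left alone; the input text and the variable names are made up for illustration.

```racket
#lang racket/base
(require (planet neil/json-parsing:1:=1))

;; Illustrative input: two JSON values followed by non-JSON text.
(define in
  (open-input-string "{\"id\": 7} [true, false] rest of the line\n"))

;; With #:exhaust? #f, each call should stop after one JSON value
;; and leave the rest of the port untouched.
(define first-value  (json->sjson in #:exhaust? #f))
(define second-value (json->sjson in #:exhaust? #f))

;; Whatever follows the second value is still available to other readers.
(define leftover (read-line in))
```

Passing #t for exhaust? here would instead be expected to raise an error, since non-whitespace input follows the first value.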

The tree fold approach of this package’s parser was inspired and informed by Oleg Kiselyov’s SSAX XML parsing work.

Implementing the json-parsing package was originally intended as an exercise for getting more experience with SSAX-like folding, before undertaking some new XML packages, but the JSON work has turned out to be useful in its own right. A future version of this package might also implement alternative tree fold approaches.

2 Exceptions

When the parser encounters invalid JSON, it raises an exn:fail:invalid-json exception. While this exception will be caught by handlers for general predicates such as exn:fail?, the distinct exception type permits JSON-parsing errors to be handled separately from other errors, and it also includes some location information.

exn:fail:invalid-json? : any/c

Type predicate.

(exn:fail:invalid-json-location exn:fail:invalid-json) → any/c
  exn:fail:invalid-json : any/c

Gets information on the location of the error within the input stream. Currently, this is a list of the three values returned by Racket’s port-next-location procedure: line, column, and position.
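A minimal sketch of handling a parse error separately from other failures might look like the following. The helper name parse-or-report and the shape of its result are hypothetical; json->sjson is used with the signature given in the Conversion section. Note that port-next-location can return #f for the line and column when line counting is not enabled on the port, so the location elements should not be assumed to be numbers.

```racket
#lang racket/base
(require (planet neil/json-parsing:1:=1))

;; Hypothetical helper: parse a string, or return a tagged list that
;; describes the failure instead of propagating the exception.
(define (parse-or-report str)
  (with-handlers ((exn:fail:invalid-json?
                   (lambda (e)
                     (list 'invalid-json
                           (exn-message e)
                           ;; list of line, column, position
                           (exn:fail:invalid-json-location e)))))
    (json->sjson str #:exhaust? #t)))

;; "oops" is not a JSON value, so this should take the handler branch.
(define outcome (parse-or-report "[1, 2, oops]"))
```

Because the handler tests for exn:fail:invalid-json? only, other exceptions (for example, I/O errors from the port) still propagate normally.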

3 Parse Fold

(json-fold-lambda ...)

Special syntax that expands to a JSON parser procedure. Normally you would use this to define a new kind of processing for the parser to perform while it is parsing JSON.

The procedure produced by this syntax accepts the arguments:

( in seed exhaust? )

where in is an input port or string, seed is the initial seed value, and exhaust? specifies whether to exhaustively consume all input and ensure that nothing other than JSON whitespace follows the JSON value.

json-fold-lambda has many arguments, all of which must be present. Here is an example of how you might define a my-json-to-sjson procedure using json-fold-lambda:

  (define my-json-to-sjson
    (json-fold-lambda
     #:error-name         'my-json-to-sjson
     #:visit-object-start (lambda (seed)
                            (make-hasheq))
     #:visit-object-end   (lambda (seed parent-seed)
                            `(,seed ,@parent-seed))
     #:visit-member-start (lambda (name seed)
                            '())
     #:visit-member-end   (lambda (name seed parent-seed)
                            (hash-set! parent-seed
                                       (string->symbol name)
                                       (car seed))
                            parent-seed)
     #:visit-array-start  (lambda (seed)
                            '())
     #:visit-array-end    (lambda (seed parent-seed)
                            `(,(reverse seed) ,@parent-seed))
     #:visit-string       (lambda (str seed)
                            `(,str ,@seed))
     #:visit-number       (lambda (num seed)
                            `(,num ,@seed))
     #:visit-constant     (lambda (name seed)
                            `(,(case name
                                 ((true)  #t)
                                 ((false) #f)
                                 ((null)  #\nul)
                                 (else (error 'my-json-to-sjson
                                              "invalid constant ~S"
                                              name)))
                              ,@seed))))

As you can see, the arguments provide a set of procedures that are applied at various states in the parsing. Each of these callback procedures accepts at least one seed value from its preceding sibling and/or parent, and it produces a seed value for the next sibling, child, or parent.

The concepts object, member, and array are non-leaf nodes in the tree. The start callback for each non-leaf node receives a seed from its preceding sibling, and the value it produces is the seed for its first child. The end callback receives both the seed from the last child, and the parent seed (the sibling predecessor seed of the node; the same seed received by the corresponding start).

The leaf nodes each simply receive a seed from the sibling predecessor callback (or, if the first sibling, from the parent start; or, if the first callback, from the seed provided to the parser call), and provide one to the sibling successor (or, if the last sibling, to the parent end; or, if the last callback, to the result of the parser call).

Note that two different techniques are used above to build collections of objects during processing, using seeds. The first is to use a mutable hash that is passed in the seed, which in this case is used because SJSON requires a hash as part of its format. The second, and more common, is to construct lists by incrementally consing onto the front of the list, so that the list is ordered backwards, and waiting until the list is finished to put it in the correct order using the reverse procedure.
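To make the seed flow for an array concrete, the following sketch hand-simulates the order in which the parser would be expected to apply the callbacks for the input [10, 20, 30]. It does not use the parser at all: the three procedures are standalone copies of the corresponding callbacks from the example above, and the names s0 through s3 are only for illustration.

```racket
#lang racket/base

;; Standalone copies of three callbacks from the example above.
(define (visit-array-start seed) '())
(define (visit-number num seed) `(,num ,@seed))
(define (visit-array-end seed parent-seed)
  `(,(reverse seed) ,@parent-seed))

;; Simulated callback order for the input [10, 20, 30].
(define parent-seed '())                         ; seed given to the parser call
(define s0 (visit-array-start parent-seed))      ; '()
(define s1 (visit-number 10 s0))                 ; '(10)
(define s2 (visit-number 20 s1))                 ; '(20 10)
(define s3 (visit-number 30 s2))                 ; '(30 20 10), backwards
(define result (visit-array-end s3 parent-seed)) ; '((10 20 30))
```

The child seed is built backwards by consing, and only the end callback pays for the single reverse before adding the finished array to the parent's seed.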

The parser procedure returns either the value of the last callback, or, if the end of the input is reached without a JSON value, the eof object.
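Seeds need not build a data structure at all. As a sketch of a fold that runs in small space, the following hypothetical count-json-leaves parser uses an integer seed and simply counts the strings, numbers, and constants in the input. It assumes the callback signatures shown in the my-json-to-sjson example and the ( in seed exhaust? ) argument order given earlier.

```racket
#lang racket/base
(require (planet neil/json-parsing:1:=1))

;; Hypothetical parser: the seed is a running count of leaf values.
;; Non-leaf callbacks pass the count through unchanged; the end
;; callbacks return the count from the last child, not the parent seed.
(define count-json-leaves
  (json-fold-lambda
   #:error-name         'count-json-leaves
   #:visit-object-start (lambda (seed) seed)
   #:visit-object-end   (lambda (seed parent-seed) seed)
   #:visit-member-start (lambda (name seed) seed)
   #:visit-member-end   (lambda (name seed parent-seed) seed)
   #:visit-array-start  (lambda (seed) seed)
   #:visit-array-end    (lambda (seed parent-seed) seed)
   #:visit-string       (lambda (str seed) (add1 seed))
   #:visit-number       (lambda (num seed) (add1 seed))
   #:visit-constant     (lambda (name seed) (add1 seed))))

;; Start the count at 0 and require that the whole input is consumed.
(define leaf-count
  (count-json-leaves "[1, \"two\", {\"three\": null}]" 0 #t))
```

Since no tree is ever constructed, the space used by the seed stays constant regardless of the size of the input.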

(make-json-fold ...) → any/c
  ... : any/c

This is like json-fold-lambda, except it is a procedure, rather than syntax. make-json-fold can be used in the less-common case that you need to define a new parser dynamically.

Note that, in the produced procedure, the exhaust? argument is optional (defaulting to #t). Thus, the signature is:

( in seed { #:exhaust? exhaust? }? )

4 Conversion

(json->sjson in #:exhaust? exhaust?) → any/c
  in : any/c
  exhaust? : any/c

Parse a JSON value from input port or string in, and return an SJSON parsed representation. SJSON is identical to the jsexpr defined by the PLaneT package dherman/json:3:=0.

(json->sxml in #:exhaust? exhaust?) → any/c
  in : any/c
  exhaust? : any/c

Parse the JSON input from input port or string in, and return it in a contrived XML data format that can be processed with various SXML tools.

(write-json-as-xml in
  #:exhaust? exhaust?
  #:out out) → any/c
  in : any/c
  exhaust? : any/c
  out : any/c

Parse the JSON input from input port or string in, and write it in a contrived XML data format to output port out (which defaults to the value of the current-output-port parameter). This is mainly a demonstration of “streaming” processing that can scale to arbitrary JSON input sizes.

(json->xml in #:exhaust? exhaust?) → any/c
  in : any/c
  exhaust? : any/c

This is like write-json-as-xml, but instead of writing to a port, it returns the XML as a string. Most people would not choose to do this.
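The following sketch calls the three XML-oriented converters on the same input, using the signatures as documented above. The sample JSON text and the variable names are made up for illustration, and no particular XML output is shown, since the exact element names of the contrived format are not described here.

```racket
#lang racket/base
(require (planet neil/json-parsing:1:=1))

;; Illustrative input.
(define json-text "{\"name\": \"widget\", \"tags\": [\"a\", \"b\"]}")

;; SXML representation, suitable for SXML tools.
(define as-sxml (json->sxml json-text #:exhaust? #t))

;; Streamed XML, written here to a string port rather than to
;; the default current-output-port.
(define out (open-output-string))
(write-json-as-xml json-text #:exhaust? #t #:out out)
(define streamed-xml (get-output-string out))

;; The same conversion, returned directly as a string.
(define as-xml-string (json->xml json-text #:exhaust? #t))
```

For large inputs, passing a file or network output port as out lets the XML be written as it is produced, instead of being accumulated in memory as in the string-returning variant.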

5 History

6 Legal

Copyright (c) 2010 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License (LGPL 3), or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.

Standard Documentation Format Note: The API signatures in this documentation are likely incorrect in some regards, such as indicating type any/c for things that are not, and not indicating when arguments are optional. This is due to an in-progress transition from the Texinfo documentation format to Scribble, which the author intends to finish someday.