[Ed. Note.: Much of this refers to the Zodiac version of the stepper, which I am busily replacing at this very moment. jbc, 2001-12-04] variable references: there are three kinds of variable references: 1) bound variable refs 2) unit-bound variable refs 3) top-level variable refs You might be forgiven for some confusion: these three appear to overlap heavily. Here are more accurate defintions for each one: unit-bound variable references are those which occur as the left-hand sides of top-level definitions within a unit. bound variable references are those which occur within the scope of a lambda, case-lambda, let, let*, letrec, or other form which introduces a limited lexical scope. This includes `local', but not the unit-bound variables mentioned above. top-level references are the rest of the references. One difference between top-level and bound varrefs are the way that they are handled at runtime. Top-level varrefs are looked up in a table; if they are not found in this table, a runtime error is signalled. Note that this lookup occurs only when the varref is evaluated, not when it is first `encountered' (e.g., in the body of a closure). One reason that this mechanism is necessary is that a Scheme REPL permits top-level references to variables that have not yet been defined. Bound varrefs have a known lexical binding location, and they can be looked up directly, rather than going through the indirection of checking a table. These variables may be introduced by forms like `letrec' or `local', and they may furthermore be used before their binding definition has been evaluated. In this case, they have the `' value. In most language levels, a reference to a variable which contains the `' value is an error. In such a language level, any variable which may have this value must be checked on every evaluated reference. So here's the problem: unit-bound varrefs are similar to those inside a `local'. Syntactically, their bindings are introduced by `define', and their scope extends in both directions. Semantically they are similar to bound variables, in that the interpreter can lexically fix the binding of the variable. In both of these regards they are similar to the bindings in a `local'. However, zodiac does not parse them like those in a `local'. Rather, it parses them as `top-level-varref's. Why? I forget, and I'm about to ask Matthew yet again. Then I'll record the answer here. Now things get a bit more complicated. Top-level varrefs never need to be checked for the '' value; before they are bound, they have no runtime lookup location at all. Bound varrefs and unit varrefs, on the other hand, may contain the `' value. In particular, those bound by letrec, local, and units may contain this value. Others, like those bound by lambda, let, and let*, will not. For the first and third categories, we do not need to check for the undefined value at runtime. Only when we are looking at a bound or unit varref which may contain the `' value do we need to insert a runtime check. ******* Another topic entirely is that of sharing. When a break occurs, the stepper reconstructs the state of memory. However, two closures may refer to the same binding. For instance, (define-values (setter getter) (let ([a '*undefined*]) (values (lambda (x) (set! a x)) (lambda () a)))) If each closure is linked to a record of the form (lambda () values-of-free-vars), there's no way to tell whether the first and second closure refer to the same binding of a or not. So in this case, we must devise some other technique to detect sharing. A simple one suggested by Matthew is to store mutators in the closure record; then, sharing can be detected by the old bang-one-and-see-if-the-other-changes technique. ********* A note about source locations: I'm using the "start" locations of sexps (assigned by Zodiac) to uniquely identify those expressions: I don't believe there are any instances where two expressions share a start location. Later: this is now obsolete: I'm just storing the parsed zodiac expressions. Forget all of this source correlation crap. Zodiac does it for me. [Ed. Note: this observation turned out to be completely wrong. cf. later notes.] ********* Robby has a good point: Matthew's technique for detecting gaps in the continuation-mark chain (look for applications whose arguments are fully evaluated but are still on the list of current marks) depends on the assumption that every "jump site" has the jump as its tail action. In other words, what about things like "invoke-unit/open", which jumps to some code, evaluates it, >then comes back and binds unit values in the environment<. In this case, the "invoke-unit/open" continuation will not be handed directly to the evaluation of the unit, because work remains to be done after the evaluation of the unit's definitions. Therefore, it will be impossible to tell when un-annotated code is appearing on the stack in uses of "invoke-unit/open." Problem. ********* So what the heck does a mark contain for the stepper? it looks like this: (lambda () (list )) with var-list = (list-of var) and var = (list z:varref) ********* Let me say a few words here about the overall structure of the annotator/stepper combination. We have a choice when rebuilding the source: we can follow the source itself, or we can follow the parsed expression emitted by zodiac. If our task is simply to spit out source code, then it's clear that we should simply follow the source. However, we need to replace certain variables with the values of their bindings (in particular, lambda-bound ones). Well, in beginner mode anyway... ******* Okay, I'm about to extend the stepper significantly, and I want to do at least a little bit of design work first. The concept is this: I want the stepper to stop _after_ each reduction, as well as before it. One principal difference between the new and old step types is that in the new one, the continuation cannot be rectified entirely based upon the continuation marks; the value that is produced by the expression in question is also needed. Here's a question: can I prove, for the setup I put together, that the part of the continuation _outside_ the highlighted region does not change? This should be the case; after all, the continuation itself does not change. Of course, there are some reductions which do not immediately produce a value; procedure applications, and ... uh oh, what about cond and if expressions? We want the stepper to use the appropriate "answer" as the "result" of the step. So there's some context sensitivity here. Wait, maybe not. It seems like _every_ expression will have to have a "stop on entry" step. Further, these types of steps will _not_ have values associated with them. Hmmm.... Okay, this isn't that hard. Yes, it's true that every expression that becomes ... no, it's not obvious that the expression which is substituted ... jesus, it's not even always the case that a "substitution" occurs in the simplistic sense I'm imagining. Damn, I wish my reduction semantics were finished. (Much later): The real issue is that the "stop-on-enter" code is inserted based on the surrounding code, and So, here's the next macro we need to handle: define-struct. ********* Don't forget a test like (cond [blah] [else cond [blah] [blah]]) ********** Okay, I'm a complete moron. In particular, I threw out all of the source correlation code a week ago because I somehow convinced myself that the parsed expressions retained references to the read expressions. That's not true; all that's kept is a "location" structure, which records the file and offset and all that jazz. So I tried to fix that by inserting these source expressions into the marks, along with the parsed expressions. This doesn't work because I need to find the read expressions for expressions that don't get marks... or do I? Yes, I do. In particular, to unparse (define a 3), I need to see the read expression to know that it wasn't really (define-values (a) (values 3)). Maybe I can add a field to zodiac structures a la maybe-undefined? ************ That worked great! ************ Man, there's a lot of shared code in here. ************ Okay, back to the drawing board on a lot of things. 1) Matthias and Robby are of the opinion that the break for an expression should be triggered only when that expression becomes the redex. For example, the breakpoint for an if expression is triggered _after_ the test expression is evaluated. 2) I've realized that I need a more general approach in the annotater to handle binding constructs other than lambda. In particular, the new scheme handles top-level variables differently than lexically bound ones. In particular, the mark for an expression contains the value of a top-level variable if (1) the variable occurs free in the expression, and (2) the expression is on the spine of the current procedure or definition. Lexically bound variables are placed in the mark if (1) they occur free in the expression, and (2) they are in tail position relative to the innermost binding expression for the variable. *** Wait, no. This is crap, because the bodies of lambdas need to store all free variables, regardless of whether they're lexically tail w.r.t. the binding occurrence. Maybe it really would just be easier to do this in two passes. How would this work? One pass would attach the free variables to each expression. Then, the variables you must store in the mark for an expression are those which (1) occur free and (2) are not contained in some lexically enclosing expression. I guess we can use the register-client ability of zodiac for this... We're helped out in the lexical variables by the fact that zodiac renames all lexically bound variables, so no two bindings have the same name. Of course, that's not the case for the special variables inserted by the annotator. Most of these ... well, no, all of these will have to appear in marks now. The question is whether they'll ever fight with each other. In the case of applications, I'm okay, because the only expressions which appear in tail ... wait, wait, the only problem that I could have here arises when top-level variables have the same names as lexically bound ones, and since all of the special ones are lexically bound, this is fine. ************ I'm taking these comments out of the program file. They just clutter things up. ; make-debug-info takes a list of variables and an expression and ; creates a thunk closed over the expression and (if bindings-needed is true) ; the following information for each variable in kept-vars: ; 1) the name of the variable (could actually be inferred) ; 2) the value of the variable ; 3) a mutator for the variable, if it appears in mutated-vars. ; (The reason for the third of these is actually that it can be used ; in the stepper to determine which bindings refer to the same location, ; as per Matthew's suggestion.) ; ; as an optimization: ; note that the mutators are needed only for the bindings which appear in ; closures; no location ambiguity can occur in the 'currently-live' bindings, ; since at most one location can exist for any given stack binding. That is, ; using the source, I can tell whether variables referenced directly in the ; continuation chain refer to the same location. ; okay, things have changed a bit. For this iteration, I'm simply not going to ; store mutators. later, I'll add them in. ************ Okay, I'm back to the one-pass scheme, and here's how it's going to work. Top-level variables are handled differently from lexically bound ones. Annotate/inner takes an expression to annotate, and a list of variables whose bindings the current expression is in tail position to. This list may optionally also hold the symbol 'all, which indicates that all variables which occur free should be placed in the mark. *********** Regarding the question: what the heck is this lexically-bound-vars argument to annotate-source-expr? The answer is that if we're displaying a lambda, we do not have values for the variables whose bindings are the arguments to the lambda. For instance, suppose we have: (define my-top-level 13) (define my-closure (lambda (x) (x top-level))) When we're displaying my-closure, we better not try to find a value for x when reconstructing the body, as there isn't one. ************* This may come back to haunt me: the temporary variables I'm introducing for applications and 'if's are funny: they have no bindings. They have no orig-name's. They _must_ be expanded, always. This may be a problem when I stop displaying the values of lambda-bound variables. *************** currently on the stack: yank all of that 'comes-from-blah' crap if read->raw works. ************* annotater philosophy: don't look at the source; just expand based on the parsed expression. The information you need to reconstruct the ************* for savings, I could elide the guard-marks on all but the top level. *********** months later; October 99. major reorganization, along a model-view-controller philosophy. Here's how it works: The view and controller (for the regular stepper) are combined in a gui unit. This unit takes a text%, handles all gui stuff, and invokes the model unit (one for each stepping). The model unit is a compound unit. It consists of the annotater, the reconstructor, and the model unit itself. Gee whiz; there's so much stuff I haven't talked about. Like for instance the fact that the stepper now has before and after steps. The point of this reorganization is to permit a natural test suite. Jesus, that's been a long time coming. At some point, I'm also hoping to combine the stepper into the main DrScheme frame. Oh yes, another major change was that evaluation is now strictly on a one- expression-at-a-time basis. The read, parse, and step are now done indiv- idually for each expression. This has the ancillary benefit that there's no longer any need to reconstruct _all_ of the old expressions at every step. ************ You know, I should never have started that ******** divider. I have no idea how many stars are supposed to be there. Oh well. ************ The version for DrS-101 is out, and I've restructured the stepper into a "model/view/controller" architecture, primarily to ease testing. Of course, I haven't actually written the tester yet. So now, the view and controller are combined in stepper-view-controller.ss, and the model (instantiated once per step-process) is in stepper-model.ss. In fact, the view-controller is also instantiated once per step-process, so I'm not utilizing the division in that way, but the tester will definitely want to instantiate the model repeatedly. *********** I also want to comment a little bit on some severe ugliness regarding pretty- printing. The real problem is how to use the existing pretty-print code, while still having enough control to highlight in the right locations. Okay, let me explain this one step at a time. The way the pretty-printer currently works is this: there are four hooks into the pretty-printing process. The first one is used to determine the width of an element. The result of this procedure is used to decide whether a line break is necessary. However, this hook is _also_ used to determine whether or not the pretty-printer will try to print the string itself or hand off responsibility to the display-handler hook. In other words, if the width- hook procedure returns a non-false value, then the display-handler will be called to print the actual string. The other pair of hook procedures is first, a procedure which is called _before_ display of any subexpression, and one which is called _after_ display of any subexpression. So how does the stepper use this to do its work? Well, the stepper has two tricky tasks to accomplish. First, it must highlight some subexpression. Second, it must manually insert elements (i.e. images) which the pretty-printer does not handle. Let's talk about images first. In order to display images, the width-hook procedure detects images and (if one is encountered) returns a width explicitly. (Currently that width is always one, which can lead to display errors, but let's leave that for later.) Remember, whenever the width returned by this hook is non-false, the display handler will be called to insert the object. That's perfect: the display hander inserts the image just fine. One down, one to go. The stepper needs to detect the beginning of the (let's call it the) redex. The obvious way to do this is (almost) the right way: the before-printing handler checks to see whether the element about to be printed is the redex (by an eq?-test). If so, it sets the beginning of the highlight region. A corresponding test determines the end of the highlight region. When the pretty-printing is complete, we highlight the desired region. Fine. BUT, sometimes we want to highlight things like numbers and symbols; in other words, non-heap values. For instance, suppose I tell you that the expression that we're printing is (if #t #t #t) and that you're supposed to be highlight- ing the #t. Well, I can't tell which of the #t's you want to highlight. So this isn't enough information. To solve this problem, the result of the reconstructor is split up into two pieces: the reconstructed stuff outside the box, with a special gensym occurring where the redex should be, and a separate expression containing the redex. Now at least the displayer has enough information to do its job. Now, what happens is that when the width-hook runs into the special gensym, it knows that it must insert the redex. Well, that's fine, but remember, if this procedure wants to take control of the printing process, it must do so by returning the width of the printed object, and then this object must be printed by the display-hook. The problem here is that neither of these procedures have the faintest idea about line-breaks; that's the pretty- printer's job. In other words, this solution only works for things (like numbers, symbols and booleans) which cannot be split across lines. What do we do? Well, the solution is ugly. Remember, the only reason we had to resort to this baroque solution in the first place is that values like numbers, symbols, and booleans couldn't be identified uniquely by eq?. So we take a two- pronged approach. For non-confusable values, we insert them in place of the gensym before doing the printing. For confusable values, we leave the placeholder in and take control of the printing process manually. In other words, the _only_ reason this solution works is because of the chance overlap between confusable values and non-breakable values. To be more precise, it just so happens that all confusable values are non- line-breakable. Lucky. And Ugly. ***** January, 2000 I'm working on the debugger, now, and in particular extending the annotater to handle all of the Zodiac forms. Let and Letrec turn out to be quite ugly. I'm still a little unsure about certain aspects of variable references, like for example whether or not they stay renamed, or whether they return to their original names. [ed. note: they get new uninterned symbols that print like their original names] But that's not what I'm here to talk about. No, the topic of the day is 'floating variables.' A floating variable is one whose value must be captured in a continuation mark even though it doesn't occur free in the expression that the wcm wraps. Let me give an example: (unit/sig some-sig^ (import) (define a 13) (define b (wcm (+ 3 4)))) In this case, the continuation-mark must hold the value of a, even though a does not occur free in the rhs of b's definition. Floating variables are stored in a parameter of annotate/inner. In other words, they propagate downward. Furthermore, they're subject to the same potential elision as all other variables; you only need to store the ones which are also contained in the set tail-bound. Also note that (thank God) Zodiac standardizes names apart, so we don't need to worry about duplications. Also note that floating variables may only be bound-varrefs. ******** Okay, well that doesn't work at all; dynamic scope blows it away completely. For instance, imagine the following unit: (unit/sig some-sig^ (import sig-that-includes-c^) (define a 13) (define b (c))) Now, during the execution of c, there's no mark on the stack which holds the bindings of a. DUH! I can't believe I didn't think of this before. Okay, one possible solution for this would be to use _different keys_ for the marks, so that a mark on the unit-evaluation-continuaiton could be retained. ********* Okay, time to do units. Compound units are dead easy. Just wrap them in a wcm that captures all free vars. No problemo. Normal units are more tricky, because of their scoping rules. Here's my canonical translation: (unit (import vars) (export vars) (define a a-exp) b (define c c-exp) d etc.) ... goes to ... (unit (import vars) (export vars) (wcm blah ; including imported vars (begin (set! a a-exp) b (set! c c-exp) d)) (define a a) (define c c) ...) ************ Well, I still haven't written the code to annotate units, so it's a damn good thing I wrote down the transformation. I'm here today (thank you very much) to talk about annotation schemes. I just (okay, a month ago --- it's now 2000-05-23) folded aries into the stepper. the upshot of this is that aries now supports two different annotation modes: "cheap-wrap," which is what aries used to do, and the regular annotation, used for the algebraic stepper. However, I'm beginning to see a need for a third annotation, to be used for (non- algebraic) debugging. In particular, much of the bulk involved in annotating the program source is due to the strict algebraic nature of the stepper. For instance, I'm now annotating lets. The actual step taken by the let is after the evaluation of all bindings. So we need a break there. However, the body expression is _also_ going to have a mark and a break around it, for the "result-break" of the let. I thought I could leave out the outer break, but it doesn't work. Actually, maybe I could leave out the inner one. Gee whiz. This stuff is really complicated. ************* Okay, well, I figured all that stuff out, but now I've got to restructure the reconstructor to handle lifting---PRE-lifting, that is---on let/letrec/local. In particular, the reconstruct-inner function will now return four things: the free bindings, the reconstructed expr, the "before" definitions, and the "after" definitions. These before and after definitions are wrapped around the current set of generated definitions. Case in point; I'm about to execute the (+ 7 8) in the following expression: (let ([a 4] [b (let ([h 3] [i (+ 7 8)] [j 9]) (+ h i j))] [c 19]) (+ a b c)) How do we reconstruct this? Well, first we reconstruct the (+ 7 8) itself, that's easy. Then, we encounter a let. The return value of this will be the _before_ expressions: (define ~h~0 3) the _after_expressions: (define ~i~0 (+ 7 8)) (define ~j~0 9) and the reconstructed expression: (+ ~h~0 ~i~0 ~j~0) Now, we recur, using the reconstructed expression. The next step outward is _also_ a let, so we get the following before expressions: (define ~a~0 4) the following after expressions: (define ~b~0 (+ ~h~0 ~i~0 ~j~0)) <---here is where the reconstructed expr appears (define ~c~0 19) and the reconstructed expression: (+ ~a~0 ~b~0 ~c~0) So then, the final assembly occurs when the "before" expressions are slapped together, last first, then the "after" expressions, first first, and then whatever reconstructed expression is left over. Ugh. *********** Wow. more complications. Here's the new problem. Let's say I have an expression like this: (define (make-thunk) (let ([lexical-binding 14] [returned-thunk (lambda () lexical-binding)]) returned-thunk)) (define first-thunk (make-thunk)) (define second-thunk (make-thunk)) (first-thunk) Now, when I'm just inside the body of first-thunk, and trying to reconstruct "lexical- binding", I need to know what lifted name it got. There are a bunch of ways to try to do this, but I'm going to take the most straightforward approach (which came to me after about a day of thought), which is to expand every lexical binding into a pair of bindings; one which refers to the bound value (with the same name as the original binding), and a new, gensym'ed one, which indicates what index number this binding has received. 2000-06-05 *********** So here's the new format of a full mark: (make-mark label source bindings) where label is a symbol, source is a zodiac:parsed, and bindings is an association list from bindings to values. Note, however, that every let-type binding now has _two_ entries in this list. The first one supplies the binding's value, and the second one supplies the lifted name's index. [ed note.: see note for 2000-09-26] 2000-06-06 *********** How do we guarantee that lifted names do not clash? Well, for each binding we use the original name, with two numbers appended to it, separated by zeros; the first one indicates which binding it is (more than one binding may have the same original name), and the second one indicates which dynamic occurrence of this binding it is. So, for instance, if a program contains one binding named 'foo', and it's evaluated three times, the third evaluation would result in the lifted name 'foo0002'. I personally guarantee that no namespace clashes can occur in this scheme. Yep. 2000-06-06 *********** Oh.. Well, Matthias prefers a naming scheme whereby all bindings are assigned sequential numbers, regardless of the binding name. So this name clash isn't really an issue anymore. 2000-09-09 *********** To handle units, marks must now contain "top-level" (actually, unit-bound) variables. For this reason, the datatype for a full mark must change. a full mark is now: (make-full-mark location label bindings) where location is a zodiac:location label is a symbol, and bindings is an association list containing and values a is either a zodiac:binding (for bound vars), or a slot (for unit-bound vars in the zodiac:top-level-varref/bind/unit struct). *********** Ooookay. We're in Boston now, and I'm rewriting the stepper completely to work with version 200. In other words, we're scrapping Zodiac completely. This is an interesting SE task, because from a data-driven design standpoint, the code is starting from zero again; all of my data have different shapes now. Another change is that with the demise of DrScheme Jr and the institution of the static-compilation module mechanism, there's no longer a need for two separate collections. I've therefore scrapped stepper-graphical, and moved everything back into stepper. Also, the stepper no longer needs to be tightly integrated with DrScheme itself; it can now be simply a tool. I've already done the front-end work to tie in to the new tool interface; I think this stuff is all done. So, here's the plan. The major pain is in the annotater, and that's what I'm tackling now. I'm proceeding along an iterative refinement path; first, I want to get a bare-bones annotation working, without any macro-reversal (hence source-correlation) stuff. Bindings. What's a binding? It looks to me like the syntax object representing the binding occurrence of the variable should serve admirably as a 'binding' for our purposes. *********** I'm dumping the tracking of the 'never-undefined property. It was originally used for two purposes; first, varrefs had to be wrapped with an undefined check. Second, varrefs in ankle- and cheap-wrap were not wrapped if the variables were known never to be undefined. Now, the undefined check is (at last) inserted by the language's elaborator, so the first use is obsolete. The second one is more or less obsolete as well, because I'm not sure that cheap- or ankle-wrap are ever going to be used again. Also, the 'lambda-bound-var' property is going away; in v200, I don't see a good way to get from a bound variable to its binding, which makes it more or less impossible to keep track of things by attaching properties to bindings. In fact, it doesn't really even make sense to try and find the binding for an occurrence in v200, because it's not even known. Instead, I've just added another recursion argument called 'let-bound-variables", which is basically what the property was anyway. 2002-01-08 *********** Why, for the love of God, do we need to put a wcm around a quote? I can see how we need one if there's a pre-break there, but otherwise, it seems totally useless. Ditto for quote-syntax 2002-01-08 [Later Note: This is preposterous. Of course I need a wcm there, to replace an existing one if necessary. Maybe if it's in non-tail position...] *********** Here's a nice optimization I'm not taking advantage of: the application of all lambda-bound vars doesn't need all those temp vars. OTOH, this won't help much with beginner/intermediate, because you never have a lexical var in the application position. I suppose you can generalize this to say that you only need arg-temps for things that are not lambda-bound vars. Well, maybe some other day... 2002-01-08 *********** Okay, as much as I hate to admit it, reconstruct is not just getting a face lift; it's being largely rewritten. The major change is this: I'm going to delay macro unwinding until the end. Toward this end, the recon (formerly "rectify") procedures will produce syntax objects with attached properties that record the macro expansions and the primary origin of the form. After all reconstruction is done, we go through again and look for things that need to be rewritten. This will separate the macro unwinding from the basic reconstruction of the expression. Hopefully, at the end we can just use (syntax-object->datum) to discard all of the side information. Please, let this work. Yikes. 2002-01-12 ******* There's a problem with the reconstruction of let-values, which only surfaces in the presence of multiple-values. This is okay for now, because beginner and intermediate do not allow multiple values. The problem is that if you allow expressions like this --- (let-values ([() (values)]) 3) --- that is, where there can be an empty set of variables in a lhs position, you may not be able to tell at runtime what expression you're in the middle of. The problem is that when we stop during the evaluation of a rhs in a let, we figure out which rhs we're evaluating by which lhs-vars have been changed from their original values. Oh, dang. This is totally broken for letrec's in which the rhs evaluates to the undefined value. Well, I guess I'm going to have to fix this the right way, by adding a counter to every let which is incremented explicitly after the evaluation of each rhs. Yikes. ******** Ha! Did I actually say "right way?" This is totally the _wrong_ way; keeping information about the continuation by mutating the store is guaranteed to fail when continuations are invoked. 2002-06-21 ********* Well, another year has passed. How swiftly they fly! Nathan is almost walking, Alex is almost three, and I'm about to graduate. But I'd better get the Intermediate stepper working first. A note about lifting; I keep looking for the right idiom in which to code the search for the highlight. In fact, the real problem is my inability to cleanly express the location of the highlight. The one I've settled on as the least egregious is this: a location in a syntax object is expressed as a list of context records, where each one contains an index indicating the location of the subterm. This index makes coding the search less pleasant than it might otherwise be; right now, I'm searching by constructing a list of subterms paired with indices, and then iterating through these. 2003-07-13 ********** Intermediate stepper now working. I developed a much better way of specifying the highlight: the reconstruct engine now delivers a syntax object to the display engine, which allows me to use syntax properties. Much much better. 2004-01-15 ************ A year and a half has passed since I've thought about this file, and I'm now in the midst of a Google Summer of Code (SoC) grant which is supposed to get me to support mutation, and make the corresponding changes to the interface. A thought I had while walking the California mountainsides (BTW: I've just graduated, and gotten a job at Cal Poly)--why do I do the reconstruction from the inside out? Wouldn't it be much much easier to do from the outside in? Feh. 2005-08-02 ************* Well, the dang summer is almost over, and I've still got a long, long way to go. The basic change to the model is that instead of storing completed definitions as pre-formatted s-expressions, I'm now storing them as 2-element lists containing the syntax object associated with the definition and a 'getter' which returns the value that the binding refers to. The actual definition is reformatted for each step. This is a bit silly, but it would be easy to cache the definitions along with the present values if this is actually a performance bottleneck. I suspect it won't matter a bit. In the presence of mutation, the existing separators don't make sense, either. I'm scrapping them, for the moment. A nice interface change would be to separate only the definitions that had changed. For them moment, they'll all be separated. The first order of business, after mucking around in the model for some time to get the flavor of how things will work, is to go and set up the interface so I can get things running. ************* Okay, I've "completed" the google project, but there are still things to wrap up. Right now I'm working on the highlighting for mutated bindings, which is inferred from differences in the rendered steps. So, for instance, if the left-hand-side has (define a 3), and the right-hand-side has (define a 4), well then we'd better highlight the 3 on the left and the 4 on the right, because this binding was mutated. Now, this kind of highlighting--reconstructing highlighting from observed differences, rather than obtaining direct evidence of the mutation--clearly has some shortcomings. For instance, concurrent code... well, concurrent code is all messed up to begin with; a more interesting problem occurs when you have mutations that share structure. So, what if a is mutated from (list 3 4) to (list 4 5). Should the whole thing be highlighted? Certainly that's what you'd get from a a normal reduction semantics. In some sense, though, highlighting _just_ the 3 and the 4 (and the 4 and the 5) corresponds to a smaller set of changes that produces the same result. Another problem that's coming out is the problem of "intermediate" completed expressions that arise from partially evaluated letrecs (and all the things that expand into them). These should also be scanned for mutation, right? What about the "future" ones? There are other rendering problems with forward mutation in letrecs which I haven't tackled, as well. I find myself leaning toward depending on the "user-source" syntax property. As I've observed before, though, the syntax properties form a sort of creeping mush; they don't need to be explicitly expressed as arguments or return values, and errors of omission in the syntax properties are hard to catch. A lot can hide in the "syntax?" contract. 2005-09-21 ************** Time to clean up for v300. Let's see if we can get begin and begin0 working. 2005-11-14 ************* Okay, it turns out that begin expands into a let-values with empty bindings, so I'm working on getting this going. With this addition, the annotation for 'let' is a complete monster, chewing up a substantial fraction of the annotation code all by itself. Also, I've come across a design optimization that improves if & set!, which is this: there's no reason to have if-temp & set!-temp. Putting these inline is a great improvement: it reduces code in the reconstructor, in the annotator, all over the place. The caveat: I haven't finished it yet, so who knows what kind of horrible thing will crop up. The architecture change here is that we need a new kind of break that's like a normal-break (blecch! terrible name!) but carries a value along with it. I'm going to call this the normal-break/value break. Blurrch! 2006-01-12 ************* Begin STILL isn't working. My last plan, to wrap each 'begin' body with a mark that indicates the source, is naturally broken because an inner mark can destroy that one. The solution is (hopefully) easy; just eta-expand to prevent the mark from being lost. Since it's a non-tail call, this doesn't destroy tail-calling. ************** Okay, begin now works. I decided that one of my implicit invariants--that the source expressions linked to by the marks always be actual parts of the source--was too restrictive. I've now introduced a "fake-exp" which signals an artificially constructed expression. This made all my problems go away, and now I'm a happy man. If only begin0 worked, too... 2006-11-13