Tuesday, October 18, 2005

Shame! And, a Noodle reader

Okay, Jonathan linked here, so I feel obligated to actually say something more useful. I have neglected my webloggage duties.

I've been rewriting the Noodle reader in Python instead of the lex/yacc combo, so that I can build macro characters into it. It's more straightforward to read normal Lisp with a recursive one-item-at-a-time read.

The syntactic sugar added for Pythonicness makes things just a bit more complicated, though. I went through a few different attempts at a solution before I decided on implementing trailers (. or [], as in foo.bar or foo[bar]) using a new kind of macro character called a trailer macro. When a character follows another item immediately, with no intervening space, and it's set as a trailer macro, the handling function is called with the preceding item as a parameter, and the return value is used instead of the preceding item.

In the foo[bar] case, the reader would first read "foo", and just before returning, would peek at the next character in the stream. It's '[': oops, that's a trailer macro! So the read_subscript function is called with 'foo' as a parameter. The subscript "bar" is read. It's not a slice, so read_subscript returns a tuple we might express in Python syntax as (Symbol(subscript), 'foo', Symbol(bar)).
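Here is a minimal sketch of that peek-and-dispatch loop. It is not the actual Noodle reader: only read_subscript and Symbol are names from the description above, while Stream, TRAILER_MACROS, read_attribute, and the "getattr" head symbol are made up for illustration. It also wraps foo in a Symbol, which is an assumption, and it skips slices, lists, and every other kind of item.

```python
class Symbol:
    """A bare name read from the source text."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Symbol) and other.name == self.name
    def __hash__(self):
        return hash(("Symbol", self.name))
    def __repr__(self):
        return "Symbol(%s)" % self.name

class Stream:
    """Hypothetical character stream with one-character lookahead."""
    def __init__(self, text):
        self.text = text
        self.pos = 0
    def peek(self):
        return self.text[self.pos:self.pos + 1]  # '' at end of input
    def next(self):
        ch = self.peek()
        self.pos += 1
        return ch

DELIMITERS = set(" \t\n()[].")

def read_symbol(stream):
    chars = []
    while stream.peek() and stream.peek() not in DELIMITERS:
        chars.append(stream.next())
    if not chars:
        raise SyntaxError("expected a symbol")
    return Symbol("".join(chars))

def read_subscript(stream, preceding):
    stream.next()                      # consume '['
    index = read(stream)               # slices are not handled here
    if stream.next() != "]":
        raise SyntaxError("expected ']'")
    return (Symbol("subscript"), preceding, index)

def read_attribute(stream, preceding):
    stream.next()                      # consume '.'
    return (Symbol("getattr"), preceding, read_symbol(stream))

# Trailer macro table: character -> handler(stream, preceding_item)
TRAILER_MACROS = {"[": read_subscript, ".": read_attribute}

def read(stream):
    while stream.peek().isspace():
        stream.next()
    item = read_symbol(stream)
    # Just before returning, peek: a trailer macro character that follows
    # immediately (no space) replaces the item with the handler's result.
    while stream.peek() in TRAILER_MACROS:
        item = TRAILER_MACROS[stream.peek()](stream, item)
    return item
```

With this, read(Stream("foo[bar]")) gives the subscript tuple, and because the loop keeps peeking, chained trailers like foo.bar[baz] nest from left to right. In "foo [bar]" the space stops the trailer from firing, so the read returns just foo.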

One problem was that the order of operations I came up with previously does not fit nicely here. It was really pretty complicated to separate things based on precedence in the reader. Turns out that was a rather bad idea in the first place. I've also noticed when writing Noodle that it's hard to remember the order of operations, since it comes into play so rarely. My solution is to have trailers always come first, then leaders (like the quasiquote backtick (`) or the tuple/symbol-quoting backslash (\)). That makes everything unambiguous, is easy to remember, and is implementable without much headache.
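One way the "trailers first, then leaders" rule could fall out of a recursive reader, sketched under some assumptions: items are plain strings and nested tuples, and the head names "quasiquote", "quote", "getattr", and "subscript" are placeholders rather than whatever Noodle really emits. A leader just reads the next complete item, trailers and all, and wraps it, so no precedence table is needed.

```python
import re

# Hypothetical symbol syntax; '*' is allowed as an ordinary name character.
TOKEN = re.compile(r"[A-Za-z_*][A-Za-z0-9_*]*")

# Leader macro characters and the (made-up) form each one wraps with.
LEADERS = {"`": "quasiquote", "\\": "quote"}

def read(text, pos=0):
    """Read one item starting at pos; return (item, new_pos)."""
    ch = text[pos:pos + 1]
    if ch in LEADERS:
        # The recursive read consumes any trailers on the inner item
        # before the leader wraps it, so trailers bind tighter.
        inner, pos = read(text, pos + 1)
        return (LEADERS[ch], inner), pos

    m = TOKEN.match(text, pos)
    if m is None:
        raise SyntaxError("expected a symbol at position %d" % pos)
    item, pos = m.group(), m.end()

    # Trailer loop: runs to completion before control returns to a leader.
    while True:
        ch = text[pos:pos + 1]
        if ch == ".":
            m = TOKEN.match(text, pos + 1)
            if m is None:
                raise SyntaxError("expected a name after '.'")
            item, pos = ("getattr", item, m.group()), m.end()
        elif ch == "[":
            index, pos = read(text, pos + 1)
            if text[pos:pos + 1] != "]":
                raise SyntaxError("expected ']'")
            item, pos = ("subscript", item, index), pos + 1
        else:
            return item, pos
```

Under this sketch, `foo.bar reads as a quasiquote around the whole attribute access, not as an attribute lookup on a quasiquoted foo.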

One more thing I came across with this is the problem with * and ** being both symbols and leaders. It just isn't clean, and in the presence of reader macros is nearly impossible to correctly implement. So the leaders for varargs and kwargs are now @ and @@, and * is just another normal symbol-name character.
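To illustrate why the switch helps, here is a rough sketch (not Noodle's code) in which @ and @@ are the only leader prefixes and * is just one more symbol character. The names LEADERS, SYMBOL_CHARS, read_item, "varargs", and "kwargs" are assumptions for the example. Since @ never appears inside a symbol, the reader can decide "leader or symbol" from the first character alone; the only care needed is trying @@ before @.

```python
import string

# Assumed set of symbol-name characters; '*' is included, '@' is not.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "_*+-/<>=!?")

# Longest prefix first, so "@@" is not misread as two "@" leaders.
LEADERS = [("@@", "kwargs"), ("@", "varargs")]

def read_item(text, pos=0):
    """Read one item starting at pos; return (item, new_pos)."""
    for prefix, name in LEADERS:
        if text.startswith(prefix, pos):
            inner, pos = read_item(text, pos + len(prefix))
            return (name, inner), pos

    end = pos
    while end < len(text) and text[end] in SYMBOL_CHARS:
        end += 1
    if end == pos:
        raise SyntaxError("expected a symbol at position %d" % pos)
    return text[pos:end], end
```

Here read_item("@args") yields a varargs form, read_item("@@opts") a kwargs form, and both * and ** come back as plain symbols.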

The new reader is much more versatile, and once I make it play nicely with string constants (correctly handling the backslash escapes, raw strings, and Unicode escapes), it will be ready to be dropped into Noodle proper.