Saturday, April 30, 2005

Operator precedence in Noodle

Precedence? For a language with a Lispish syntax? I'm kidding, right?

Nope. In addition to some syntactic-sugar shortcut notations like the backtick (`), common in Lisps, Noodle has two trailers: attribute access, via dot (.), and subscription, via brackets ([]). When I first wanted to use Lisp on Python, those were a couple of the things that held me back, because I thought they would necessarily be inconvenient operations. I was imagining things like:

Python: object.attribute = value
Lisp: (setattr object 'attribute value)

or

Python: object[index]
Lisp: (subscript object index)

or even worse

Python: object[index::-1]
Lisp: (subscript object (slice index None -1))

...which is a lot of typing for something pretty common, and not so easy to read either. But when I actually tried to fit the foo.bar and foo[bar] syntaxes into Noodle, they fit really well.

foo.bar is changed by the parser into (getattribute foo bar). Similarly, foo[bar] becomes (subscript foo bar). Everything becomes an s-expression after the parser, so we can write everything in explicit s-expressions if we want.

Here's a short table of all the current syntactic sugar operators that get changed to s-expressions in the parser:


attribute access
    foo.bar                   (getattribute foo bar)
subscription
    foo[bar]                  (subscript foo bar)
subscription w/ slices
    foo[bar:baz]              (subscript foo (mkslice bar baz))
    foo[:bar:baz]             (subscript foo (mkslice None bar baz))
tuple-quoting
    \(foo bar baz)            (mktuple foo bar baz)
quasiquoting
    `(foo bar baz)            (quasiquote (foo bar baz))
    `foo                      (quasiquote foo)
unquoting
    ,bar                      (unquote bar)
    `(foo (bar ,baz))         (quasiquote (foo (bar (unquote baz))))
keyword arguments
    (myfunc foo:bar)          (myfunc (mkkeyword foo bar))
    (myfunc foo bar:(baz))    (myfunc foo (mkkeyword bar (baz)))
list building
    [5 4 3 9 foo]             (mklist 5 4 3 9 foo)
dictionary building
    {foo:bar baz: boom}       (mkdict foo bar baz boom)
varargs
    (myfunc foo *bar)         (myfunc foo (mkvarargs bar))
kwargs
    (myfunc foo **bar)        (myfunc foo (mkkwargs bar))


If you're paying attention, you may wonder how subscripts are differentiated from lists. It's quite simple: a subscript follows an item immediately, with no intervening space. So foo[bar] is a subscript, but foo [bar] is two items: foo, and a list containing bar.
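Here's a rough Python sketch of that whitespace rule, just to show how little it takes. It is not Noodle's real tokenizer: the TOKEN pattern and the classify_brackets name are made up for illustration, and I'm assuming that a closing ) or ] also counts as an "item" a subscript can follow.

```python
import re

# Hypothetical tokenizer: whitespace runs, single bracket/paren characters,
# or a run of anything else (names, numbers).
TOKEN = re.compile(r"\s+|[\[\]()]|[^\s\[\]()]+")

def classify_brackets(source):
    """Label each '[' in source as 'subscript' or 'list', in order."""
    labels = []
    prev = None  # the previous token, whitespace tokens included
    for tok in TOKEN.findall(source):
        if tok == "[":
            # A '[' is a subscript only if it directly follows an item:
            # no whitespace before it, and not right after an opener.
            follows_item = (
                prev is not None
                and not prev.isspace()
                and prev not in ("(", "[")
            )
            labels.append("subscript" if follows_item else "list")
        prev = tok
    return labels
```

With this, classify_brackets("foo[bar]") gives one subscript and classify_brackets("foo [bar]") gives one list, which is the distinction described above.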

You might also be asking about assignments to an attribute or subscript, or (if you're really familiar with Python details) how you would differentiate between foo[3:10] and foo[slice(3, 10)], which generate different bytecode and can in some cases give different results. The answers are not quite as simple to explain, but they are simple to implement. An assignment such as (= foo.bar baz) is handled by the "=" bytecode macro. The arguments to a macro are not evaluated beforehand: macros receive the tuples, symbols, and constants straight from the program code. So the = macro gets two arguments: (getattribute foo bar) and baz. Since there is no meaningful reason to give (getattribute foo bar) as the first parameter to = other than to assign to the bar attribute of foo, the = macro can recognize it and handle it specially. The same thing happens for (subscript), (mkslice), (mkvarargs), (mkkwargs), (mkkeyword), etc.: macros can see and recognize them in their arguments and handle them the right way.
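To make the macro idea concrete, here's a minimal Python sketch of how something like the = macro could dispatch on its unevaluated first argument. S-expressions are modeled as tuples and symbols as strings. The expand_assignment name and the store-attr / store-subscr / store-name result forms are invented for this example and are not Noodle's actual internals.

```python
def expand_assignment(target, value):
    """Hypothetical '=' macro: inspect the raw target form, not its value."""
    if isinstance(target, tuple) and target and target[0] == "getattribute":
        # (= foo.bar baz) arrives as (getattribute foo bar) and baz.
        _, obj, name = target
        return ("store-attr", obj, name, value)
    if isinstance(target, tuple) and target and target[0] == "subscript":
        # (= foo[bar] baz) arrives as (subscript foo bar) and baz.
        _, obj, index = target
        return ("store-subscr", obj, index, value)
    if isinstance(target, str):
        # A bare symbol is an ordinary name binding.
        return ("store-name", target, value)
    raise SyntaxError("cannot assign to %r" % (target,))
```

The point is only that the macro sees (getattribute foo bar) as data, so it can choose to emit a store instead of a load.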

The only thing left to worry about is precedence, in a few cases where these syntactic sugar bits lead to confusion, like this:

`(,foo.baz[bar])

In almost all cases like this, there's only one meaningful way to interpret the expression, which leads to an order of operations:

1. quasiquoting, unquoting, tuple-quoting
2. trailers (. and []), left to right
3. colons, stars, or doublestars (can't be used together, so order among them is meaningless)

So the above becomes

(quasiquote ((subscript (getattribute (unquote foo) baz) bar)))

If that order of operations is ever not the desired order, the programmer is free to use the equivalent s-expressions, which take away any ambiguity.
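For the curious, the first two levels of that order can be sketched as a tiny recursive-descent parser in Python. This is an illustration under my own assumptions, not the real Noodle parser: the function names and TOKEN pattern are made up, s-expressions come out as Python tuples, and it only handles names, parenthesized forms, the ` and , prefixes, and the . and [] trailers (no lists, slices, colons, or stars).

```python
import re

# Hypothetical tokenizer: whitespace, single punctuation characters, or names.
TOKEN = re.compile(r"\s+|[`,()\[\].]|[^\s`,()\[\].]+")
PREFIXES = {"`": "quasiquote", ",": "unquote"}

def skip_space(tokens, pos):
    while pos < len(tokens) and tokens[pos].isspace():
        pos += 1
    return pos

def parse_atom(tokens, pos):
    # Level 1: prefixes bind tightest, so they wrap only the next atom.
    tok = tokens[pos]
    if tok in PREFIXES:
        inner, pos = parse_atom(tokens, pos + 1)
        return (PREFIXES[tok], inner), pos
    if tok == "(":
        items = []
        pos = skip_space(tokens, pos + 1)
        while tokens[pos] != ")":
            item, pos = parse_item(tokens, pos)
            items.append(item)
            pos = skip_space(tokens, pos)
        return tuple(items), pos + 1
    if tok in (")", "[", "]", "."):
        raise SyntaxError("unexpected %r (not handled by this sketch)" % tok)
    return tok, pos + 1

def parse_item(tokens, pos):
    # Level 2: trailers apply left to right, with no space before them.
    expr, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in (".", "["):
        if tokens[pos] == ".":
            expr = ("getattribute", expr, tokens[pos + 1])
            pos += 2
        else:
            index, pos = parse_item(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != "]":
                raise SyntaxError("expected ']'")
            expr = ("subscript", expr, index)
            pos += 1
    return expr, pos

def parse(source):
    tokens = TOKEN.findall(source)
    expr, _ = parse_item(tokens, skip_space(tokens, 0))
    return expr
```

Calling parse("`(,foo.baz[bar])") yields the nested tuple form of the expansion shown above: the unquote wraps foo first, the trailers wrap that result, and the quasiquote wraps the whole parenthesized form.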

I'm pretty happy with all of this. It does leave one wart on the language that I don't like: I have to leave a special case in the syntax so that * and ** are still usable as symbols (for multiplication and power). That makes it impossible to use * or ** (the syntactic sugar items) on * or ** (the symbols). If a programmer wants to do this, she'd need to use s-expressions. I think it's a restriction people can live with, but it's still a fairly dramatic special case.

As always, I would love comments on any of this, especially if I've done something stupid. Fixing the language before it's publicly released is a bit easier than afterward.
