Saturday, April 30, 2005

Operator precedence in Noodle

Precedence? For a language with a Lispish syntax? I'm kidding, right?

Nope. In addition to some syntactic-sugar shortcut notations like the backtick (`), common in Lisps, Noodle has two trailers: attribute access, via dot (.), and subscription, via brackets ([]). When I first wanted to use Lisp on Python, those were a couple of things that held me back, because I thought they would necessarily be inconvenient operations. I was imagining things like:

Python: object.attribute = value
Lisp: (setattr object 'attribute value)


Python: object[index]
Lisp: (subscript object index)

or even worse

Python: object[index::-1]
Lisp: (subscript object (slice index None -1))

...which is a lot of typing for something pretty common, and wouldn't be so easy to understand. But when I actually tried to fit the foo.bar and foo[bar] syntaxes into Noodle, they fit really well. foo.bar is changed by the parser into (getattribute foo bar). Similarly, foo[bar] becomes (subscript foo bar). Everything becomes an s-expression after the parser, so we can write everything in explicit s-expressions if we want.
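For comparison, the long-hand forms I was dreading already exist in Python itself as plain function calls. Here is a quick sketch using only the standard library; the Box class and the sample values are made up for illustration and have nothing to do with Noodle's internals.

```python
import operator


class Box:
    """Hypothetical stand-in for any object that can hold attributes."""


box = Box()
items = [10, 20, 30, 40, 50]

# object.attribute = value   <->   setattr(object, 'attribute', value)
setattr(box, "attribute", 42)
assert box.attribute == getattr(box, "attribute")

# object[index]   <->   operator.getitem(object, index)
assert items[2] == operator.getitem(items, 2)

# object[index::-1]   <->   operator.getitem(object, slice(index, None, -1))
reversed_prefix = operator.getitem(items, slice(2, None, -1))
assert reversed_prefix == items[2::-1]
```

The function-call spellings work, but they show why I wanted the dot and bracket sugar: the slice case in particular hides a very common operation behind two nested calls.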

Here's a short table of all the current syntactic sugar operators that get changed to s-expressions in the parser:

attribute access
    foo.bar                   (getattribute foo bar)
subscription
    foo[bar]                  (subscript foo bar)
subscription w/ slices
    foo[bar:baz]              (subscript foo (mkslice bar baz))
    foo[:bar:baz]             (subscript foo (mkslice None bar baz))
tuple-escaping
    \(foo bar baz)            (mktuple foo bar baz)
quasiquoting and unquoting
    `(foo bar baz)            (quasiquote (foo bar baz))
    `foo                      (quasiquote foo)
    ,bar                      (unquote bar)
    `(foo (bar ,baz))         (quasiquote (foo (bar (unquote baz))))
keyword arguments
    (myfunc foo:bar)          (myfunc (mkkeyword foo bar))
    (myfunc foo bar:(baz))    (myfunc foo (mkkeyword bar (baz)))
list building
    [5 4 3 9 foo]             (mklist 5 4 3 9 foo)
dictionary building
    {foo:bar baz: boom}       (mkdict foo bar baz boom)
variable argument lists
    (myfunc foo *bar)         (myfunc foo (mkvarargs bar))
    (myfunc foo **bar)        (myfunc foo (mkkwargs bar))

If you're paying attention, you may wonder how subscripts are differentiated from lists. It's quite simple: a subscript follows an item immediately, with no intervening space. So foo[bar] is a subscript, but foo [bar] is two items: foo, and a list containing bar.
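To make the whitespace rule concrete, here is a minimal sketch of a reader that handles only symbols and brackets. It is not Noodle's actual parser; the function names and the choice of tuples for s-expressions are my assumptions for illustration.

```python
import re

# Each match captures the whitespace before a token and the token itself.
TOKEN = re.compile(r"(\s*)(\[|\]|[^\s\[\]]+)")


def tokenize(text):
    """Return (glued, token) pairs; glued means no whitespace precedes it."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        space, tok = match.groups()
        tokens.append((space == "" and pos > 0, tok))
        pos = match.end()
    return tokens


def read_items(tokens, i=0, closer=None):
    """Read items until the closing bracket (or end of input at top level)."""
    items = []
    while i < len(tokens):
        glued, tok = tokens[i]
        if tok == "]":
            if closer != "]":
                raise SyntaxError("unexpected ]")
            return items, i + 1
        if tok == "[":
            inner, i = read_items(tokens, i + 1, "]")
            if glued and items:
                # Bracket touching the previous item: a subscript trailer.
                items[-1] = ("subscript", items[-1], *inner)
            else:
                # Bracket standing alone: a list literal.
                items.append(("mklist", *inner))
        else:
            items.append(tok)
            i += 1
    if closer is not None:
        raise SyntaxError("missing ]")
    return items, i


def read(text):
    items, _ = read_items(tokenize(text))
    return items
```

With this sketch, read("foo[bar]") yields a single subscript form, while read("foo [bar]") yields two items: the symbol foo and a mklist form.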

You might also be asking about assignments to an attribute or subscript, or (if you're really familiar with Python details) how you would differentiate between foo[3:10] and foo[slice(3, 10)], which generate different bytecode and can in some cases give different results. The answers to those are not quite as simple to explain, but they are simple to implement. When I do something like an assignment, (= foo.bar baz), that's handled by the "=" bytecode macro. The arguments to a macro are not evaluated beforehand; macros receive the tuples, symbols, and constants from the program code. So the = macro gets two arguments: (getattribute foo bar) and baz. Since there is no meaningful reason to give (getattribute foo bar) as the first parameter to = other than to assign to the bar attribute of foo, the = macro can recognize it and handle it specially. The same thing happens for (subscript), (mkslice), (mkvarargs), (mkkwargs), (mkkeyword), etc.; macros can see and recognize them in their arguments and handle them the right way.
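Here is a rough sketch of that idea. The real = macro emits Python bytecode; to keep the example short, this stand-in emits Python source text instead, and the names assign_macro and emit are hypothetical. Forms are modeled as strings (symbols and constants) and tuples (s-expressions).

```python
def emit(form):
    """Turn an unevaluated form into Python source text (illustration only)."""
    if isinstance(form, str):
        return form
    head, *args = form
    if head == "getattribute":
        obj, name = args
        return f"{emit(obj)}.{name}"
    if head == "subscript":
        obj, index = args
        return f"{emit(obj)}[{emit(index)}]"
    if head == "mkslice":
        # None parts are left empty, so (mkslice None bar baz) gives :bar:baz
        return ":".join("" if a == "None" else emit(a) for a in args)
    # Anything else is treated as an ordinary function call.
    return f"{emit(head)}({', '.join(emit(a) for a in args)})"


def assign_macro(target, value):
    """Hypothetical '=' macro: inspects its unevaluated first argument."""
    if isinstance(target, str):
        # Plain symbol: ordinary name binding.
        return f"{target} = {emit(value)}"
    if target[0] == "getattribute":
        # Recognized as an attribute store rather than an attribute load.
        _, obj, name = target
        return f"{emit(obj)}.{name} = {emit(value)}"
    if target[0] == "subscript":
        # Recognized as an item or slice store.
        _, obj, index = target
        return f"{emit(obj)}[{emit(index)}] = {emit(value)}"
    raise SyntaxError(f"cannot assign to {target!r}")
```

Because the macro sees (mkslice 3 10) as a literal form rather than an already-built slice object, it can also tell foo[3:10] apart from foo[slice(3, 10)], which would arrive as an ordinary call form.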

The only thing left to worry about is precedence, in a few cases where these syntactic sugar bits lead to confusion, like this:

`(,foo.baz[bar])

In almost all cases like this, there's only one meaningful way to interpret the expression, which leads to an order of operations:

1. quasiquoting, unquoting, tuple-escaping
2. trailers (. and []), left to right
3. colons, stars, or doublestars (can't be used together, so order among them is meaningless)

So the above becomes

(quasiquote ((subscript (getattribute (unquote foo) baz) bar)))

If that order of operations is ever not the desired order, the programmer is free to use the equivalent s-expressions, which take away any ambiguity.
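The first two precedence levels can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Noodle's parser: it ignores whitespace (so every bracket after an item is treated as a trailer), skips colons, stars, and tuple-escaping, and would mis-split numbers containing a dot. All function names are made up.

```python
import re

TOKEN = re.compile(r"[()\[\]`,.]|[^\s()\[\]`,.]+")
PREFIXES = {"`": "quasiquote", ",": "unquote"}


def parse_prefixed(tokens, pos):
    """Level 1: a prefix wraps only the primary that directly follows it."""
    tok = tokens[pos]
    if tok in PREFIXES:
        inner, pos = parse_prefixed(tokens, pos + 1)
        return (PREFIXES[tok], inner), pos
    if tok == "(":
        items, pos = parse_sequence(tokens, pos + 1, ")")
        return tuple(items), pos
    if tok in ")[].":
        raise SyntaxError(f"unexpected {tok!r}")
    return tok, pos + 1


def parse_item(tokens, pos):
    """Level 2: apply trailers (. and []) to the result, left to right."""
    form, pos = parse_prefixed(tokens, pos)
    while pos < len(tokens) and tokens[pos] in (".", "["):
        if tokens[pos] == ".":
            form = ("getattribute", form, tokens[pos + 1])
            pos += 2
        else:
            index, pos = parse_item(tokens, pos + 1)
            if tokens[pos] != "]":
                raise SyntaxError("expected ]")
            form = ("subscript", form, index)
            pos += 1
    return form, pos


def parse_sequence(tokens, pos, closer):
    """Read items until the closing token, returning them and the new position."""
    items = []
    while tokens[pos] != closer:
        item, pos = parse_item(tokens, pos)
        items.append(item)
    return items, pos + 1


def parse(text):
    tokens = TOKEN.findall(text)
    try:
        form, pos = parse_item(tokens, 0)
    except IndexError:
        raise SyntaxError("unexpected end of input") from None
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return form
```

Feeding it an expression such as `(,foo.baz[bar]) gives back nested tuples with the same shape as the s-expression above: the unquote wraps foo first, then the trailers apply left to right, and the quasiquote wraps the whole parenthesized list.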

I'm pretty happy with all of this. It does leave one wart on the language I don't like: I have to leave a special case in the syntax for * and ** to be usable as symbols (for multiplication and power). So it's necessarily impossible to use * or ** (the syntactic sugar items) on * or ** (the symbols). If a programmer wants to do this, she'd need to use s-expressions. I think it's a restriction people can live with, but it's still a fairly dramatic special case.

As always, I would love comments on any of this, especially if I've done something stupid. Fixing the language before it's publicly released is a bit easier than afterward.

Thursday, April 28, 2005

Writing scanners and parsers in Python

In writing Noodle, I've spent a good deal of time looking around for decent Python-oriented scanner and parser tools/generators. I ended up writing my own scanner (since the needs were not very complex) and using yacc.py standalone from the PLY project.

But since that time, my friend Travis Hartwell pointed me to this (discussion on the undocumented and "experimental" sre.Scanner stuff in the Python standard library, which looks perfect) and this, a summary of Python parsing tools which I somehow completely missed in my search.
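For the curious, here is a minimal sketch of what a scanner built on that class might look like. The class is reachable as re.Scanner in current Python versions, but it remains undocumented, so its interface is not guaranteed. The token categories and patterns below are my own assumptions for a Lisp-like syntax, not Noodle's real scanner.

```python
import re

# re.Scanner is undocumented; treat this as an experiment, not a stable API.
# Each lexicon entry is (pattern, action). An action of None discards the
# match; a callable receives (scanner, matched_text) and returns the token.
scanner = re.Scanner([
    (r"\s+", None),
    (r"[()\[\]{}]", lambda s, tok: ("BRACKET", tok)),
    (r"[`,\\]", lambda s, tok: ("PREFIX", tok)),
    (r"\d+", lambda s, tok: ("NUMBER", int(tok))),
    (r"[^\s()\[\]{}`,\\]+", lambda s, tok: ("SYMBOL", tok)),
])


def scan(text):
    """Return the token list, failing if any input could not be matched."""
    tokens, remainder = scanner.scan(text)
    if remainder:
        raise SyntaxError(f"cannot scan: {remainder!r}")
    return tokens
```

The patterns are tried in order, so the NUMBER rule has to come before the catch-all SYMBOL rule.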

Through that, I came to PyBison, which looks more than a little interesting, and to which I will certainly be migrating, barring unforeseen difficulties. It's under the GPL, and I plan to have Noodle under an MIT-style license, but one would expect the generated code need not be GPL'd. It will depend on the author's wishes and how much nontrivial glue code is emitted with the generated output.

As far as I understand, PyBison uses Python docstrings to create input to bison, runs bison, and puts in some extra glue to get results available to Python again. That may save me a lot of work, since I was planning to do the bison stuff directly.

Next Monday Novell's Open Source Review Board will meet with Noodle on the agenda. I work at Novell and want to make sure the ownership status of Noodle is in the clear before releasing anything.

Wednesday, April 20, 2005

Python for Lispers

I recently found myself reading Peter Norvig's essay on Python for Lisp Programmers. It makes a good comparison of Python and Common Lisp, although it's a little out of date. I'm happy to note that Noodle fixes quite a few of the shortcomings of Python as listed there. (Noodle won't be any faster than Python, but fortunately Python's been making some very good progress in that area with recent releases.)

Once Noodle is ready, I plan on coming up with a similar document showing Noodle in between Lisp and Python.

Tuesday, April 19, 2005

Programming languages

I enjoy programming. My brother taught me BASIC when I was 8 years old, and I've been finding ways to make the computer do what I want ever since. And I've always been looking for a better language; one in which complex ideas and plans can be expressed simply and which provides the best transport between thoughts and bytes.

I've been using Python more than anything else for quite a while now, and it's been difficult to pin down exactly what it is about Python that I like so much. I think a big part of it is the "batteries included" philosophy: there is a very powerful and comprehensive standard library present for every Python installation, so programs are easy to move around, and I know exactly where to look for tools and the documentation on those tools, and I know the documentation is going to be good. It's helped me learn Python much faster than any other language, and get to know more of the details that let me make best use of the language.

I'm big on Python's duck typing too. Python makes working with objects and classes easy.

Still, there are things missing from Python. I've messed around with various Lisp dialects over the years, and really enjoyed constructs like lambdas and macros, and the way everything returns a value (even a useful value, much of the time). Python has a distinction between statements and expressions that seems fairly artificial to me. That rigidity sometimes blocks me from writing my programs the way I want them to be. Python lambdas, for example, can only contain expressions, due apparently to syntax constraints; and there are formatting and whitespace issues that are forced onto the code. The lack of macros grates sometimes. And Python's constructs (like assignment, function definitions, and loops) can't return values, necessitating a few more steps in code than might otherwise be necessary. Now, I fully agree that most of the time, these issues encourage better and more readable code. But not all the time. I'd like a little extra freedom in some cases.

There are still a lot of languages out there to try. OCaml and Haskell probably stand out as the next ones I need to learn. But I'm attached enough to Python's standard library, its clean extension API, and a bunch of other things that I was motivated to try putting a Lispish syntax on top of Python.

I looked at Logix, and I tried really hard to like it, but it felt too clumsy. It apparently translates Logix code to valid Python underneath, so it has to go to some extra lengths to avoid the same shortcomings as Python.

So I've written my own language, which I call Noodle. It has a Lispish syntax and compiles to Python bytecode. Its lambdas are not restricted, it has macros, and things like function definitions return values. What's more, it still makes attribute usage and access simple, something I haven't seen any Lisp dialect do.

It's coming along very well. It's almost to the point where I can make the first public release. I still need to implement defmacro, allow Noodle modules to be imported as well as .py files, and clean up a few other odds and ends.

I will probably want to get a decent start on the documentation too, before releasing. Or maybe not; maybe I can point interested users to the current test suite and let them look up syntax from there. Then maybe others can help with the documentation before the next release.