Tuesday, October 18, 2005

Shame! And, a Noodle reader

Okay, Jonathan linked here so I feel obligated to actually say something more useful here. I have neglected my webloggage duties.

I've been rewriting the Noodle reader in Python instead of the lex/yacc combo, so that I can build macro characters into it. It's more straightforward to read normal Lisp with a recursive one-item-at-a-time read.

The syntactic sugar added for Pythonicness makes things just a bit more complicated, though. I went through a few different attempts at a solution before I decided on implementing trailers (. or [], as in foo.bar or foo[bar]) using a new kind of macro character called a trailer macro. When a character follows another item immediately, with no intervening space, and it's set as a trailer macro, the handling function is called with the preceding item as a parameter, and the return value is used instead of the preceding item.

In the foo[bar] case, the reader would first read "foo", and just before returning, would peek at the next character in the stream. '['- oops- it's a trailer macro! So the read_subscript function is called with 'foo' as a parameter. The subscript "bar" is read. It's not a slice, so read_subscript returns a tuple we might express in Python syntax as (Symbol(subscript), 'foo', Symbol(bar)).

One problem was that the order of operations I came up with previously does not fit nicely here. It was really pretty complicated to make things separated based on precedence in the reader. Turns out that was a rather bad idea in the first place. I've also noted when writing Noodle that it's hard to remember the order of operations, since it comes into play so rarely. My solution is to have trailers always come first, then leaders (like the quasiquote backtick (`) or the tuple/symbol-quoting backslash (\). That makes everything unambiguous, is easy to remember, and is implementable without much headache.

One more thing I came across with this is the problem with * and ** being both symbols and leaders. It just isn't clean, and in the presence of reader macros is nearly impossible to correctly implement. So the leaders for varargs and kwargs are now @ and @@, and * is just another normal symbol-name character.

The new reader is much more versatile, and once I make it play nicely with string constants (correctly handling the backslash escapes, raw strings, and Unicode escapes) it will be ready to be dropped in to Noodle proper.

Thursday, July 14, 2005

Not Dead

Still doing some work on Noodle, in between other stuff, and it is slowly getting closer to a release. I'm not going to make myself write all the possible unit tests beforehand, and the docs will be minimal at release, so I've made the goals a bit easier to reach. Possibly others will help with the tests and docs and so on once it's released. Not that I'm depending on that--I'll still do it myself if necessary, since this project is interesting enough to me.

I do want to finish an implementation of eval-when before the release, though. That will clear up some things about what gets run when and hopefully keep the macros' environment from being confusing.

Monday, June 06, 2005

In Other News

I've been spending a bit of time on another marriage of Python and Lisp- embedding Python in sbcl, along with convenience functions for creating Python objects, accessing them, and so on. My friend Travis Hartwell started this by embedding Python using UFFI- enough to allow execution of a string of Python code- in only a few lines of Lisp. I was impressed and jumped in to help add stuff.

It's too early to say whether he'll want to release this publicly, or just make it a learning experience. We've got the distinct impression it's not of much interest to the Lisp community. Apparently the argument is that everything is so easy to do in Common Lisp, there is no need for a good comprehensive set of libraries and utilities. I don't know about that myself. Still, maybe other Pythonistas could make some use of it.

Tuesday, May 31, 2005

Progress Update

For the few people who already know about Noodle and are waiting for the release: It doesn't look like I'll make the expected June 1st date. But over the holiday weekend (and in the course of a road trip to see in-laws in Sacramento) I was able to get a lot of good work done on Noodle.

There were some elusive problems caused by assumptions made by Python bytecodes. See, in Python, the stack is empty between each statement. But in Noodle, everything is an expression, so sometimes there will be things on the stack when they start. Most of the time, this doesn't cause a problem. But with things like Python's CONTINUE_LOOP and BREAK_LOOP, it makes big problems. They both simply wipe out the stack when called.

So, Noodle's continue and break special-operators [1] associated with loops now keep track of the stack state when they are created, and when called, POP_TOP items off the stack to get to that point. They also keep track of exception blocks and pop them off (POP_BLOCK) when appropriate. Finally, loops are implemented without the SETUP_LOOP construct, since it's only useful for making the CONTINUE_LOOP and BREAK_LOOP ops work. So continue and break have to manually jump to the correct place. For all this to work, Noodle needs to keep much stricter track of the exact predicted stacklevel for each individual opcode than it did before. But tightening that up has helped me to find one or two extra bugs that are now fixed. And these changes should make it simple to add support for things like (break 2) or (break-myloop) for breaking out of loops other than the innermost.

Distutils-ization is done, and there are more standard macros, and the Bison/Flex parser and scanner are working wonderfully. Along with that, compilation is faster, the error messages for failed parses and scans are more descriptive, and the lnotabs (line number tables) are much more accurate.

So I'm closer to having Noodle ready for a release (by which I mean, Noodle is almost to the point where I'm not as worried about having everyone mock me when they see it :).

[1] I'm not sure whether continue and break should be considered special operators, since they are defined and put into the macro dict when inside loops, by the special operators for loops. But they are defined in Python and are otherwise just like special operators.

Saturday, May 21, 2005

Practical Common Lisp

I have now had the opportunity to peruse the online pages of Peter Seibel's excellent work, Practical Common Lisp. I'm not quite finished yet, but it's been absolutely perfect for filling in the gaps in my knowledge recently lamented. I recommend it as strongly as I am able to anyone wanting to learn more about Lisp- and I'm not usually one to enjoy or recommend books on technical topics. I may even take the unusual step of buying the dead-tree edition. Thanks, Mr. Seibel.

Tuesday, May 17, 2005

Anonymous blocks (PEP 340)

PEP (Python Enhancement Proposal) 340 suggests adding a new block construct to the Python language which would "provide a mechanism for encapsulating patterns of structure". I don't understand the details of the whole proposal just yet, but it looks like the point is making it simpler to execute a blocks of code between other steps, without having to put all the other steps into the code individually. For example, from the PEP, a way to execute some commands while a file is open and guarantee it will be closed when done:
def opening(filename, mode="r"):
f = open(filename, mode)
try:
yield f
finally:
f.close()

Used as follows:
block opening("/etc/passwd") as f:
for line in f:
print line.rstrip()

Or to retry something several times:
def auto_retry(n=3, exc=Exception):
for i in range(n):
try:
yield None
return
except exc, err:
# perhaps log exception here
continue
raise # re-raise the exception we caught earlier

Used as follows:
block auto_retry(3, IOError):
f = urllib.urlopen("http://python.org/peps/pep-0340.html")
print f.read()

This seems to me to be trying to solve a problem long ago solved by Lisp's macros, which coincidentally also solve a plethora of other problems unaddressed by the proposed block facility. I'm happy to say Noodle should never need block. (Of course, if a programmer does need it, she can implement it as a macro pretty quickly!)

I see more and more how the capability of extending Lisp's syntax has made it so powerful and so capable of remaining competetive after being around for 50 years.

Wednesday, May 11, 2005

Todo list

The list of items to be completed in Noodle before the public alpha release (at 0.1.0) is continuing to dwindle. Some of the bigger items currently there include:


  • Sufficient documentation for easy learning and usage of Noodle

  • Larger suite of acceptance tests

  • Rewrite parser (and possibly scanner as well) to be in C, using bison or PyBison. Bison is much more powerful, faster, etc., makes it simpler to define and handle error conditions, and won't require GPL code to be distributed with Noodle as does PLY. (I plan to release Noodle under an MIT/X11-style license)

  • Errors from scanner should indicate position in text

  • Class docstrings

  • Disallow some code arrangements that will make the Python VM upset-

    • No using unqualified 'exec' when there are any cell or free variables

    • No using import-all-from (like "from foo import *") except at the module level


  • Macro called get-module which will be used to implement "import foo as bar" functionality

  • Conditionals in for loops (including list comprehensions and generator expressions): (for (foo in bar if baz) do-something)

  • Consider adding syntax for what I call Python's multidimensional for loops (no idea what they're really called): i.e.,
    [t for foo in bar for t in foo]

  • Function argument unpacking, as when a function is declared with tuples in the argument list: i.e.,
    (define (my-function foo (bar (baz bonk) boom))
    function-body)

  • Define builtin functions to complement certain builtin macros; especially those which might be useful to pass to map, filter, etc, or might need to accept *varargs arguments. For example, the + macro.

  • Fix a bug with nested quasiquote calls

  • Add a syntax for the functionality offered by Python's print chevron

  • Tweaks to the Vim syntax-highlighting definition


Most of these are fairly simple and won't take much time. Of course everything depends on available time, but I'm planning on an initial public release June 1st.

Friday, May 06, 2005

Distractions

I found the Python Challenge a couple days ago and haven't been able to do any work on Noodle as a result. The puzzles are much too engaging! Hard, too. I've just started on #16, and I don't know how many more there are, but hopefully I run out soon, because when the problems are right there waiting to be solved, I can't ignore them!

I have a suspicion it was put up by Perl enthusiasts (or maybe PHP) to distract the Python programmers and cut down on development and maintenance time.

Monday, May 02, 2005

Release approved

I met with Novell's Open Source Review Board to make sure the Noodle code was in the clear. The process may not have been completely necessary, but I wanted to be safe down the road, just in case. Most likely simply paranoia.

So Noodle is now approved for release. The existing code will have a "Copyright (C) 2005 by Novell, Inc." added but I've got it under the MIT license, so that doesn't bother me at all. I'm glad to see Novell's name on free software projects anyway.

The existing code uses yacc.py from the PLY project, as I may have mentioned before, and that's under the GPL. So the whole package as it currently exists is only distributable under the GPL. I'm planning to replace yacc.py with a PyBison parser before too long, so the impending public alpha release of Noodle may be GPL'd, or it may be under the MIT license.

Tail recursion

Noodle can now do limited tail call optimization. Only the tail recursion case is optimized in this way. That is, when a function calls itself and immediately returns the result of that call, Noodle can transform the bytecode so that the new function call doesn't take up any extra stack space. The initial function's stack space is no longer needed, once the arguments to the function call are computed, so there's no reason to keep it there. This is accomplished by transforming the function call into a series of assignments, one for each input variable, and simply jumping to the top of the function again.

The most difficult part of this is not the transformation- it's determining under which cases those steps are actually valid. When the tail-recursive call uses keyword arguments or varargs (*) or kwargs (**), for example, it's trickier to come up with valid bytecode to correctly reinitialize the function. Default arguments cause another problem, since a code object has no way of knowing which default arguments are attached to it in a containing function object.

So the current requirements for taking advantage of Noodle's tail call optimization are thus:

  • Tail call must be at the end of the function, in addition to being in a tail position (this restriction is expected to go away at some point)

  • Tail call must be to the current function, by the same name

  • Tail call must give arguments for all positional parameters and all default parameters, in order, and must not give any keyword arguments, varargs, or kwargs.


When you're writing a recursive function, these restrictions should in general be rather simple to get around.

I don't expect Noodle will be able to support any further tail call optimization, since Python bytecode is currently incapable of jumping between code objects. Jumping can only be done to elsewhere in the current code object.

Saturday, April 30, 2005

Operator precedence in Noodle

Precedence? For a language with a Lispish syntax? I'm kidding, right?

Nope. In addition to some syntactic-sugar shortcut notations like the backtick (`), common in Lisps, Noodle has two trailers: attribute access, via dot (.), and subscription, via brackets ([]). When I wanted to use Lisp on Python those were a couple things that held me back, thinking they would necessarily be inconvenient operations. I was imagining things like:

Python: object.attribute = value
Lisp: (setattr object 'attribute value)

or

Python: object[index]
Lisp: (subscript object index)

or even worse

Python: object[index::-1]
Lisp: (subscript object (slice index None -1))

..which is a lot of typing for something pretty common, and wouldn't be so easy to understand. But when I actually tried to fit the foo.bar and foo[bar] syntaxes into Noodle, they seemed to fit really well.

foo.bar is changed by the parser into (getattribute foo bar). Similarly, foo[bar] becomes (subscript foo bar). Everything becomes an s-expression after the parser, so we can write everything in explicit s-expressions if we want.

Here's a short table of all the current syntactic sugar operators that get changed to s-expressions in the parser:


attribute access
foo.bar (getattribute foo bar)
subscription
foo[bar] (subscript foo bar)
subscription w/ slices
foo[bar:baz] (subscript foo (mkslice bar baz))
foo[:bar:baz] (subscript foo (mkslice None bar baz))
tuple-quoting
\(foo bar baz) (mktuple foo bar baz)
quasiquoting
`(foo bar baz) (quasiquote (foo bar baz))
`foo (quasiquote foo)
unquoting
,bar (unquote bar)
`(foo (bar ,baz) (quasiquote (foo (bar (unquote baz))))
keyword arguments
(myfunc foo:bar) (myfunc (mkkeyword foo bar))
(myfunc foo bar:(baz)) (myfunc foo (mkkeyword bar (baz)))
list building
[5 4 3 9 foo] (mklist 5 4 3 9 foo)
dictionary building
{foo:bar baz: boom} (mkdict foo bar baz boom)
varargs
(myfunc foo *bar) (myfunc foo (mkvarargs bar))
kwargs
(myfunc foo **bar) (myfunc foo (mkkwargs bar))


If you're paying attention, you may wonder how subscripts are differentiated from lists. Quite simple- subscripts follow an item immediately, with no intervening space. So foo[bar] is a subscript, but foo [bar] is two items; foo and a list containing bar.

You might also be asking about assignments to an attribute or subscript, or (if you're really familiar with Python details) how you would differentiate between foo[3:10] and foo[slice(3, 10)], which generate different bytecode and can in some cases give different results. The answers to those are not quite as simple to explain, but they are simple to implement. When I do something like assignment: (= foo.bar baz), that's handled by the "=" bytecode macro. The arguments to a macro are not evaluated beforehand- macros receive the tuples, symbols, and constants from the program code. So the = macro gets two arguments: (getattribute foo bar) and baz. Since there is no meaningful reason to give (getattribute foo bar) as the first parameter to = other than trying to assign to the bar attribute of foo, the = macro can recognize it and handle it specially. The same thing happens for (subscript), (mkslice), (mkvarargs), (mkkwargs), (mkkeyword), etc; macros can see and recognize them in their arguments and handle them the right way.

The only thing left to worry about is precedence, in a few cases where these syntactic sugar bits lead to confusion, like this:

`(,foo.baz[bar])

In almost all cases like this, there's only one meaningful way to interpret the expression, which leads to an order of operations:

1. quasiquoting, unquoting, tuple-escaping
2. trailers (. and []), left to right
3. colons, stars, or doublestars (can't be used together, so order among them is meaningless)

So the above becomes

(quasiquote ((subscript (getattribute (unquote foo) baz) bar)))

If that order of operations is ever not the desired order, the programmer is free to use the equivalent s-expressions, which take away any ambiguity.

I'm pretty happy with all of this. It does leave one wart on the language I don't like: I have to leave a special case in the syntax for * and ** to be usable as symbols (for multiplication and power). So it's necessarily impossible to use * or ** (the syntactic sugar items) on * or ** (the symbols). If a programmer wants to do this, she'd need to use s-expressions. I think it's a restriction people can live with, but it's still a fairly dramatic special case.

As always, I would love comments on any of this, especially if I've done something stupid. Fixing the language before it's publicly released is a bit easier than afterward.

Thursday, April 28, 2005

Writing scanners and parsers in Python

In writing Noodle, I've spent a good deal of time looking around for decent Python-oriented scanner and parser tools/generators. I ended up writing my own scanner (since the needs were not very complex) and using yacc.py standalone from the PLY project.

But since that time, my friend Travis Hartwell pointed me to this (discussion on the undocumented and "experimental" sre.Scanner stuff in the Python standard library, which looks perfect) and this, a summary of Python parsing tools which I somehow completely missed in my search.

Through that, I came to PyBison, which looks more than a little interesting, and to which I will certainly be migrating, barring unforeseen difficulties. It's under the GPL, and I plan to have Noodle under an MIT-style license, but one would expect the generated code need not be GPL'd. It will depend on the author's wishes and how much nontrivial glue code is emitted with the generated output.

As far as I understand, PyBison uses Python docstrings to create input to bison, runs bison, and puts in some extra glue to get results available to Python again. That may save me a lot of work, since I was planning to do the bison stuff directly.




Next Monday Novell's Open Source Review Board will meet with Noodle on the agenda. I work at Novell and want to make sure the ownership status of Noodle is in the clear before releasing anything.

Wednesday, April 20, 2005

Python for Lispers

I recently found myself reading Peter Norvig's essay on Python for Lisp Programmers. It makes a good comparison of Python and Common Lisp, although it's a little out of date. I'm happy to note that Noodle fixes quite a few of the shortcomings of Python as listed there. (Noodle won't be any faster than Python, but fortunately Python's been making some very good progress in that area with recent releases.)

Once Noodle is ready, I plan on coming up with a similar document showing Noodle in between Lisp and Python.

Tuesday, April 19, 2005

Programming languages

I enjoy programming. My brother taught me BASIC when I was 8 years old, and I've been finding ways to make the computer do what I want ever since. And I've always been looking for a better language; one in which complex ideas and plans can be expressed simply and which provides the best transport between thoughts and bytes.

I've been using Python more than anything else for quite a while now, and it's been difficult to pin down exactly what it is about Python that I like so much. I think a big part of it is the "batteries included" philosophy: there is a very powerful and comprehensive standard library present for every Python installation, so programs are easy to move around, and I know exactly where to look for tools and the documentation on those tools, and I know the documentation is going to be good. It's helped me learn Python much faster than any other language, and get to know more of the details that let me make best use of the language.

I'm big on Python's DuckTyping too. Python makes working with objects and classes easy.

Still, there are things missing from Python. I've messed around with various Lisp dialects over the years, and really enjoyed the constructs like lambdas and macros, and the way everything returns a value (even useful values, much of the time). Python has a distinction between statements and expressions that seems fairly artificial to me. That rigidity sometimes blocks me from writing my programs in the way I want them to be. Python lambdas, for example, can only contain expressions, due apparently to syntax constraints; and there are formatting and whitespace issues that are forced on to the code. The lack of macros grates sometimes. And Python's constructs (like assignment, and function definitions, and loops) can't return values, necessitating a few more steps in code than might otherwise be necessary. Now, I fully agree that most of the time, these issues encourage better and more readable code. But not all the time. I'd like a little extra freedom in some cases.

There are still a lot of languages out there to try. OCaml and Haskell probably stand out as the next ones I need to learn. But I'm just a bit too attached to Python's standard library and clean extension API and a bunch of other things, that I was motivated to try putting a lispish syntax on top of Python.

I looked at Logix, and I tried really hard to like it, but it felt too clumsy. It apparently translates Logix code to valid Python underneath, so it has to go some extra lengths to avoid the same shortcomings as Python.

So I've written my own language, that I call Noodle, which has a lispish syntax and compiles to Python bytecode. Its lambdas are not restricted, and it has macros, and things like function definitions return values. What's more, it still makes attribute usage and access simple--something I haven't seen any lisp dialect do.

It's coming along very well. It's almost to the point where I can make the first public release. I still need to implement defmacro, allow Noodle modules to be imported as well as .py files, and clean up a few other odds and ends.

I will probably want to get a decent start on the documentation too, before releasing. Or maybe not; maybe I can point interested users to the current test suite and let them look up syntax from there. Then maybe others can help with the documentation before the next release.