500lines - 500 Lines or Less
For the upcoming “500 lines” book I’m writing a Python compiler in Python: it compiles a Python abstract-syntax tree into Python bytecode. When I promised this work there was plenty of time; then between accepting security-consulting work and getting almost nothing done over a couple of months in Buenos Aires – net access, power, my laptop, and my keyboard all failed there – I had to start back into this about 10 days before the deadline. How’s it going?
Not awfully: almost enough Python is supported for the compiler to compile itself. While none of the design goals is satisfied, they look like they’re in reach. (But I’m also too busy with the paid work.)
Just tonight I found out Python 2 comes with its very own
Python-in-Python compiler, in the compiler
module. Python’s real
compiler is in C, so there seems to be no reason for this except that
it’s cool. OK! This could help, even given it’s for an older version
of Python. Let’s peek at how we do our different things. How about the
while
statement – here’s mine:
def visit_While(self, t):
return {0: [op.SETUP_LOOP(3)],
1: [self(t.test), op.POP_JUMP_IF_FALSE(2),
self(t.body), op.JUMP_ABSOLUTE(1)],
2: [op.POP_BLOCK],
3: []}
and theirs:
def visitWhile(self, node):
self.set_lineno(node)
loop = self.newBlock()
else_ = self.newBlock()
after = self.newBlock()
self.emit('SETUP_LOOP', after)
self.nextBlock(loop)
self.setups.push((LOOP, loop))
self.set_lineno(node, force=True)
self.visit(node.test)
self.emit('POP_JUMP_IF_FALSE', else_ or after)
self.nextBlock()
self.visit(node.body)
self.emit('JUMP_ABSOLUTE', loop)
self.startBlock(else_) # or just the POPs if not else clause
self.emit('POP_BLOCK')
self.setups.pop()
if node.else_:
self.visit(node.else_)
self.nextBlock(after)
They both build a network of blocks of bytecode instructions, to be
assembled by a later pass into a bytecode sequence. The latter code
does more: it propagates line-number info, it maintains a stack of
loop contexts for the sake of compiling the continue
statement (and
maybe other purposes), and it handles the optional else
branch after
a while
. Finally, it represents the control-flow graph in a more
imperative style. I intend to also remember line numbers, and I’d like
to support continue
even though you can perfectly well write a
compiler without it – we’ll see; while else
will be an afterthought
if I ever get to it within the 500-line budget.
(The arguments to op.SETUP_LOOP
and the JUMP
ops are symbolic
labels as defined by the dict surrounding them. My first cut at coding
this used no assembler or control-flow graph at all, directly
composing bytecode strings. While this works well for bytecodes with
relative addresses, Python uses a mix of relative and absolute, for no
reason I can tell. You could hack around this with fake bytecodes that
get rewritten into the real ones in an afterpass, but ugh.)
The Python 2 compiler takes 2853 lines (counting blanks and comments), about 6 times my budget; in this example we’re at about ¼ the length. Well, maybe I’ll end up redefining success. Onward!
(ourobouros.py is not the official name; there isn’t one.)