Hack a doodle, do.

By Darius Bacon

ouroboros.py

For the upcoming “500 lines” book I’m writing a Python compiler in Python: it compiles a Python abstract-syntax tree into Python bytecode. When I promised this work there was plenty of time; then between accepting security-consulting work and getting almost nothing done over a couple of months in Buenos Aires – net access, power, my laptop, and my keyboard all failed there – I had to start back into this about 10 days before the deadline. How’s it going?

Not awfully: almost enough Python is supported for the compiler to compile itself. While none of the design goals is satisfied, they look like they’re in reach. (But I’m also too busy with the paid work.)

Just tonight I found out Python 2 comes with its very own Python-in-Python compiler, in the compiler module. Python’s real compiler is in C, so there seems to be no reason for this except that it’s cool. OK! This could help, even given it’s for an older version of Python. Let’s peek at how we do our different things. How about the while statement – here’s mine:

def visit_While(self, t):
    return {0: [op.SETUP_LOOP(3)],
            1: [self(t.test), op.POP_JUMP_IF_FALSE(2),
                self(t.body), op.JUMP_ABSOLUTE(1)],
            2: [op.POP_BLOCK],
            3: []}

and theirs:

def visitWhile(self, node):
    self.set_lineno(node)

    loop = self.newBlock()
    else_ = self.newBlock()

    after = self.newBlock()
    self.emit('SETUP_LOOP', after)

    self.nextBlock(loop)
    self.setups.push((LOOP, loop))

    self.set_lineno(node, force=True)
    self.visit(node.test)
    self.emit('POP_JUMP_IF_FALSE', else_ or after)

    self.nextBlock()
    self.visit(node.body)
    self.emit('JUMP_ABSOLUTE', loop)

    self.startBlock(else_) # or just the POPs if not else clause
    self.emit('POP_BLOCK')
    self.setups.pop()
    if node.else_:
        self.visit(node.else_)
    self.nextBlock(after)

They both build a network of blocks of bytecode instructions, to be assembled by a later pass into a bytecode sequence. The latter code does more: it propagates line-number info, it maintains a stack of loop contexts for the sake of compiling the continue statement (and maybe other purposes), and it handles the optional else branch after a while. Finally, it represents the control-flow graph in a more imperative style. I intend to also remember line numbers, and I’d like to support continue even though you can perfectly well write a compiler without it – we’ll see; while else will be an afterthought if I ever get to it within the 500-line budget.

(The arguments to op.SETUP_LOOP and the JUMP ops are symbolic labels as defined by the dict surrounding them. My first cut at coding this used no assembler or control-flow graph at all, directly composing bytecode strings. While this works well for bytecodes with relative addresses, Python uses a mix of relative and absolute, for no reason I can tell. You could hack around this with fake bytecodes that get rewritten into the real ones in an afterpass, but ugh.)

The Python 2 compiler takes 2853 lines (counting blanks and comments), about 6 times my budget; in this example we’re at about ¼ the length. Well, maybe I’ll end up redefining success. Onward!

(ourobouros.py is not the official name; there isn’t one.)

  • 24 February 2014