The Python docs for the dis module state that a bytecode instruction in CPython is 2 bytes long (https://docs.python.org/3/library/dis.html).

However, when I disassemble a function and look at the actual bytecode, I see that the instructions only take up 1 byte:

import dis


def example_function(x):
    if x > 0:
        return x + 1
    return 0


print(example_function.__code__.co_code.hex(' '))

# 97 00 7c 00 64 01 6b 44 00 00 72 05 7c 00 64 02 7a 00 00 00 53 00 79 01

print(*example_function.__code__.co_code)

# 151 0 124 0 100 1 107 68 0 0 114 5 124 0 100 2 122 0 0 0 83 0 121 1


dis.dis(example_function)

 10           0 RESUME                   0

 11           2 LOAD_FAST                0 (x)
              4 LOAD_CONST               1 (0)
              6 COMPARE_OP              68 (>)
             10 POP_JUMP_IF_FALSE        5 (to 22)

 12          12 LOAD_FAST                0 (x)
             14 LOAD_CONST               2 (1)
             16 BINARY_OP                0 (+)
             20 RETURN_VALUE

 13     >>   22 RETURN_CONST             1 (0)

Are they variable in length?

The instruction list defined in dis.opname does have a few instructions longer than 1 byte, but how then does the interpreter know when it's 1 byte and when it's 2 bytes?

As you can see above, I've tried disassembling the code to see for myself and inspected the docs.

Comments

  • The interpreter must know from the first byte whether there is another byte (or more). Commented Aug 7 at 7:39
  • See docs.python.org/3/library/dis.html for details. The 2nd column is the byte offset. Each instruction takes 2 bytes; the reason there is a jump of 4 between COMPARE_OP and POP_JUMP_IF_FALSE is, I assume, a hidden CACHE instruction in between. If you pass show_caches=True to the dis() function, you'll see the CACHE instructions. Commented Aug 7 at 9:54
  • @JEarls If it was only that easy. What dis.dis gives you is an already digested bytecode in a human-readable format. The actual bytecode in my example is the following: "97007c0064016b44000072057c0064027a00000053007901". It is in hex, and hence 97 corresponds to the instruction RESUME. Each instruction takes a certain number of parameters (in this case 1 byte, 00), and so the actual next bytecode instruction is 7c, LOAD_FAST. Commented Aug 7 at 17:36
  • @KellyBundy Please take a look at my comment above. 97 is the instruction RESUME taking one byte argument. The list of bytecode instructions can be acquired by iterating over dis.opname. Commented Aug 7 at 17:38
  • @user207421 This used to be the case, but according to the docs, not anymore: docs.python.org/3/library/…. Commented Aug 7 at 17:39

1 Answer

You contrast the documentation [https://docs.python.org/3/library/dis.html] with what you see in the output from dis.dis(example_function) in your example code:

import dis
import sys


def example_function(x):
    if x > 0:
        return x + 1
    return 0


print(sys.version)
print(example_function.__code__.co_code.hex(' '))
dis.dis(example_function)

Running this on current Python 3.12.9 (you didn't specify a version, but from the opcodes it looks like you might be on 3.12), the output is:

3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:49:16) [MSC v.1929 64 bit (AMD64)]
97 00 7c 00 64 01 6b 44 00 00 72 05 7c 00 64 02 7a 00 00 00 53 00 79 01
  5           0 RESUME                   0

  6           2 LOAD_FAST                0 (x)
              4 LOAD_CONST               1 (0)
              6 COMPARE_OP              68 (>)
             10 POP_JUMP_IF_FALSE        5 (to 22)

  7          12 LOAD_FAST                0 (x)
             14 LOAD_CONST               2 (1)
             16 BINARY_OP                0 (+)
             20 RETURN_VALUE

  8     >>   22 RETURN_CONST             1 (0)

You can tell that the hex values 97, 7c, 64 etc. are the opcodes, by looking them up:

print(dis.opname[0x97])
print(dis.opname[0x7C])
print(dis.opname[0x64])

Output:

RESUME
LOAD_FAST
LOAD_CONST

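So it seems clear that the first byte in each pair of bytes from the bytecode is the opcode and the second byte is the argument, in line with the documentation. The argument byte is 0x00 for instructions that take no argument (for example 53 00, RETURN_VALUE). The extra 00 00 pairs after COMPARE_OP (6b 44) and BINARY_OP (7a 00) are inline CACHE entries, which the docs describe and which dis hides unless you pass show_caches=True; that is why the offsets jump from 6 to 10 and from 16 to 20.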
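As a rough sketch of that pairing, the raw bytes can be walked two at a time and each opcode looked up in dis.opname. The helper name iter_code_units is made up for this illustration, and the names printed depend on your Python version (on 3.11+ the inline cache entries show up as CACHE):

```python
import dis


def iter_code_units(func):
    """Yield (offset, opcode name, argument byte) for each 2-byte unit.

    iter_code_units is a hypothetical helper, not part of the dis module.
    """
    raw = func.__code__.co_code
    for offset in range(0, len(raw), 2):
        opcode, arg = raw[offset], raw[offset + 1]
        yield offset, dis.opname[opcode], arg


def example_function(x):
    if x > 0:
        return x + 1
    return 0


# Print one line per 2-byte unit: offset, opcode name, raw argument byte.
for offset, name, arg in iter_code_units(example_function):
    print(f"{offset:>3} {name:<20} {arg}")
```

Every offset printed is even, and no unit is ever read as a single byte on its own.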

You state "the instruction list defined in dis.opname does have a few instructions longer than 1 byte", but it's unclear where you're getting this. The value of dis.opname is a list of 267 names (in 3.13). What may be confusing is that there are more than 256 values, so what are the extra 11 or more?

These additional opcode names are for pseudo-instructions. These don't translate directly to a single executable opcode in the bytecode; they are used by the compiler and removed or replaced before the bytecode is generated. Have a look at https://github.com/python/cpython/blob/main/Python/flowgraph.c for an example of their use in the CPython compiler.

You won't see these in the output of dis, and they obviously won't show up in the bytecode - by then, the compiler has removed or replaced them with usable opcodes.
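If you want to check this on your own interpreter, a small sketch like the following lists the named entries of dis.opname above 255 and confirms that none of them can appear in co_code. It assumes that unused slots keep the "<N>" placeholder names that dis assigns to them; which names it prints (or whether it prints any) depends on the Python version:

```python
import dis


def example_function(x):
    if x > 0:
        return x + 1
    return 0


# Named entries beyond the single-byte range; assumed to be the pseudo-instructions.
# Unused slots are assumed to carry placeholder names such as "<300>".
pseudo = [
    (number, name)
    for number, name in enumerate(dis.opname)
    if number > 255 and not name.startswith("<")
]

for number, name in pseudo:
    print(number, name)

# co_code is a bytes object, so every value in it is necessarily 0-255.
print(max(example_function.__code__.co_code))
```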

You commented "Please take a look at my comment above. 97 is the instruction RESUME taking one byte argument." Your confusion seems to stem from the fact that Python only uses single-byte opcodes, but still says it uses 2-byte instructions. The instructions are 2 bytes long because they need the argument to make sense: they are always opcode+argument pairs. It's really that simple.
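The offset column in the dis output is consistent with this fixed width. As a sketch (units_per_instruction is a made-up helper name), you can measure how many extra 2-byte units sit between each visible instruction and the next one. On 3.11+ I'd expect the non-zero counts to be the inline CACHE entries the docs describe, and on older versions I'd expect every count to be 0:

```python
import dis


def units_per_instruction(func):
    """Return (offset, opname, extra_units) for each instruction dis reports.

    extra_units counts the 2-byte units between this instruction and the next
    visible one. Hypothetical helper, not part of the dis module.
    """
    code = func.__code__
    instructions = list(dis.get_instructions(func))
    # Append the total length so the last instruction also has a "next" offset.
    offsets = [ins.offset for ins in instructions] + [len(code.co_code)]
    return [
        (ins.offset, ins.opname, (nxt - ins.offset) // 2 - 1)
        for ins, nxt in zip(instructions, offsets[1:])
    ]


def example_function(x):
    if x > 0:
        return x + 1
    return 0


for offset, name, extra in units_per_instruction(example_function):
    print(f"{offset:>3} {name:<20} extra 2-byte units: {extra}")
```

Either way, the total always adds up to len(co_code) // 2 units, so nothing in the byte string is narrower or wider than 2 bytes.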

You also commented "How would the instructions exceeding 255 look like in the byte format (some_func.__code__.co_code)? Not all available byte values are used for the opcodes, so I wonder why it was chosen to exceed 255 then." You won't see pseudo-opcodes in some_func.__code__.co_code because, by the time a code object is created, the compiler has already lowered them into real single-byte opcodes (0-255) and/or encoded their effect in the exception table. They exist only in an intermediate list of instructions the compiler works with before finalizing bytecode. CPython uses them because compilation happens in stages: first a higher-level, more flexible form (which can use numbers >255) is built, then it's assembled into the compact, fixed-width bytecode format.

2 Comments

Thanks for the fix @kellybundy - you're right I switched versions when I noted the difference with OP and introduced the mistake
Great answer, thank you
