Example: Modifying Existing Bytecode

In this example, we will compile some code, modify the bytecode, and then turn it back into a code object that we can execute.

We can make a code object from a string using compile:

x = True
source_code = "print(10 + (100 if x else 10))"
code = compile(source_code, "", "exec")
exec(code)
110
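
The code object does not capture the value of x at compile time; the name is looked up when the code runs. A minimal sketch of this, assuming a variant of the source that stores its result instead of printing it (the names variant_source, variant_code, result, and the two namespace dictionaries are illustrative and not part of the original example):

```python
# Variant of the example source that stores the value instead of printing it.
variant_source = "result = 10 + (100 if x else 10)"
variant_code = compile(variant_source, "<example>", "exec")

# Run the same code object against two different namespaces.
namespace_true = {"x": True}
namespace_false = {"x": False}
exec(variant_code, namespace_true)
exec(variant_code, namespace_false)

print(namespace_true["result"])   # 110
print(namespace_false["result"])  # 20
```

The same compiled object gives different results because only the namespace changed, not the bytecode.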

If we look at the code object, we can see that it does have the bytecode, but it’s represented as a byte string, which isn’t very helpful:

print(code)
print(code.co_code)
<code object <module> at 0x7f092b9f0500, file "", line 1>
b'e\x00d\x00e\x01r\x06d\x01n\x01d\x00\x17\x00\x83\x01\x01\x00d\x02S\x00'
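
Since Python 3.6, each instruction occupies two bytes: an opcode followed by a one-byte argument. That means the byte string can be decoded by hand. A rough sketch using dis.opname (the helper name decode_pairs is hypothetical; it ignores EXTENDED_ARG handling, and the opcode names it prints depend on the Python version, so they may differ from the output shown on this page):

```python
import dis


def decode_pairs(raw):
    """Split raw bytecode into (opcode name, argument byte) pairs."""
    return [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]


code = compile("print(10 + (100 if x else 10))", "", "exec")
pairs = decode_pairs(code.co_code)
for name, arg in pairs:
    print(f"{name:<20} {arg}")
```

This only makes the bytes readable. The arguments are still raw indices into the code object’s tables rather than the names and constants themselves.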

We could use Python’s built-in dis module to introspect the code object. This is useful for inspecting the bytecode, but it won’t let us change it:

import dis
dis.dis(code)
  1           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 (10)
              4 LOAD_NAME                1 (x)
              6 POP_JUMP_IF_FALSE        6 (to 12)
              8 LOAD_CONST               1 (100)
             10 JUMP_FORWARD             1 (to 14)
        >>   12 LOAD_CONST               0 (10)
        >>   14 BINARY_ADD
             16 CALL_FUNCTION            1
             18 POP_TOP
             20 LOAD_CONST               2 (None)
             22 RETURN_VALUE
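
For programmatic inspection, dis.get_instructions yields the same information as named tuples instead of printed text. A small sketch (the variable names are illustrative, and the exact instruction sequence varies between Python versions, so no output is shown):

```python
import dis

code = compile("print(10 + (100 if x else 10))", "", "exec")

# Each item is a dis.Instruction named tuple describing one instruction.
instructions = list(dis.get_instructions(code))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)

# Collect the names that the code looks up at run time.
names_loaded = [i.argval for i in instructions if i.opname == "LOAD_NAME"]
print(names_loaded)
```

These objects are a read-only view: editing or rebuilding them does not change the underlying code object.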

So instead, let’s turn it into ✨data✨:

from code_data import CodeData

code_data = CodeData.from_code(code)
code_data
CodeData(blocks=((Instruction(name='LOAD_NAME', arg=Name(name='print', _index_override=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_CONST', arg=Constant(constant=10, _index_override=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_NAME', arg=Name(name='x', _index_override=1), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='POP_JUMP_IF_FALSE', arg=Jump(target=1, relative=False), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_CONST', arg=Constant(constant=100, _index_override=1), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='JUMP_FORWARD', arg=Jump(target=2, relative=True), _n_args_override=None, line_number=1, _line_offsets_override=())), (Instruction(name='LOAD_CONST', arg=Constant(constant=10, _index_override=0), _n_args_override=None, line_number=1, _line_offsets_override=()),), (Instruction(name='BINARY_ADD', arg=NoArg(_arg=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='CALL_FUNCTION', arg=1, _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='POP_TOP', arg=NoArg(_arg=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_CONST', arg=Constant(constant=None, _index_override=2), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='RETURN_VALUE', arg=NoArg(_arg=0), _n_args_override=None, line_number=1, _line_offsets_override=()))), filename='', first_line_number=1, name='<module>', stacksize=3, type=None, freevars=(), future_annotations=False, _nested=False, _additional_line=None, _additional_args=())

This is still a bit hard to see, so let’s install Rich’s pretty print helper:

from rich import pretty
pretty.install()
code_data
CodeData(
    (
        (
            Instruction(
                'LOAD_NAME',
                Name('print', _index_override=0),
                line_number=1
            ),
            Instruction(
                'LOAD_CONST',
                Constant(10, _index_override=0),
                line_number=1
            ),
            Instruction(
                'LOAD_NAME',
                Name('x', _index_override=1),
                line_number=1
            ),
            Instruction('POP_JUMP_IF_FALSE', Jump(1), line_number=1),
            Instruction(
                'LOAD_CONST',
                Constant(100, _index_override=1),
                line_number=1
            ),
            Instruction('JUMP_FORWARD', Jump(2, relative=True), line_number=1)
        ),
        (
            Instruction(
                'LOAD_CONST',
                Constant(10, _index_override=0),
                line_number=1
            ),
        ),
        (
            Instruction('BINARY_ADD', line_number=1),
            Instruction('CALL_FUNCTION', 1, line_number=1),
            Instruction('POP_TOP', line_number=1),
            Instruction(
                'LOAD_CONST',
                Constant(None, _index_override=2),
                line_number=1
            ),
            Instruction('RETURN_VALUE', line_number=1)
        )
    ),
    filename='',
    first_line_number=1,
    name='<module>',
    stacksize=3
)

That’s better!

We can see now that we have three blocks, each a tuple of instructions.

Let’s try to change the addition to a subtraction!

from dataclasses import replace

# replace() returns modified copies, so code_data itself is left unchanged.
new_code_data = replace(
    code_data,
    blocks=tuple(
        tuple(
            replace(instruction, name="BINARY_SUBTRACT")
            if instruction.name == "BINARY_ADD"
            else instruction
            for instruction in block
        )
        for block in code_data.blocks
    ),
)

Now we can turn this back into code and exec it!

new_code = new_code_data.to_code()
exec(new_code)
-90
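
The nested replace pattern used above is plain dataclasses usage, so it can be tried without code_data installed. A minimal sketch with hypothetical stand-in classes (FakeInstruction and FakeCodeData are not part of code_data; they only mimic the name and blocks fields), wrapped in a hypothetical helper called rename_instructions:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeInstruction:  # hypothetical stand-in, not code_data's Instruction
    name: str


@dataclass(frozen=True)
class FakeCodeData:  # hypothetical stand-in, not code_data's CodeData
    blocks: tuple


def rename_instructions(data, old, new):
    """Return a copy of `data` with every instruction named `old` renamed to `new`."""
    return replace(
        data,
        blocks=tuple(
            tuple(replace(i, name=new) if i.name == old else i for i in block)
            for block in data.blocks
        ),
    )


original = FakeCodeData(blocks=(
    (FakeInstruction("LOAD_CONST"),),
    (FakeInstruction("BINARY_ADD"), FakeInstruction("RETURN_VALUE")),
))
updated = rename_instructions(original, "BINARY_ADD", "BINARY_SUBTRACT")

print(updated.blocks[1][0].name)   # BINARY_SUBTRACT
print(original.blocks[1][0].name)  # BINARY_ADD (the original is untouched)
```

Because every step builds a new object, the original data stays available for comparison after the transformation.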