Example: Modifying Existing Bytecode
In this example, we will compile some code, modify its bytecode, and then turn it back into a code object that we can execute.
We can make a code object from a string using compile:
x = True
source_code = "print(10 + (100 if x else 10))"
code = compile(source_code, "", "exec")
exec(code)
110
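As a minimal sketch of the same step, the code object can also be executed in an explicit namespace instead of the module globals, which makes it easy to check both branches of the conditional. The helper name run_with and the use of contextlib.redirect_stdout to capture the printed value are illustrative choices, not part of the original example.

```python
import contextlib
import io

source_code = "print(10 + (100 if x else 10))"
code = compile(source_code, "", "exec")


def run_with(x):
    """Execute the compiled code with the given x and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # The dict acts as the global namespace; builtins such as print
        # are added automatically by exec.
        exec(code, {"x": x})
    return buffer.getvalue().strip()


result_true = run_with(True)    # takes the "100" branch
result_false = run_with(False)  # takes the "10" branch
print(result_true, result_false)
```

The same code object is reused for both calls; only the namespace it runs in changes.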
If we look at the code object, we can see that it does have the bytecode, but it’s represented as a byte string, which isn’t very helpful:
print(code)
print(code.co_code)
<code object <module> at 0x7f092b9f0500, file "", line 1>
b'e\x00d\x00e\x01r\x06d\x01n\x01d\x00\x17\x00\x83\x01\x01\x00d\x02S\x00'
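The byte string is only part of the picture: the names and constants that the instructions refer to are stored in separate attributes of the code object. The sketch below prints a few of them. The exact bytes and the contents of co_consts depend on the Python version, so no specific output is assumed here.

```python
source_code = "print(10 + (100 if x else 10))"
code = compile(source_code, "", "exec")

# The raw bytecode is an immutable bytes object.
print(type(code.co_code))

# Instructions refer to names and constants by index into these tuples.
print(code.co_names)   # names looked up at runtime, such as print and x
print(code.co_consts)  # constants; exact contents vary by Python version
```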
We could use Python’s built-in dis module to introspect the code object. This is helpful for inspecting it, but won’t let us change it:
import dis
dis.dis(code)
1 0 LOAD_NAME 0 (print)
2 LOAD_CONST 0 (10)
4 LOAD_NAME 1 (x)
6 POP_JUMP_IF_FALSE 6 (to 12)
8 LOAD_CONST 1 (100)
10 JUMP_FORWARD 1 (to 14)
>> 12 LOAD_CONST 0 (10)
>> 14 BINARY_ADD
16 CALL_FUNCTION 1
18 POP_TOP
20 LOAD_CONST 2 (None)
22 RETURN_VALUE
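The same listing is also available programmatically through dis.get_instructions, which yields one record per instruction. The sketch below is only an illustration of that read-only view: the opcode names it prints depend on the Python version, so they may not match the listing above.

```python
import dis

source_code = "print(10 + (100 if x else 10))"
code = compile(source_code, "", "exec")

# Each item describes one instruction: its offset, opcode name, and argument.
instructions = list(dis.get_instructions(code))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)

# These records are a description of the bytecode, not the bytecode itself,
# so building new ones would not change `code`.
opnames = [instruction.opname for instruction in instructions]
argvals = [instruction.argval for instruction in instructions]
```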
So instead, let’s turn it into ✨data✨:
from code_data import CodeData
code_data = CodeData.from_code(code)
code_data
CodeData(blocks=((Instruction(name='LOAD_NAME', arg=Name(name='print', _index_override=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_CONST', arg=Constant(constant=10, _index_override=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_NAME', arg=Name(name='x', _index_override=1), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='POP_JUMP_IF_FALSE', arg=Jump(target=1, relative=False), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_CONST', arg=Constant(constant=100, _index_override=1), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='JUMP_FORWARD', arg=Jump(target=2, relative=True), _n_args_override=None, line_number=1, _line_offsets_override=())), (Instruction(name='LOAD_CONST', arg=Constant(constant=10, _index_override=0), _n_args_override=None, line_number=1, _line_offsets_override=()),), (Instruction(name='BINARY_ADD', arg=NoArg(_arg=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='CALL_FUNCTION', arg=1, _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='POP_TOP', arg=NoArg(_arg=0), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='LOAD_CONST', arg=Constant(constant=None, _index_override=2), _n_args_override=None, line_number=1, _line_offsets_override=()), Instruction(name='RETURN_VALUE', arg=NoArg(_arg=0), _n_args_override=None, line_number=1, _line_offsets_override=()))), filename='', first_line_number=1, name='<module>', stacksize=3, type=None, freevars=(), future_annotations=False, _nested=False, _additional_line=None, _additional_args=())
This is still a bit hard to see, so let’s install Rich’s pretty print helper:
from rich import pretty
pretty.install()
code_data
CodeData(
    (
        (
            Instruction(
                'LOAD_NAME',
                Name('print', _index_override=0),
                line_number=1
            ),
            Instruction(
                'LOAD_CONST',
                Constant(10, _index_override=0),
                line_number=1
            ),
            Instruction(
                'LOAD_NAME',
                Name('x', _index_override=1),
                line_number=1
            ),
            Instruction('POP_JUMP_IF_FALSE', Jump(1), line_number=1),
            Instruction(
                'LOAD_CONST',
                Constant(100, _index_override=1),
                line_number=1
            ),
            Instruction('JUMP_FORWARD', Jump(2, relative=True), line_number=1)
        ),
        (
            Instruction(
                'LOAD_CONST',
                Constant(10, _index_override=0),
                line_number=1
            ),
        ),
        (
            Instruction('BINARY_ADD', line_number=1),
            Instruction('CALL_FUNCTION', 1, line_number=1),
            Instruction('POP_TOP', line_number=1),
            Instruction(
                'LOAD_CONST',
                Constant(None, _index_override=2),
                line_number=1
            ),
            Instruction('RETURN_VALUE', line_number=1)
        )
    ),
    filename='',
    first_line_number=1,
    name='<module>',
    stacksize=3
)
That’s better!
We can see now that we have three blocks, each with a tuple of instructions, and that the jump instructions point at their target block by index.
Let’s try to change the additions to subtractions!
from dataclasses import replace
new_code_data = replace(
    code_data,
    blocks=tuple(
        tuple(
            replace(instruction, name="BINARY_SUBTRACT")
            if instruction.name == "BINARY_ADD"
            else instruction
            for instruction in block
        )
        for block in code_data.blocks
    ),
)
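The pattern here is plain dataclasses.replace applied to nested tuples: nothing is mutated, and a new object is built with the changed fields. The sketch below shows the same pattern on two hypothetical stand-in classes, FakeInstruction and FakeCodeData, which are much simpler than the library's real classes and exist only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical stand-ins for illustration; not code_data's real classes.
@dataclass(frozen=True)
class FakeInstruction:
    name: str


@dataclass(frozen=True)
class FakeCodeData:
    blocks: Tuple[Tuple[FakeInstruction, ...], ...]


original = FakeCodeData(
    blocks=(
        (FakeInstruction("LOAD_CONST"), FakeInstruction("BINARY_ADD")),
        (FakeInstruction("RETURN_VALUE"),),
    )
)

# Rebuild every block, swapping only the instructions that match.
modified = replace(
    original,
    blocks=tuple(
        tuple(
            replace(instruction, name="BINARY_SUBTRACT")
            if instruction.name == "BINARY_ADD"
            else instruction
            for instruction in block
        )
        for block in original.blocks
    ),
)

print(modified.blocks[0][1].name)  # the swapped instruction
print(original.blocks[0][1].name)  # the original is left untouched
```

Because the dataclasses are frozen, the original value stays available for comparison after the change.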
Now we can turn this back into code and exec it!
new_code = new_code_data.to_code()
exec(new_code)
-90