Usage

Python API

from_code

The main entrypoint to our API is the CodeData object. You can create it from any Python CodeType:

# Load rich first for prettier output
from rich import pretty
pretty.install()
from code_data import CodeData

def fn(a, b):
    return a + b

cd = CodeData.from_code(fn.__code__)
cd
CodeData(
    (
        (
            Instruction('LOAD_FAST', Varname('a'), line_number=7),
            Instruction('LOAD_FAST', Varname('b'), line_number=7),
            Instruction('BINARY_ADD', line_number=7),
            Instruction('RETURN_VALUE', line_number=7)
        ),
    ),
    filename='/tmp/ipykernel_701/3063237183.py',
    first_line_number=6,
    name='fn',
    stacksize=2,
    type=Function(Args(positional_or_keyword=('a', 'b'))),
    _additional_args=(Constant(None, _index_override=0),)
)

Unlike Python’s built-in code object or the dis module, CodeData keeps only the information needed to recreate the code object. Details of how the code happens to be stored on disk, such as the bytecode offset of each instruction, are omitted, which makes it simpler to use.
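To see the kind of detail that is left out, here is a small sketch that uses only the standard library's dis module. It lists the byte offset CPython stores for each instruction of the same function. The exact opcodes and offsets differ between Python versions, so no output is shown.

```python
import dis


def fn(a, b):
    return a + b


# Each dis.Instruction carries the byte offset of the instruction inside
# co_code, a storage detail that CodeData does not keep.
instructions = list(dis.get_instructions(fn.__code__))
offsets = [instr.offset for instr in instructions]

for instr in instructions:
    print(instr.offset, instr.opname, instr.argrepr)
```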

normalize

We are also able to “normalize” the code object, removing pieces of it that are unused. For example, if you have dead code, Python will still include the constants that are present in it, even though there is no way they can be accessed:

def fn():
    if False:
        x = 20
    x = 1


cd = CodeData.from_code(fn.__code__)
cd
CodeData(
    (
        (
            Instruction('NOP', line_number=2),
            Instruction(
                'LOAD_CONST',
                Constant(1, _index_override=3),
                line_number=4
            ),
            Instruction(
                'STORE_FAST',
                Varname('x', _index_override=0),
                line_number=4
            ),
            Instruction(
                'LOAD_CONST',
                Constant(None, _index_override=0),
                line_number=4
            ),
            Instruction('RETURN_VALUE', line_number=4)
        ),
    ),
    filename='/tmp/ipykernel_701/2121495508.py',
    first_line_number=1,
    name='fn',
    stacksize=1,
    type=Function(),
    _additional_args=(
        Constant(False, _index_override=1),
        Constant(20, _index_override=2)
    )
)
cd.normalize()
CodeData(
    (
        (
            Instruction('NOP', line_number=2),
            Instruction('LOAD_CONST', Constant(1), line_number=4),
            Instruction('STORE_FAST', Varname('x'), line_number=4),
            Instruction('LOAD_CONST', Constant(None), line_number=4),
            Instruction('RETURN_VALUE', line_number=4)
        ),
    ),
    filename='/tmp/ipykernel_701/2121495508.py',
    first_line_number=1,
    name='fn',
    stacksize=1,
    type=Function()
)
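The unused entries can also be spotted with the standard library alone. The sketch below is not how normalize() is implemented; it only compares the constants table of the compiled function with the indices that instructions actually reference. Which constants survive compilation depends on the CPython version, and newer releases may already prune some of them, so no expected output is shown.

```python
import dis


def fn():
    if False:
        x = 20
    x = 1


code = fn.__code__

# Indices into co_consts that some instruction actually refers to.
referenced = {
    instr.arg
    for instr in dis.get_instructions(code)
    if instr.opcode in dis.hasconst
}

# Everything else in the table is stored but never loaded.
unused = [
    const
    for index, const in enumerate(code.co_consts)
    if index not in referenced
]

print("co_consts:", code.co_consts)
print("never referenced:", unused)
```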

JSON Support

Since the code object is now a simple data structure, we can serialize it to and from JSON. This provides a nice option if you want to analyze Python bytecode in a different language or save it on disk:

code_json = cd.to_json_data()
assert CodeData.from_json_data(code_json) == cd

code_json
{
    'blocks': [
        [
            {'name': 'NOP', 'line_number': 2},
            {
                'name': 'LOAD_CONST',
                'arg': {'constant': 1, '_index_override': 3},
                'line_number': 4
            },
            {
                'name': 'STORE_FAST',
                'arg': {'varname': 'x', '_index_override': 0},
                'line_number': 4
            },
            {
                'name': 'LOAD_CONST',
                'arg': {'constant': None, '_index_override': 0},
                'line_number': 4
            },
            {'name': 'RETURN_VALUE', 'line_number': 4}
        ]
    ],
    'filename': '/tmp/ipykernel_701/2121495508.py',
    'first_line_number': 1,
    'name': 'fn',
    'stacksize': 1,
    'type': {},
    '_additional_args': [
        {'constant': False, '_index_override': 1},
        {'constant': 20, '_index_override': 2}
    ]
}
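As a rough illustration of the save-to-disk use case, the sketch below writes a shortened copy of the structure shown above to a file with the standard json module and reads it back. It assumes that to_json_data() returns only plain dicts, lists, strings, numbers, booleans, and None, as the output above suggests. The literal is copied by hand, and the file name is made up for the example.

```python
import json
import tempfile
from pathlib import Path

# Hand-copied, shortened version of the data printed above.
code_json = {
    "blocks": [
        [
            {"name": "NOP", "line_number": 2},
            {
                "name": "LOAD_CONST",
                "arg": {"constant": None, "_index_override": 0},
                "line_number": 4,
            },
            {"name": "RETURN_VALUE", "line_number": 4},
        ]
    ],
    "name": "fn",
    "stacksize": 1,
    "_additional_args": [{"constant": False, "_index_override": 1}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fn_code.json"  # hypothetical file name
    path.write_text(json.dumps(code_json, indent=2))
    restored = json.loads(path.read_text())

# None is written as JSON null and read back as None.
print(restored == code_json)
```

With the library installed, the restored value could then be passed to CodeData.from_json_data, as in the snippet above.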

Command Line

We provide a CLI command python-code-data which is useful for debugging or introspecting code objects from the command line.

It accepts many of the same options for loading Python code as the standard Python CLI: a string (-c), a module (-m), or a path (<file name>). It can also take a Python string (-e) that is evaluated first to produce the program, which is useful for generating test-case programs from the command line.

! python-code-data -h
usage: python-code-data [-h] [-c cmd] [-e eval] [-m mod] [--dis] [--dis-after]
                        [--source] [--no-normalize] [--json]
                        [file]

Inspect Python code objects.

positional arguments:
  file            path to Python program

options:
  -h, --help      show this help message and exit
  -c cmd          program passed in as string
  -e eval         string evalled to make program
  -m mod          python library
  --dis           print Python's dis analysis
  --dis-after     print Python's dis analysis after round tripping to code-
                  data, for testing
  --source        print the source code
  --no-normalize  don't normalize code data before printing
  --json          Print the JSON represenation of the code data as well
! python-code-data -c 'x if y else z'
CodeData(
    (
        (
            Instruction('LOAD_NAME', Name('y'), line_number=1),
            Instruction('POP_JUMP_IF_FALSE', Jump(1), line_number=1),
            Instruction('LOAD_NAME', Name('x'), line_number=1),
            Instruction('POP_TOP', line_number=1),
            Instruction('LOAD_CONST', Constant(None), line_number=1),
            Instruction('RETURN_VALUE', line_number=1)
        ),
        (
            Instruction('LOAD_NAME', Name('z'), line_number=1),
            Instruction('POP_TOP', line_number=1),
            Instruction('LOAD_CONST', Constant(None), line_number=1),
            Instruction('RETURN_VALUE', line_number=1)
        )
    ),
    filename='<string>',
    first_line_number=1,
    name='<module>',
    stacksize=1
)
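For scripted use, the command can also be called from Python with subprocess. This is only a sketch: the helper name inspect_source is made up for the example, it uses just the -c and --json options listed in the help text above, and it returns None when python-code-data is not on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def inspect_source(source: str, as_json: bool = False) -> Optional[str]:
    """Run python-code-data on a program string and return what it prints.

    Hypothetical helper: returns None if the command is not installed.
    """
    executable = shutil.which("python-code-data")
    if executable is None:
        return None
    args = [executable, "-c", source]
    if as_json:
        # --json prints the JSON representation as well (see the help text).
        args.append("--json")
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout


output = inspect_source("x if y else z")
print(output if output is not None else "python-code-data is not installed")
```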