How does sys.monitoring work?

sys.monitoring is a low-overhead interface for tracing program execution, intended for profilers, debuggers, and coverage tools. It was proposed in PEP 669 and shipped in Python 3.12, and it is closely tied to the specializing adaptive interpreter (see PEP 659). In this post, I dig into how the sys.monitoring mechanism works during bytecode execution.

Source code for `sys`

Note that sys is a built-in module that is part of the CPython interpreter and implemented entirely in C, so there is no Python source for it. The monitoring code lives in Python/instrumentation.c, where you can find the exported public methods:

Python/instrumentation.c

static PyMethodDef methods[] = {
    MONITORING_USE_TOOL_ID_METHODDEF
    MONITORING_CLEAR_TOOL_ID_METHODDEF
    MONITORING_FREE_TOOL_ID_METHODDEF
    MONITORING_GET_TOOL_METHODDEF
    MONITORING_REGISTER_CALLBACK_METHODDEF
    MONITORING_GET_EVENTS_METHODDEF
    MONITORING_SET_EVENTS_METHODDEF
    MONITORING_GET_LOCAL_EVENTS_METHODDEF
    MONITORING_SET_LOCAL_EVENTS_METHODDEF
    MONITORING_RESTART_EVENTS_METHODDEF
    MONITORING__ALL_EVENTS_METHODDEF
    {NULL, NULL}  // sentinel
};

static struct PyModuleDef monitoring_module = {
    PyModuleDef_HEAD_INIT,
    .m_name = "sys.monitoring",
    .m_size = -1, /* multiple "initialization" just copies the module dict. */
    .m_methods = methods,
};
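Each METHODDEF macro in the table above corresponds to a function that is callable from Python. As a minimal sketch, assuming Python 3.12 or newer (the helper name public_monitoring_functions is made up for this illustration), the exported functions can be listed from the Python side:

```python
import sys


def public_monitoring_functions():
    """Return the names of the public callables exposed by sys.monitoring."""
    mon = getattr(sys, "monitoring", None)
    if mon is None:  # Python < 3.12 has no sys.monitoring namespace
        return []
    return sorted(
        name
        for name in dir(mon)
        if not name.startswith("_") and callable(getattr(mon, name))
    )


print(public_monitoring_functions())
```

The exact list depends on the Python version, so no output is shown here; names such as set_events and register_callback should appear on any version that ships sys.monitoring.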

Python code object

A CPython code object is the underlying structure behind any executable piece of code (e.g., a function). Some of its key fields are:

co_name: name of co, e.g., function name
co_argcount, co_posonlyargcount and co_kwonlyargcount: the number of different types of arguments
co_stacksize: the size of VM stack used to evaluate this co

You can inspect these fields from Python through the __code__ attribute, e.g., func.__code__.co_name.
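As a small illustration of the fields listed above, the sketch below defines a throwaway function (greet is a hypothetical name, not something from CPython) with one positional-only, one regular, and one keyword-only parameter, then reads the matching counters from its code object:

```python
def greet(name, /, greeting="hello", *, punctuation="!"):
    """Hypothetical function used only to inspect its code object."""
    return f"{greeting}, {name}{punctuation}"


code = greet.__code__

fields = {
    "co_name": code.co_name,                        # function name
    "co_argcount": code.co_argcount,                # positional parameters, positional-only included
    "co_posonlyargcount": code.co_posonlyargcount,  # parameters before "/"
    "co_kwonlyargcount": code.co_kwonlyargcount,    # parameters after "*"
    "co_stacksize": code.co_stacksize,              # evaluation stack slots required
}

for field, value in fields.items():
    print(f"{field} = {value}")
```

Here co_argcount is 2 because it counts name and greeting but not the keyword-only punctuation; co_stacksize depends on the compiler version, so its value is not stated.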

There is a video on Bilibili that describes the code object; check it out if you want more detail. This article also details how bytecode is evaluated in the stack-based CPython VM.

Code object instrumentation

This section describes what happens when sys.monitoring.set_events(tool_id: int, event_set: int, /) is called, which binds global events to the registered tool_id. Before going through the details, let's keep a few core concepts and data structures in mind. Events are registered in the per-interpreter state PyInterpreterState, which contains a monitors field of the following type:

Include/cpython/code.h

typedef struct _Py_GlobalMonitors {
    uint8_t tools[_PY_MONITORING_UNGROUPED_EVENTS];
} _Py_GlobalMonitors;

This can be viewed as a two-dimensional bit table: the array is indexed by event, and each bit of the uint8_t records whether that event is enabled for a given tool.
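To make the bit-table layout concrete, here is a toy Python model of it. The class name GlobalMonitorsModel and the constant NUM_EVENTS are assumptions made for this sketch; only the layout (one byte per event, one bit per tool) comes from the struct above.

```python
NUM_EVENTS = 15  # illustrative stand-in for _PY_MONITORING_UNGROUPED_EVENTS


class GlobalMonitorsModel:
    """Toy model: tools[event] is a byte whose bit t is set when tool t wants that event."""

    def __init__(self):
        self.tools = [0] * NUM_EVENTS

    def enable(self, event, tool_id):
        self.tools[event] |= 1 << tool_id

    def disable(self, event, tool_id):
        # Mask with 0xFF so the value stays within one byte, like uint8_t.
        self.tools[event] &= ~(1 << tool_id) & 0xFF

    def is_enabled(self, event, tool_id):
        return bool(self.tools[event] & (1 << tool_id))


monitors = GlobalMonitorsModel()
monitors.enable(event=3, tool_id=0)
monitors.enable(event=3, tool_id=2)
monitors.disable(event=3, tool_id=0)

print(bin(monitors.tools[3]))  # only the bit for tool 2 remains set
```

Storing one byte per event means a single load answers the question "which tools care about this event?", which is the question the interpreter has to ask when an instrumented instruction runs.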

In addition, the interpreter's ceval state has a field named instrumentation_version:

Include/internal/pycore_ceval_state.h

struct _ceval_state {
    /* This variable holds the global instrumentation version. When a thread is
       running, this value is overlaid onto PyThreadState.eval_breaker so that
       changes in the instrumentation version will trigger the eval breaker. */
    uintptr_t instrumentation_version;
    int recursion_limit;
    struct _gil_runtime_state *gil;
    int own_gil;
    struct _pending_calls pending;
};

The version is bumped each time the set of enabled events changes (e.g., when set_events is invoked), so a code object instrumented under an older version is re-instrumented before it continues to run.
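The versioning idea can be sketched as a toy model. This is not CPython's implementation: InterpreterModel, CodeModel, and ensure_instrumented are hypothetical names, and the only assumption carried over is that a code object remembers the version it was last instrumented under and is refreshed when that version is stale.

```python
class InterpreterModel:
    """Holds the global instrumentation version."""

    def __init__(self):
        self.instrumentation_version = 0

    def set_events(self):
        # Changing the enabled events invalidates all existing instrumentation.
        self.instrumentation_version += 1


class CodeModel:
    """Remembers the version under which it was last instrumented."""

    def __init__(self, name):
        self.name = name
        self.instrumentation_version = 0
        self.instrument_count = 0


def ensure_instrumented(code, interp):
    """Re-instrument code only if its recorded version is out of date."""
    if code.instrumentation_version == interp.instrumentation_version:
        return False
    code.instrument_count += 1
    code.instrumentation_version = interp.instrumentation_version
    return True


interp = InterpreterModel()
code = CodeModel("f")

interp.set_events()
first = ensure_instrumented(code, interp)   # stale version: work is done
second = ensure_instrumented(code, interp)  # already current: nothing to do
print(first, second, code.instrument_count)
```

The comparison is a single integer check, which keeps the cost close to zero for code objects that are already up to date.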

The overall instrumentation procedure is:

Python/instrumentation.c

static int
instrument_all_executing_code_objects(PyInterpreterState *interp) {
    ASSERT_WORLD_STOPPED();

    _PyRuntimeState *runtime = &_PyRuntime;
    HEAD_LOCK(runtime);
    PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
    HEAD_UNLOCK(runtime);
    while (ts) {
        _PyInterpreterFrame *frame = ts->current_frame;
        while (frame) {
            if (frame->owner != FRAME_OWNED_BY_CSTACK) {
                if (instrument_lock_held(_PyFrame_GetCode(frame), interp)) {
                    return -1;
                }
            }
            frame = frame->previous;
        }
        HEAD_LOCK(runtime);
        ts = PyThreadState_Next(ts);
        HEAD_UNLOCK(runtime);
    }
    return 0;
}

As we can see, CPython walks the frames of every thread in the interpreter and instruments each frame's code object, which starts by allocating its monitoring data:

Include/cpython/code.h

typedef struct {
    /* Monitoring specific to this code object */
    _Py_LocalMonitors local_monitors;
    /* Monitoring that is active on this code object */
    _Py_LocalMonitors active_monitors;
    /* The tools that are to be notified for events for the matching code unit */
    uint8_t *tools;
    /* The version of tools when they instrument the code */
    uintptr_t tool_versions[_PY_MONITORING_TOOL_IDS];
    /* Information to support line events */
    _PyCoLineInstrumentationData *lines;
    /* The tools that are to be notified for line events for the matching code unit */
    uint8_t *line_tools;
    /* Information to support instruction events */
    /* The underlying instructions, which can themselves be instrumented */
    uint8_t *per_instruction_opcodes;
    /* The tools that are to be notified for instruction events for the matching code unit */
    uint8_t *per_instruction_tools;
} _PyCoMonitoringData;

Then, in force_instrument_lock_held, CPython computes the events that should currently be active from both the interpreter-level (global) monitored events and the code-object-level (local) events, and applies the resulting active events to each instruction in the code object by replacing the affected instructions with instrumented ones.

The core of the event calculation produces two disjoint event sets:

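new_events: events that should now be active (enabled on the code object or at the interpreter level) but are not yet instrumented in the code object
removed_events: events currently instrumented in the code object that are no longer enabled at either level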
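The sketch below models that calculation with per-event tool bitmasks. It assumes one particular reading of the two sets: new_events is "wanted but not yet active on the code object" and removed_events is "active on the code object but no longer wanted". The helper names and the three-event tables are illustrative only.

```python
def monitors_or(a, b):
    """Per-event union of two tool bitmasks."""
    return [x | y for x, y in zip(a, b)]


def monitors_sub(a, b):
    """Per-event tool bits that are set in a but not in b."""
    return [x & ~y & 0xFF for x, y in zip(a, b)]


def diff_events(active_on_code, local_events, global_events):
    """Return (new_events, removed_events) for one code object."""
    wanted = monitors_or(local_events, global_events)
    new_events = monitors_sub(wanted, active_on_code)
    removed_events = monitors_sub(active_on_code, wanted)
    return new_events, removed_events


# Three events, one byte of tool bits per event (values are made up).
active_on_code = [0b01, 0b10, 0b00]
local_events = [0b00, 0b00, 0b01]
global_events = [0b01, 0b00, 0b00]

new_events, removed_events = diff_events(active_on_code, local_events, global_events)
print(new_events)      # event 2 must be added for tool 0
print(removed_events)  # event 1 must be dropped for tool 1
```

Because each set is built by subtracting the other side, no (event, tool) bit can appear in both, which is why the two sets are disjoint.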

After that, CPython iterates over all instructions (bytecodes) in the current code object and checks whether the event associated with each instruction is active. If so, that instruction is instrumented:

Python/instrumentation.c

static void
instrument(PyCodeObject *code, int i)
{
    _Py_CODEUNIT *instr = &_PyCode_CODE(code)[i];
    uint8_t *opcode_ptr = &instr->op.code;
    int opcode = *opcode_ptr;

    /*
    Here, skip special process for LINE and INSTRUCTION events
    */

    CHECK(opcode != 0);
    if (!is_instrumented(opcode)) {
        int deopt = _PyOpcode_Deopt[opcode];
        int instrumented = INSTRUMENTED_OPCODES[deopt];
        assert(instrumented);
        FT_ATOMIC_STORE_UINT8_RELAXED(*opcode_ptr, instrumented);
        if (_PyOpcode_Caches[deopt]) {
            FT_ATOMIC_STORE_UINT16_RELAXED(instr[1].counter.value_and_backoff,
                                           adaptive_counter_warmup().value_and_backoff);
        }
    }
}

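This replaces the original instruction with its instrumented counterpart (e.g., CALL -> INSTRUMENTED_CALL). Any specialized form is first mapped back to its base opcode through _PyOpcode_Deopt, and the inline cache counter, if the instruction has one, is reset to its warm-up value.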
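The two table lookups can be mimicked with plain dictionaries. This is a toy model on opcode names rather than numbers, and the tables hold only a small illustrative subset; DEOPT and INSTRUMENTED are stand-ins for _PyOpcode_Deopt and INSTRUMENTED_OPCODES, not their real contents.

```python
# Specialized or base opcode -> base opcode (illustrative subset).
DEOPT = {
    "RESUME": "RESUME",
    "CALL": "CALL",
    "CALL_PY_EXACT_ARGS": "CALL",
    "LOAD_FAST": "LOAD_FAST",
    "RETURN_VALUE": "RETURN_VALUE",
}

# Base opcode -> instrumented opcode, only for opcodes that can raise events.
INSTRUMENTED = {
    "RESUME": "INSTRUMENTED_RESUME",
    "CALL": "INSTRUMENTED_CALL",
    "RETURN_VALUE": "INSTRUMENTED_RETURN_VALUE",
}


def instrument_opcode(opcode):
    """De-specialize the opcode, then swap in its instrumented form."""
    if opcode.startswith("INSTRUMENTED_"):
        return opcode  # already instrumented, nothing to do
    return INSTRUMENTED[DEOPT[opcode]]


def instrument_all(opcodes):
    """Instrument every opcode that has an instrumented counterpart."""
    return [
        instrument_opcode(op) if DEOPT.get(op, op) in INSTRUMENTED else op
        for op in opcodes
    ]


bytecode = ["RESUME", "LOAD_FAST", "CALL_PY_EXACT_ARGS", "RETURN_VALUE"]
print(instrument_all(bytecode))
```

Going through the base opcode first is what lets a specialized instruction such as CALL_PY_EXACT_ARGS end up as INSTRUMENTED_CALL without needing an instrumented variant per specialization.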

That is to say, when the user calls sys.monitoring.set_events(), all affected instructions in the currently executing Python frames are rewritten into their instrumented versions, ready for later callback invocations.

Register callback

To have a callback invoked when an event occurs, users must register it via sys.monitoring.register_callback(). Note that the callback's signature must match the arguments passed for that event; otherwise an error is raised at run time when the callback is invoked.

The interpreter state PyInterpreterState has a field that tracks the callback of each tool for every event:

Include/internal/pycore_interp.h

struct _is {  // alias of PyInterpreterState
    // Omitted
    PyObject *monitoring_callables[PY_MONITORING_TOOL_IDS][_PY_MONITORING_EVENTS];
    // ...
};

So registration simply stores the new callback in the table and returns the previous one:

Python/instrumentation.c

PyObject *
_PyMonitoring_RegisterCallback(int tool_id, int event_id, PyObject *obj)
{
    PyInterpreterState *is = _PyInterpreterState_GET();
    assert(0 <= tool_id && tool_id < PY_MONITORING_TOOL_IDS);
    assert(0 <= event_id && event_id < _PY_MONITORING_EVENTS);
    PyObject *callback = _Py_atomic_exchange_ptr(&is->monitoring_callables[tool_id][event_id],
                                                 Py_XNewRef(obj));

    return callback;
}
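The exchange semantics are visible from Python as well. The sketch below assumes Python 3.12 or newer and returns None on older versions; the function name swap_callbacks, the tool name "register-demo", and the two do-nothing callbacks are made up for the example. No events are enabled, so the callbacks are never actually invoked.

```python
import sys


def swap_callbacks():
    """Register two callbacks in turn and report what register_callback returns."""
    mon = getattr(sys, "monitoring", None)
    if mon is None:  # Python < 3.12: nothing to demonstrate
        return None

    # Pick any tool ID (0..5) that is not already claimed by another tool.
    tool_id = next(i for i in range(6) if mon.get_tool(i) is None)
    mon.use_tool_id(tool_id, "register-demo")

    def first(code, instruction_offset):
        pass

    def second(code, instruction_offset):
        pass

    event = mon.events.PY_START
    try:
        mon.register_callback(tool_id, event, first)
        returned_on_replace = mon.register_callback(tool_id, event, second)
        returned_on_clear = mon.register_callback(tool_id, event, None)
    finally:
        mon.free_tool_id(tool_id)

    return returned_on_replace is first, returned_on_clear is second


print(swap_callbacks())
```

Passing None as the callback unregisters it, and the call still hands back whatever was stored in the slot before.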

How callbacks are invoked

So far we have:

used sys.monitoring.set_events() to tell CPython to enable monitoring for some events
called sys.monitoring.register_callback() to bind a user-provided callback function to a specific event

During evaluation, the INSTRUMENTED_CALL bytecode invokes _Py_call_instrumentation_2args as a trampoline that prepares the callback arguments (e.g., a CALL event callback takes four arguments: the code object, instruction_offset, callable, and arg0). call_one_instrument then checks whether a callback has been registered in the interpreter state and, if so, invokes it through the vectorcall protocol via _PyObject_VectorcallTstate.
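Seen from the Python side, the whole path can be exercised with a few lines. This is a minimal sketch assuming Python 3.12 or newer (it returns None otherwise); sample, trace_calls, and the tool name "call-demo" are hypothetical names chosen for the example.

```python
import sys


def sample():
    """Hypothetical target whose only call is len("abc")."""
    return len("abc")


def trace_calls(target):
    """Run target() and record the CALL events raised inside its code object."""
    mon = getattr(sys, "monitoring", None)
    if mon is None:  # Python < 3.12: nothing to demonstrate
        return None

    # Pick any tool ID (0..5) that is not already claimed by another tool.
    tool_id = next(i for i in range(6) if mon.get_tool(i) is None)
    mon.use_tool_id(tool_id, "call-demo")
    seen = []

    def on_call(code, instruction_offset, callee, arg0):
        # Global events fire everywhere, so keep only calls made by target.
        if code is target.__code__:
            seen.append((code.co_name, callee.__name__, arg0))

    mon.register_callback(tool_id, mon.events.CALL, on_call)
    mon.set_events(tool_id, mon.events.CALL)
    try:
        target()
    finally:
        # Undo everything so the demo leaves no monitoring state behind.
        mon.set_events(tool_id, 0)
        mon.register_callback(tool_id, mon.events.CALL, None)
        mon.free_tool_id(tool_id)
    return seen


print(trace_calls(sample))
```

On a version with sys.monitoring, the recorded list is expected to hold a single entry for the len call, with "abc" as arg0. The callback takes exactly the four arguments described above; a callback with a different arity would fail when the event fires.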

Summary

In this post, we walked through how events and callbacks are registered using the sys.monitoring module. To fully understand the process, it helps to be familiar with CPython frame evaluation, code objects, and calling conventions.
