sys.monitoring is a low-overhead interface for tracing program execution, intended for use by profilers, debuggers and coverage tools.
It was proposed in PEP 669 and officially introduced in Python 3.12.
It is closely related to the specializing adaptive interpreter (see PEP 659).
In this post, I dive into how the sys.monitoring mechanism works during bytecode execution.
Source code for sys
Note that sys is a built-in module that is part of the CPython interpreter and is implemented purely in C, so there is no Python source code for it.
The source code of its monitoring namespace is in Python/instrumentation.c, where you can find the exported public methods:
static PyMethodDef methods[] = {
MONITORING_USE_TOOL_ID_METHODDEF
MONITORING_CLEAR_TOOL_ID_METHODDEF
MONITORING_FREE_TOOL_ID_METHODDEF
MONITORING_GET_TOOL_METHODDEF
MONITORING_REGISTER_CALLBACK_METHODDEF
MONITORING_GET_EVENTS_METHODDEF
MONITORING_SET_EVENTS_METHODDEF
MONITORING_GET_LOCAL_EVENTS_METHODDEF
MONITORING_SET_LOCAL_EVENTS_METHODDEF
MONITORING_RESTART_EVENTS_METHODDEF
MONITORING__ALL_EVENTS_METHODDEF
{NULL, NULL} // sentinel
};
static struct PyModuleDef monitoring_module = {
PyModuleDef_HEAD_INIT,
.m_name = "sys.monitoring",
.m_size = -1, /* multiple "initialization" just copies the module dict. */
.m_methods = methods,
};
Python code object
A CPython code object (PyCodeObject) is the underlying representation of any executable piece of code (e.g., a function body). Some of its key fields are:
- co_name: the name of the code object, e.g., the function name
- co_argcount, co_posonlyargcount and co_kwonlyargcount: the numbers of positional, positional-only and keyword-only arguments
- co_stacksize: the size of the VM stack needed to evaluate this code object
You can inspect these fields from Python through the __code__ attribute, e.g., func.__code__.co_name.
There is a video describing the code object on Bilibili; check it out if you want more detail. And this article details how bytecode is evaluated in the stack-based CPython VM.
Code object instrumentation
This section describes what happens when sys.monitoring.set_events(tool_id: int, event_set: int, /) is called, which enables a set of global events for a registered tool_id.
Before going through the details, let's go over some key concepts and data structures.
Global events are recorded in the per-interpreter state PyInterpreterState, which contains a monitors field of the following type:
typedef struct _Py_GlobalMonitors {
uint8_t tools[_PY_MONITORING_UNGROUPED_EVENTS];
} _Py_GlobalMonitors;
This can be regarded as a two-dimensional bit table: there is one uint8_t per event, and each of its bits records whether that event is enabled for the corresponding tool (a uint8_t has 8 bits, one for each of the 8 tool IDs).
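To make the bit-table idea concrete, here is a minimal C sketch (not CPython code) that models _Py_GlobalMonitors as an array of uint8_t indexed by event. The names GlobalMonitors, set_events, is_enabled and get_events, and the event count NUM_EVENTS, are hypothetical; event i corresponds to bit i of event_set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event count, standing in for _PY_MONITORING_UNGROUPED_EVENTS.
 * There are 8 tool IDs, so one uint8_t per event has one bit per tool. */
#define NUM_EVENTS 10

/* tools[e] has bit t set when tool t wants event e. */
typedef struct {
    uint8_t tools[NUM_EVENTS];
} GlobalMonitors;

/* Replace a tool's event set: bit e of event_set enables event e. */
static void set_events(GlobalMonitors *m, int tool_id, uint32_t event_set) {
    for (int e = 0; e < NUM_EVENTS; e++) {
        if (event_set & (1u << e)) {
            m->tools[e] |= (uint8_t)(1u << tool_id);
        } else {
            m->tools[e] &= (uint8_t)~(1u << tool_id);
        }
    }
}

/* Test a single (tool, event) cell of the table. */
static bool is_enabled(const GlobalMonitors *m, int tool_id, int event) {
    return (m->tools[event] >> tool_id) & 1u;
}

/* Read a tool's event set back out of the table, column by column. */
static uint32_t get_events(const GlobalMonitors *m, int tool_id) {
    uint32_t set = 0;
    for (int e = 0; e < NUM_EVENTS; e++) {
        if (is_enabled(m, tool_id, e)) {
            set |= 1u << e;
        }
    }
    return set;
}
```

Because each event row is a single byte, asking "which tools want this event?" is one load, which is the query the interpreter needs on the hot path.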
In addition, the interpreter's ceval state holds an instrumentation version field:
struct _ceval_state {
/* This variable holds the global instrumentation version. When a thread is
running, this value is overlaid onto PyThreadState.eval_breaker so that
changes in the instrumentation version will trigger the eval breaker. */
uintptr_t instrumentation_version;
int recursion_limit;
struct _gil_runtime_state *gil;
int own_gil;
struct _pending_calls pending;
};
The version increases whenever the global event set changes (e.g., when set_events is invoked), so each code object can tell whether its instrumentation is outdated and needs to be updated.
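A minimal sketch of the version check, assuming a hypothetical Interp/Code pair: the interpreter bumps its counter when events change, and a code object is re-instrumented only if its recorded version lags behind. CPython performs a comparable check when a frame starts executing (in the RESUME instruction), but the names below are illustrative only.

```c
#include <stdint.h>

/* Hypothetical interpreter state: only the global version matters here. */
typedef struct {
    uintptr_t instrumentation_version;
} Interp;

/* Hypothetical code object: remembers the version it was instrumented for. */
typedef struct {
    uintptr_t instrumented_version;
    int times_instrumented;  /* counts re-instrumentation, for demonstration */
} Code;

/* Any change to the global event set invalidates existing instrumentation. */
static void on_set_events(Interp *interp) {
    interp->instrumentation_version++;
}

/* Called before running a code object: re-instrument only if outdated. */
static void ensure_instrumented(Code *code, const Interp *interp) {
    if (code->instrumented_version == interp->instrumentation_version) {
        return;  /* up to date: nothing to do */
    }
    /* ... rewrite bytecode for the currently active events here ... */
    code->times_instrumented++;
    code->instrumented_version = interp->instrumentation_version;
}
```

The point of the design is that set_events does not have to find every code object eagerly; a stale code object notices the mismatch the next time it runs.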
The procedure that instruments every currently executing code object is:
static int
instrument_all_executing_code_objects(PyInterpreterState *interp) {
ASSERT_WORLD_STOPPED();
_PyRuntimeState *runtime = &_PyRuntime;
HEAD_LOCK(runtime);
PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
HEAD_UNLOCK(runtime);
while (ts) {
_PyInterpreterFrame *frame = ts->current_frame;
while (frame) {
if (frame->owner != FRAME_OWNED_BY_CSTACK) {
if (instrument_lock_held(_PyFrame_GetCode(frame), interp)) {
return -1;
}
}
frame = frame->previous;
}
HEAD_LOCK(runtime);
ts = PyThreadState_Next(ts);
HEAD_UNLOCK(runtime);
}
return 0;
}
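The loop above is a nested walk over two linked lists: the interpreter's threads, and each thread's chain of frames linked through previous. The sketch below models that walk with simplified, hypothetical structs (Thread, Frame, Code) and a stand-in instrument_code() in place of instrument_lock_held(); locking and world-stopping are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified versions of the structures used above. */
typedef enum { OWNED_BY_THREAD, OWNED_BY_CSTACK } FrameOwner;

typedef struct Code {
    int instrumented;
} Code;

typedef struct Frame {
    FrameOwner owner;
    Code *code;
    struct Frame *previous;  /* caller frame, NULL at the bottom of the stack */
} Frame;

typedef struct Thread {
    Frame *current_frame;    /* innermost (currently running) frame */
    struct Thread *next;
} Thread;

/* Stand-in for instrument_lock_held(): returns 0 on success, -1 on failure. */
static int instrument_code(Code *code) {
    code->instrumented = 1;
    return 0;
}

/* Visit every frame of every thread, skipping frames owned by the C stack. */
static int instrument_all(Thread *head) {
    for (Thread *ts = head; ts != NULL; ts = ts->next) {
        for (Frame *f = ts->current_frame; f != NULL; f = f->previous) {
            if (f->owner != OWNED_BY_CSTACK && instrument_code(f->code) != 0) {
                return -1;
            }
        }
    }
    return 0;
}
```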
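In instrument_all_executing_code_objects, CPython walks the frame stack of every thread in the interpreter (skipping frames owned by the C stack) and instruments the code object of each frame. Instrumenting a code object allocates its monitoring data: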
typedef struct {
/* Monitoring specific to this code object */
_Py_LocalMonitors local_monitors;
/* Monitoring that is active on this code object */
_Py_LocalMonitors active_monitors;
/* The tools that are to be notified for events for the matching code unit */
uint8_t *tools;
/* The version of tools when they instrument the code */
uintptr_t tool_versions[_PY_MONITORING_TOOL_IDS];
/* Information to support line events */
_PyCoLineInstrumentationData *lines;
/* The tools that are to be notified for line events for the matching code unit */
uint8_t *line_tools;
/* Information to support instruction events */
/* The underlying instructions, which can themselves be instrumented */
uint8_t *per_instruction_opcodes;
/* The tools that are to be notified for instruction events for the matching code unit */
uint8_t *per_instruction_tools;
} _PyCoMonitoringData;
Then, in the force_instrument_lock_held function, CPython computes the events that should be active for the current code object by combining the interpreter-level monitored events with the code object's local events, and applies the resulting active events to the instructions of that code object by rewriting them.
The core of this calculation diffs the newly computed active events against the events already active on the code object (active_monitors), producing two disjoint event sets:
- new_events: events that are now enabled (globally or locally for this code object) but are not yet active on it
- removed_events: events that are active on the code object but are no longer enabled
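The diff itself is plain bitwise arithmetic. The following sketch, with hypothetical names, models each event set as a per-event tool bitmask (like _Py_LocalMonitors), computes the wanted set as the union of global and local events, and derives new_events and removed_events as set differences. It ignores any special cases the real implementation handles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for _PY_MONITORING_LOCAL_EVENTS. */
#define NUM_LOCAL_EVENTS 10

/* Per event, a bitmask of tools (bit t == tool t). */
typedef struct {
    uint8_t tools[NUM_LOCAL_EVENTS];
} Monitors;

/* Union: a tool/event pair is set if it is set in either input. */
static Monitors monitors_or(Monitors a, Monitors b) {
    Monitors r;
    for (int e = 0; e < NUM_LOCAL_EVENTS; e++) {
        r.tools[e] = a.tools[e] | b.tools[e];
    }
    return r;
}

/* Difference a - b: pairs set in a but not in b. */
static Monitors monitors_sub(Monitors a, Monitors b) {
    Monitors r;
    for (int e = 0; e < NUM_LOCAL_EVENTS; e++) {
        r.tools[e] = a.tools[e] & (uint8_t)~b.tools[e];
    }
    return r;
}

static bool monitors_empty(Monitors m) {
    for (int e = 0; e < NUM_LOCAL_EVENTS; e++) {
        if (m.tools[e]) {
            return false;
        }
    }
    return true;
}

/* Recompute what should be active, diff it against what is active now,
 * then record the new active set on the (hypothetical) code object. */
static void diff_events(Monitors global, Monitors local, Monitors *active,
                        Monitors *new_events, Monitors *removed_events) {
    Monitors wanted = monitors_or(global, local);
    *new_events = monitors_sub(wanted, *active);
    *removed_events = monitors_sub(*active, wanted);
    *active = wanted;
}
```

Since new_events is wanted minus active and removed_events is active minus wanted, no pair can appear in both, which is why the two sets are disjoint.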
After that, CPython iterates over all instructions (bytecodes) in the current code object and checks whether the event associated with each instruction is active. If so, the instruction is instrumented:
static void
instrument(PyCodeObject *code, int i)
{
_Py_CODEUNIT *instr = &_PyCode_CODE(code)[i];
uint8_t *opcode_ptr = &instr->op.code;
int opcode = *opcode_ptr;
/*
Special handling for INSTRUMENTED_LINE and INSTRUMENTED_INSTRUCTION
(whose original opcode is stored elsewhere) is omitted here.
*/
CHECK(opcode != 0);
if (!is_instrumented(opcode)) {
int deopt = _PyOpcode_Deopt[opcode];
int instrumented = INSTRUMENTED_OPCODES[deopt];
assert(instrumented);
FT_ATOMIC_STORE_UINT8_RELAXED(*opcode_ptr, instrumented);
if (_PyOpcode_Caches[deopt]) {
FT_ATOMIC_STORE_UINT16_RELAXED(instr[1].counter.value_and_backoff,
adaptive_counter_warmup().value_and_backoff);
}
}
}
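It first maps a possibly specialized opcode back to its base opcode (_PyOpcode_Deopt) and then replaces it with the instrumented version of that opcode (e.g., CALL -> INSTRUMENTED_CALL); for opcodes with inline caches, it also resets the adaptive counter to its warm-up value.
That is to say, when a user calls sys.monitoring.set_events(), all relevant instructions in the code objects of the currently executing Python frames are rewritten into their instrumented versions, so that callbacks can be invoked later.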
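The rewrite can be modeled as two table lookups: specialized opcode to base opcode, then base opcode to its instrumented twin. The opcode numbers and the DEOPT and INSTRUMENTED tables below are hypothetical stand-ins for _PyOpcode_Deopt and INSTRUMENTED_OPCODES, and the inline-cache counter reset is omitted.

```c
#include <stdint.h>

/* Hypothetical opcode numbering, purely for illustration. */
enum {
    NOP = 0,
    CALL = 1,
    CALL_PY_EXACT_ARGS = 2,  /* a specialized form of CALL */
    RETURN_VALUE = 3,
    INSTRUMENTED_CALL = 4,
    INSTRUMENTED_RETURN_VALUE = 5,
    NUM_OPCODES = 6
};

/* Specialized opcode -> base opcode (role of _PyOpcode_Deopt). */
static const uint8_t DEOPT[NUM_OPCODES] = {
    [NOP] = NOP,
    [CALL] = CALL,
    [CALL_PY_EXACT_ARGS] = CALL,
    [RETURN_VALUE] = RETURN_VALUE,
    [INSTRUMENTED_CALL] = INSTRUMENTED_CALL,
    [INSTRUMENTED_RETURN_VALUE] = INSTRUMENTED_RETURN_VALUE,
};

/* Base opcode -> instrumented opcode, 0 if none (role of INSTRUMENTED_OPCODES). */
static const uint8_t INSTRUMENTED[NUM_OPCODES] = {
    [CALL] = INSTRUMENTED_CALL,
    [RETURN_VALUE] = INSTRUMENTED_RETURN_VALUE,
};

static int is_instrumented(uint8_t op) {
    return op == INSTRUMENTED_CALL || op == INSTRUMENTED_RETURN_VALUE;
}

/* Rewrite one instruction in place: de-specialize, then swap in the
 * instrumented variant. Opcodes without one are left unchanged here
 * (CPython instead asserts that an instrumented variant exists). */
static void instrument_op(uint8_t *code, int i) {
    uint8_t op = code[i];
    if (is_instrumented(op)) {
        return;  /* already instrumented: rewriting again is a no-op */
    }
    uint8_t instrumented = INSTRUMENTED[DEOPT[op]];
    if (instrumented != 0) {
        code[i] = instrumented;
    }
}
```

De-specializing first matters: a specialized form such as CALL_PY_EXACT_ARGS has no instrumented twin of its own, so it must be mapped back to CALL before the lookup.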
Register callback
To have a callback invoked when an event occurs, users must register it via sys.monitoring.register_callback().
Note that the callback's signature must match the event; otherwise an exception (typically a TypeError) is raised when the callback is invoked.
The CPython interpreter state PyInterpreterState has a field that tracks the callback of each tool for every event:
struct _is { // alias of PyInterpreterState
// Omitted
PyObject *monitoring_callables[PY_MONITORING_TOOL_IDS][_PY_MONITORING_EVENTS];
// ...
};
So registration simply stores the new callback in this table (atomically) and returns the previous one:
PyObject *
_PyMonitoring_RegisterCallback(int tool_id, int event_id, PyObject *obj)
{
PyInterpreterState *is = _PyInterpreterState_GET();
assert(0 <= tool_id && tool_id < PY_MONITORING_TOOL_IDS);
assert(0 <= event_id && event_id < _PY_MONITORING_EVENTS);
PyObject *callback = _Py_atomic_exchange_ptr(&is->monitoring_callables[tool_id][event_id],
Py_XNewRef(obj));
return callback;
}
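A minimal model of the callback table, assuming a hypothetical Object type standing in for PyObject and plain C11 atomics instead of CPython's _Py_atomic_exchange_ptr; reference counting (the Py_XNewRef in the real code) is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TOOL_IDS 8     /* PY_MONITORING_TOOL_IDS in CPython */
#define NUM_EVENTS 19  /* hypothetical stand-in for _PY_MONITORING_EVENTS */

/* Hypothetical stand-in for a Python callable object. */
typedef struct {
    const char *name;
} Object;

/* One slot per (tool, event); static storage starts out all NULL. */
static _Atomic(Object *) callables[TOOL_IDS][NUM_EVENTS];

/* Store obj (NULL unregisters) and return whatever was there before.
 * The exchange is a single atomic step, so concurrent registrations
 * cannot lose an update. */
static Object *register_callback(int tool_id, int event_id, Object *obj) {
    return atomic_exchange(&callables[tool_id][event_id], obj);
}

static Object *get_callback(int tool_id, int event_id) {
    return atomic_load(&callables[tool_id][event_id]);
}
```

Returning the old value mirrors the Python-level behavior of sys.monitoring.register_callback(), which returns the previously registered callback (or None).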
How callbacks are invoked
So far we have:
- used sys.monitoring.set_events() to tell CPython to enable monitoring of some events
- called sys.monitoring.register_callback() to bind a user-provided callback function to a specific event
During evaluation, the INSTRUMENTED_CALL bytecode invokes _Py_call_instrumentation_2args as a trampoline that prepares the callback arguments (e.g., a CALL event callback takes four arguments: the code object, instruction_offset, callable and arg0).
Then call_one_instrument checks whether a callback has been registered for the tool in the interpreter state and, if so, calls it using the vectorcall convention via _PyObject_VectorcallTstate.
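Putting it together, dispatching one event boils down to visiting each tool enabled for it and calling that tool's registered callback, skipping tools that registered none. The sketch below uses a hypothetical Callback type and a single-event table; it visits tools in ascending ID order, which may differ from CPython's actual order, and treats a nonzero return as an error that stops dispatch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOOL_IDS 8

/* Hypothetical callback type: receives the instruction offset and one argument. */
typedef int (*Callback)(int offset, void *arg0);

/* Hypothetical per-interpreter table for a single event (one slot per tool). */
static Callback call_callables[TOOL_IDS];

/* Call every tool whose bit is set in `tools`; stop at the first error. */
static int call_instrumentation(uint8_t tools, int offset, void *arg0) {
    for (int tool = 0; tool < TOOL_IDS; tool++) {
        if (!(tools & (1u << tool))) {
            continue;  /* this tool did not enable the event */
        }
        Callback cb = call_callables[tool];
        if (cb == NULL) {
            continue;  /* event enabled, but no callback registered */
        }
        if (cb(offset, arg0) != 0) {
            return -1;  /* propagate the error to the interpreter */
        }
    }
    return 0;
}
```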
Summary
In this post, we walked through how events and callbacks are registered with the sys.monitoring module and how the callbacks are invoked.
To fully understand the process, it helps to become more familiar with CPython frame evaluation, code objects and calling conventions.