In this post, I'm going to show how to debug Python libraries and their C module implementations in the CPython interpreter.
The libraries (e.g., asyncio and json in the Lib directory of the CPython project) act as the bridge between users and the underlying C modules.
Hence, when debugging the CPython interpreter, we use the built-in debugger pdb to trace the libraries in the Python world and gdb for the C world.
Debug Python Lib with PDB breakpoint()
You can simply insert breakpoint() anywhere in the Python code, and type s (or step) to step into the library function implementation.
import asyncio

async def hello_world():
    print("hello...")
    breakpoint()
    await asyncio.sleep(1)
    print("world")

asyncio.run(hello_world())
The above script will stop at the breakpoint(), and you can interactively trace the execution from there.
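As a side note on how breakpoint() finds the debugger: it dispatches to sys.breakpointhook(), whose default implementation starts pdb (and setting the environment variable PYTHONBREAKPOINT=0 turns breakpoint() into a no-op). The following is a minimal sketch of that dispatch; recording_hook, run_with_hook and demo are hypothetical names made up for this illustration, and the hook only records the call instead of opening a debugger.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    """Hypothetical hook: record the call instead of starting pdb."""
    calls.append((args, kwargs))


def run_with_hook(func, hook):
    """Call func() while sys.breakpointhook is temporarily replaced by hook."""
    original = sys.breakpointhook
    sys.breakpointhook = hook
    try:
        return func()
    finally:
        # Always restore the previous hook so later breakpoint() calls behave normally.
        sys.breakpointhook = original


def demo():
    breakpoint()  # dispatches to whatever sys.breakpointhook currently is
    return "finished"


result = run_with_hook(demo, recording_hook)
print(result, len(calls))
```

The same mechanism is what lets other debuggers plug in without changing the breakpoint() calls in your code.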
Cross the boundary
Some Python library functions are actually implemented in a C module.
Here we use asyncio as the example to describe how Python and C are bridged.
asyncio.events.get_event_loop() is used to obtain the event loop of the current thread, if one exists. Its sibling get_running_loop() looks like this on the Python side:
def get_running_loop():
    """Return the running event loop. Raise a RuntimeError if there is none.
    This function is thread-specific.
    """
    # NOTE: this function is implemented in C (see _asynciomodule.c)
    loop = _get_running_loop()
    if loop is None:
        raise RuntimeError('no running event loop')
    return loop
And the C implementation of get_event_loop (in _asynciomodule.c) is
static PyObject *
get_event_loop(asyncio_state *state)
{
    PyObject *loop;
    PyObject *policy;

    _PyThreadStateImpl *ts = (_PyThreadStateImpl *)_PyThreadState_GET();
    loop = Py_XNewRef(ts->asyncio_running_loop);
    if (loop != NULL) {
        return loop;
    }

    policy = PyObject_CallNoArgs(state->asyncio_get_event_loop_policy);
    if (policy == NULL) {
        return NULL;
    }

    loop = PyObject_CallMethodNoArgs(policy, &_Py_ID(get_event_loop));
    Py_DECREF(policy);
    return loop;
}
which returns the event loop from the internal per-thread state.
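To check from the Python side which implementation is actually bound to a name, one option is inspect.isbuiltin(), which is true for functions implemented in C. The snippet below is a small sketch under the assumption that Lib/asyncio/events.py keeps its pure-Python versions under _py_-prefixed aliases (looked up defensively with getattr, since this is an internal detail); implementation_of is a helper name introduced here for illustration.

```python
import asyncio.events as events
import inspect


def implementation_of(func):
    """Return "C" for built-in (C-implemented) callables, else "Python"."""
    return "C" if inspect.isbuiltin(func) else "Python"


for name in ("get_running_loop", "get_event_loop", "_get_running_loop"):
    public = getattr(events, name)
    # Assumption: the pure-Python fallback is kept as an internal "_py_" alias.
    fallback = getattr(events, "_py_" + name, None)
    fallback_kind = implementation_of(fallback) if fallback else "missing"
    print(f"{name}: public={implementation_of(public)}, fallback={fallback_kind}")
```

On a CPython build where the _asyncio extension is available, the public names are expected to report the C implementation, while the fallbacks remain plain Python functions.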
So how is the Python function get_event_loop bound (and forwarded) to the C version?
If you look at the last few lines of _asynciomodule.c, you can find the following declaration:
static struct PyModuleDef _asynciomodule = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "_asyncio",
    .m_doc = module_doc,
    .m_size = sizeof(asyncio_state),
    .m_methods = asyncio_methods,
    .m_slots = module_slots,
    .m_traverse = module_traverse,
    .m_clear = module_clear,
    .m_free = (freefunc)module_free,
};

PyMODINIT_FUNC
PyInit__asyncio(void)
{
    return PyModuleDef_Init(&_asynciomodule);
}
where the module is defined with name, size, and public methods.
When asyncio is imported in Python, Lib/asyncio/events.py in turn imports _asyncio, and PyInit__asyncio is invoked (see importlib for more details; we may cover this later).
To declare such a lib-module bridge, you should add a line to Modules/Setup (or Modules/Setup.stdlib.in for standard library extension modules).
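Whether a C module ends up compiled into the interpreter or built as a separate shared library can be observed from Python. The sketch below uses importlib.util.find_spec() and sys.builtin_module_names for that; describe_module is a helper name made up for this post, and the result for _asyncio depends on how your interpreter was built.

```python
import importlib.util
import sys


def describe_module(name):
    """Summarize where a module comes from and which loader handles it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False}
    loader = spec.loader
    # The built-in importer is used as a class, other loaders are instances.
    loader_name = loader.__name__ if isinstance(loader, type) else type(loader).__name__
    return {
        "name": name,
        "found": True,
        "builtin": name in sys.builtin_module_names,
        "origin": spec.origin,
        "loader": loader_name,
    }


for module_name in ("sys", "_asyncio", "asyncio"):
    print(describe_module(module_name))
```

For a module built as a shared library, origin is the path of the extension file and the loader is the one that eventually calls the module's PyInit_ function.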
C module with GDB
We'll use GDB to trace the execution path in the C module implementation.
It's better to build the interpreter with debug information by passing the --with-pydebug option to configure.
Python extension for GDB
When you build CPython from source, a GDB extension script named python-gdb.py is generated in the build directory as well.
Following the GDB helper guide, you should relax GDB's auto-load protection by adding the line set auto-load safe-path / to ~/.gdbinit or ~/.config/gdb/gdbinit, so that GDB can automatically load the helper script from the build directory.
Set breakpoint
To see the frame stack when importing the asyncio module, we run the above simple script with GDB: gdb --args ./python test.py (here the python executable is produced by a debug build from source).
Then set the breakpoint via b PyInit__asyncio (you may need to allow pending breakpoints on shared libraries that are not loaded yet, or add set breakpoint pending on to your gdbinit file).
With the help of python-gdb.py, you can inspect the Python frame stack via py-bt:
(gdb) py-bt
Traceback (most recent call first):
  <built-in method create_dynamic of module object at remote 0x7ffff7988890>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1046, in create_module
  File "<frozen importlib._bootstrap>", line 813, in module_from_spec
  File "<frozen importlib._bootstrap>", line 921, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1330, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1359, in _find_and_load
  File "/mnt/data/home/amd/repos/cpython/Lib/asyncio/events.py", line 839, in <module>
    from _asyncio import (_get_running_loop, _set_running_loop,
  <built-in method exec of module object at remote 0x7ffff796d490>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 752, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1330, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1359, in _find_and_load
  <built-in method __import__ of module object at remote 0x7ffff796d490>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1414, in _handle_fromlist
  File "/mnt/data/home/amd/repos/cpython/Lib/asyncio/base_events.py", line 40, in <module>
    from . import events
  <built-in method exec of module object at remote 0x7ffff796d490>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 752, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1330, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1359, in _find_and_load
  File "/mnt/data/home/amd/repos/cpython/Lib/asyncio/__init__.py", line 8, in <module>
    from .base_events import *
  <built-in method exec of module object at remote 0x7ffff796d490>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 752, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1330, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1359, in _find_and_load
  File "/mnt/data/home/amd/repos/asyncio-learning/examples/run.py", line 1, in <module>
    import asyncio
Hence we can see both the C module frames and the Python code frames in a single backtrace.
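If you repeat this session often, the interactive steps can be scripted. The sketch below only assembles a non-interactive GDB command line from the commands used above (GDB's -batch, -ex and --args options); build_gdb_command is a hypothetical helper, and the executable and script paths are placeholders you would adapt to your own build.

```python
import shlex


def build_gdb_command(python_exe, script, breakpoint_symbol="PyInit__asyncio"):
    """Assemble a batch-mode GDB invocation that breaks at a symbol and prints py-bt."""
    return [
        "gdb", "-batch",
        # Allow the breakpoint to stay pending until the shared library is loaded.
        "-ex", "set breakpoint pending on",
        "-ex", f"break {breakpoint_symbol}",
        "-ex", "run",
        # py-bt is only available if python-gdb.py was auto-loaded.
        "-ex", "py-bt",
        "--args", python_exe, script,
    ]


cmd = build_gdb_command("./python", "test.py")
print(shlex.join(cmd))
```

The resulting list can be pasted into a shell or passed to subprocess.run(cmd) from the build directory, assuming gdb is installed and the auto-load setting described earlier is in place.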
Print PyObject
PyObject is the "first-class citizen" of the C world of Python, from which every other object type is derived.
It is defined as:
#ifndef Py_GIL_DISABLED
struct _object {
#if (defined(__GNUC__) || defined(__clang__)) \
        && !(defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L)
    // On C99 and older, anonymous union is a GCC and clang extension
    __extension__
#endif
#ifdef _MSC_VER
    // Ignore MSC warning C4201: "nonstandard extension used:
    // nameless struct/union"
    __pragma(warning(push))
    __pragma(warning(disable: 4201))
#endif
    union {
        Py_ssize_t ob_refcnt;
#if SIZEOF_VOID_P > 4
        PY_UINT32_T ob_refcnt_split[2];
#endif
    };
#ifdef _MSC_VER
    __pragma(warning(pop))
#endif
    PyTypeObject *ob_type;
};
which contains the reference count ob_refcnt and the type information ob_type.
For instance, if one object is:
(gdb) p *module
$5 = {{ob_refcnt = 23, ob_refcnt_split = {23, 0}}, ob_type = 0x555555bc85c0 <PyModule_Type>}
we can see this object is a PyModuleObject since its ob_type is PyModule_Type.
Thereafter we can print its concrete fields with type casting:
(gdb) p *(PyModuleObject*)module
$6 = {ob_base = {{ob_refcnt = 23, ob_refcnt_split = {23, 0}}, ob_type = 0x555555bc85c0 <PyModule_Type>},
...
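The same header layout can be observed from Python itself, without GDB. The following is a rough sketch that reads the ob_type pointer with ctypes. It relies on two assumptions: that id() returns the object's address (a CPython implementation detail), and that the build is a default (GIL) build where ob_type directly follows the pointer-sized ob_refcnt union shown above. On other builds the helper, a name invented for this example, simply returns None.

```python
import ctypes
import sys
import sysconfig


def read_ob_type_address(obj):
    """Read the ob_type pointer straight from the object's header.

    Returns None when the layout assumption does not hold (non-CPython
    interpreters or free-threaded builds, whose header has more fields).
    """
    if sys.implementation.name != "cpython":
        return None
    if sysconfig.get_config_var("Py_GIL_DISABLED"):
        return None
    # Assumption: the header is {ob_refcnt (Py_ssize_t-sized union), ob_type}.
    offset = ctypes.sizeof(ctypes.c_ssize_t)
    return ctypes.c_void_p.from_address(id(obj) + offset).value


module = sys  # any module object works as an example
address = read_ob_type_address(module)
if address is None:
    print("header layout not assumed on this build, skipping")
else:
    # The pointer stored in the header should be the address of the type object.
    print(hex(address), address == id(type(module)))
```

This mirrors what p *module shows in GDB: the second field of the header points at the type object, here the module type.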
Appendix
configure commands used for different development topics. Note that it's recommended to run configure out-of-tree (i.e., in a separate directory rather than in the CPython source tree).
# JIT
../configure --with-pydebug --enable-test-modules --enable-experimental-jit=yes
# FT
../configure --with-pydebug --enable-test-modules --disable-gil --with-thread-sanitizer
The debug information helps to locate where an error comes from, and TSan helps to uncover hidden thread-safety bugs such as data races.
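After building, it can be worth confirming that the interpreter you are running really has the flavor you configured. The sketch below relies on sys.gettotalrefcount existing only in debug builds and on the Py_GIL_DISABLED and CONFIG_ARGS configuration variables exposed by sysconfig (the latter may be None on platforms that do not record it); build_flavor is a helper name made up here.

```python
import sys
import sysconfig


def build_flavor():
    """Report a few build-time properties of the running interpreter."""
    return {
        # sys.gettotalrefcount is only present in debug (--with-pydebug) builds.
        "debug": hasattr(sys, "gettotalrefcount"),
        # Set to 1 when the interpreter was configured with --disable-gil.
        "free_threaded": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        # The arguments originally passed to configure, if recorded.
        "configure_args": sysconfig.get_config_var("CONFIG_ARGS"),
    }


flavor = build_flavor()
for key, value in flavor.items():
    print(f"{key}: {value}")
```

Running it with the ./python produced by each build directory shows at a glance which configuration that directory holds.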