Tagged pointer is a common performance optimization techniques used by many dynamic programming langauge based on VM (e.g., Ruby). It's not a new concept, as it has been used by Apple when they introduced iPhone 5S with 64-bit Apple A7 processor in 2013. But Python, maybe the most popular dynamic programming language, leverages this mechanism in 2024.
In a machine with 64-bit ISA, the pointer (void *
) has 64-bits long, and there are memory alignment restrictions in most architectures.
And, the C standard ensures pointers allocated by malloc
follow the memory alignment restrictions.
In 64-bit systems, the address will be aligned by 8-bytes long, which means the lower 3 bits of a allocated pointer will always be 000
.
<- High Low ->
0bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx000
Therefore, we can harness these used bits to store some information (that's called tagging). Meanwhile, we can exploit them to indicate whether the 64-bit long field contains raw or boxed data. For more details, please refer to this article for concrete explanation.
StackRef
: an alias of PyObject *
In CPython, everything is a PyObject
, and the interpreter pushes and pops them into/from the evaluation stack when executing bytecodes.
For small objects, especially, int
, PVM dereferences them so frequently, it hurts performance so much.
So CPython uses tagged pointers to represent integers whose values can be fit into the left bits after retaining the tag bits.
#define Py_INT_TAG 3
#define Py_TAG_INVALID 2
#define Py_TAG_REFCNT 1
#define Py_TAG_BITS 3
Currently (CPython 3.14), there are 3 bits reserved to indicate the metadata in a tagged pointer.
The bits are encapsulated in a C struct named _PyStackRef
:
typedef union _PyStackRef {
#if !defined(Py_GIL_DISABLED) && defined(Py_STACKREF_DEBUG)
uint64_t index;
#else
uintptr_t bits;
#endif
} _PyStackRef;
If a PyStackRef
represents an integer (i.e., PyLongObject
in Python), then the last 3 (Py_TAG_BITS
) bits in bits
will exactly be 0b011
(Py_INT_TAG
).
Otherwise, it is an alias of PyObject *
, whose bits
is set to the 64-bit long ptr
if the corresponding object is not immortal (i.e., None
, True
and False
).
For immortal objects, the bits
is set as:
&_Py_FalseStruct | Py_TAG_REFCNT // immortal bits
Deferred reference counting with StackRef
The vast majority of incref/decref-s happen in the interpreter and evaluation stack 1. One can eliminate this part of runtime overhead by not updating the RC for "long-lived" objects like code objects, top-level functions (not closure) and modules. And the developers determined to combine deferred RC and tagged pointer together to have a unified system to identift objects that can be deferred RCed 23.
In FT build (--disable-gil
), you can see a macro:
#define Py_TAG_DEFERRED Py_TAG_REFCNT
#define PyStackRef_IsDeferred(ref) (((ref).bits & Py_TAG_BITS) == Py_TAG_DEFERRED)
On the contrary, objects that cannot be deferred RCed are still represented by PyStackRef
with bits set as original values.