Tagged pointers in dynamic langauge

June 11, 2025

Tagged pointer is a common performance optimization techniques used by many dynamic programming langauge based on VM (e.g., Ruby). It's not a new concept, as it has been used by Apple when they introduced iPhone 5S with 64-bit Apple A7 processor in 2013. But Python, maybe the most popular dynamic programming language, leverages this mechanism in 2024.

In a machine with 64-bit ISA, the pointer (void *) has 64-bits long, and there are memory alignment restrictions in most architectures. And, the C standard ensures pointers allocated by malloc follow the memory alignment restrictions. In 64-bit systems, the address will be aligned by 8-bytes long, which means the lower 3 bits of a allocated pointer will always be 000.

<- High                                                     Low ->
0bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx000

Therefore, we can harness these used bits to store some information (that's called tagging). Meanwhile, we can exploit them to indicate whether the 64-bit long field contains raw or boxed data. For more details, please refer to this article for concrete explanation.

StackRef: an alias of PyObject *

In CPython, everything is a PyObject, and the interpreter pushes and pops them into/from the evaluation stack when executing bytecodes. For small objects, especially, int, PVM dereferences them so frequently, it hurts performance so much. So CPython uses tagged pointers to represent integers whose values can be fit into the left bits after retaining the tag bits.

#define Py_INT_TAG 3
#define Py_TAG_INVALID 2
#define Py_TAG_REFCNT 1
#define Py_TAG_BITS 3

Currently (CPython 3.14), there are 3 bits reserved to indicate the metadata in a tagged pointer. The bits are encapsulated in a C struct named _PyStackRef:

typedef union _PyStackRef {
#if !defined(Py_GIL_DISABLED) && defined(Py_STACKREF_DEBUG)
    uint64_t index;
#else
    uintptr_t bits;
#endif
} _PyStackRef;

If a PyStackRef represents an integer (i.e., PyLongObject in Python), then the last 3 (Py_TAG_BITS) bits in bits will exactly be 0b011 (Py_INT_TAG). Otherwise, it is an alias of PyObject *, whose bits is set to the 64-bit long ptr if the corresponding object is not immortal (i.e., None, True and False). For immortal objects, the bits is set as:

&_Py_FalseStruct | Py_TAG_REFCNT  // immortal bits

Deferred reference counting with StackRef

The vast majority of incref/decref-s happen in the interpreter and evaluation stack 1. One can eliminate this part of runtime overhead by not updating the RC for "long-lived" objects like code objects, top-level functions (not closure) and modules. And the developers determined to combine deferred RC and tagged pointer together to have a unified system to identift objects that can be deferred RCed 23.

In FT build (--disable-gil), you can see a macro:

#define Py_TAG_DEFERRED Py_TAG_REFCNT
#define PyStackRef_IsDeferred(ref) (((ref).bits & Py_TAG_BITS) == Py_TAG_DEFERRED)

On the contrary, objects that cannot be deferred RCed are still represented by PyStackRef with bits set as original values.

Footnotes

  1. https://github.com/python/cpython/issues/120024

  2. https://github.com/faster-cpython/ideas/issues/678

  3. https://github.com/faster-cpython/ideas/issues/677