The idea is that all C API functions get an additional parameter that represents a handle to some part of the VM state.
E.g. the PyObject_IsTrue function has the following signature: int PyObject_IsTrue(PyObject *v).
Compare that to the HPy equivalent: int HPy_IsTrue(HPyContext *ctx, HPy h) which has an additional HPyContext parameter.
Why?
Adding a "context" parameter has a number of important advantages:
It can act as a capability. For example, the context passed to ordinary code could allow the full range of API calls, while the context passed to deallocation functions could allow only calls that decref and free.
It can improve performance. Many API calls need access to VM state, which they have to obtain through PyInterpreterState_Get() or PyThreadState_Get(). These calls read thread-local storage (TLS), which is slow in dynamically linked code and on Windows.
The new API can use a principled approach to naming, easing porting to an HPy-like API in future.
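The performance point in the list above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than CPython code: `demo_vm_state` plays the role of the VM state, `demo_current_state` plays the role of the thread-local pointer behind the existing accessor functions, and the two `demo_count_alloc_*` functions contrast an API that has to read TLS with one that receives the state as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM state an API call needs. */
typedef struct {
    size_t alloc_count;
} demo_vm_state;

/* Hypothetical stand-in for the thread-local "current state" pointer
   that accessor functions have to read today (C11 _Thread_local). */
static _Thread_local demo_vm_state *demo_current_state;

/* Current style: no context parameter, so every call reads TLS. */
static void demo_count_alloc_tls(void) {
    demo_vm_state *st = demo_current_state;  /* TLS read on every call */
    assert(st != NULL);
    st->alloc_count++;
}

/* Proposed style: the caller already holds the state and passes it in. */
static void demo_count_alloc_ctx(demo_vm_state *ctx) {
    ctx->alloc_count++;
}

/* Calls each variant n times and returns the total count recorded. */
static size_t demo_run(size_t n) {
    demo_vm_state st = {0};
    demo_current_state = &st;
    for (size_t i = 0; i < n; i++) {
        demo_count_alloc_tls();
    }
    for (size_t i = 0; i < n; i++) {
        demo_count_alloc_ctx(&st);
    }
    demo_current_state = NULL;
    return st.alloc_count;
}
```

Both variants do the same work; the difference is only in where the state pointer comes from. How much the TLS read costs in practice depends on the platform and on how the code is linked, so this sketch shows the shape of the change rather than a measurement.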
How?
Adding a new parameter to every C API function is going to be a lot of work. To make it worthwhile we need to see incremental benefit.
The performance benefit mentioned above is that incremental benefit. Allocation and freeing of objects represents ~10% of the current runtime. A significant part of that is the indirection caused by needing to read TLS and by the narrow interface of the allocation and deallocation functions. A context parameter gives us fast access to the allocator data structures and thus fast allocation and deallocation.
Here's a possible sequence for implementing this:
Allocation functions for tuple, list, and other common classes. The context will give cheap access to the underlying freelists.
Py_DECREF. As a placeholder, to be used by:
Deallocation functions for common classes. The context will give cheap access to the underlying freelists. If Py_DECREF supports the context, it can pass it through to any deallocation functions, ensuring that freeing collections will not require repeated reads of TLS.
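The sequence above can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: `demo_ctx`, `demo_obj`, `demo_alloc`, `demo_decref`, `demo_dealloc` and `demo_ctx_clear` are all hypothetical names, and the freelist is a plain singly linked list owned by the context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object header: a refcount plus a freelist link. */
typedef struct demo_obj {
    long refcnt;
    struct demo_obj *next_free;  /* used only while on the freelist */
} demo_obj;

/* Hypothetical context: owns the freelist, so no TLS lookup is needed. */
typedef struct {
    demo_obj *freelist;
    size_t freelist_len;
} demo_ctx;

/* Step 1: allocation takes the context and tries the freelist first. */
static demo_obj *demo_alloc(demo_ctx *ctx) {
    demo_obj *o = ctx->freelist;
    if (o != NULL) {
        ctx->freelist = o->next_free;
        ctx->freelist_len--;
    } else {
        o = malloc(sizeof *o);
        if (o == NULL) {
            return NULL;
        }
    }
    o->refcnt = 1;
    o->next_free = NULL;
    return o;
}

/* Step 3: deallocation receives the context and pushes onto the freelist. */
static void demo_dealloc(demo_ctx *ctx, demo_obj *o) {
    o->next_free = ctx->freelist;
    ctx->freelist = o;
    ctx->freelist_len++;
}

/* Step 2: decref only forwards the context to the deallocator. */
static void demo_decref(demo_ctx *ctx, demo_obj *o) {
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        demo_dealloc(ctx, o);
    }
}

/* Releases everything still cached on the freelist. */
static void demo_ctx_clear(demo_ctx *ctx) {
    while (ctx->freelist != NULL) {
        demo_obj *o = ctx->freelist;
        ctx->freelist = o->next_free;
        free(o);
    }
    ctx->freelist_len = 0;
}
```

A caller would create one `demo_ctx`, allocate with `demo_alloc(&ctx)`, and release with `demo_decref(&ctx, obj)`. The point of the sketch is that the same context pointer flows from the decref into the deallocator, so freeing a container of many objects never has to look the state up again.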
The tricky part
Many C API functions call back into extension-provided code. Those callbacks also need to support the context parameter. Unfortunately, we cannot just add more tp_ slots to type objects, yet the context-aware versions of those callbacks have to live somewhere.
For common classes, we can use table lookup. Instead of deallocator = tp->tp_dealloc we would have deallocator_table[tp->tp_index], which does involve an extra memory read. However, simple reads like tp->tp_index are considerably faster than the TLS read they are replacing.
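A minimal sketch of the table lookup described above, assuming a small fixed set of common classes. Only `tp_index` comes from the text; the `demo_` types, the index values, and the counters in `demo_ctx` are hypothetical and exist only to make the dispatch observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context; the counters only record which deallocator ran. */
typedef struct {
    int freed_tuples;
    int freed_lists;
} demo_ctx;

/* Hypothetical type object carrying a small index instead of a new slot. */
typedef struct {
    size_t tp_index;
} demo_type;

typedef struct {
    const demo_type *type;
} demo_obj;

/* Context-aware deallocator signature. */
typedef void (*demo_ctx_dealloc)(demo_ctx *, demo_obj *);

enum { DEMO_TUPLE_INDEX, DEMO_LIST_INDEX, DEMO_TABLE_SIZE };

static void demo_tuple_dealloc(demo_ctx *ctx, demo_obj *o) {
    (void)o;
    ctx->freed_tuples++;
}

static void demo_list_dealloc(demo_ctx *ctx, demo_obj *o) {
    (void)o;
    ctx->freed_lists++;
}

/* One table entry per common class, indexed by tp_index. */
static const demo_ctx_dealloc demo_dealloc_table[DEMO_TABLE_SIZE] = {
    [DEMO_TUPLE_INDEX] = demo_tuple_dealloc,
    [DEMO_LIST_INDEX] = demo_list_dealloc,
};

static const demo_type demo_tuple_type = { DEMO_TUPLE_INDEX };
static const demo_type demo_list_type = { DEMO_LIST_INDEX };

/* Replaces "tp->tp_dealloc(o)" with a lookup through the table. */
static void demo_dealloc(demo_ctx *ctx, demo_obj *o) {
    size_t i = o->type->tp_index;
    assert(i < (size_t)DEMO_TABLE_SIZE);
    demo_dealloc_table[i](ctx, o);
}
```

The dispatch costs one read of `tp_index` plus one read of the table entry, which is the extra memory read mentioned above. This sketch covers only classes known in advance; how extension-defined types would register an entry is left open here, as it is in the text.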
What will the context be?
It will be defined something like this: typedef uintptr_t PyApiHandle; and should be considered opaque by C extensions.
It will probably be implemented as something as simple as PyApiHandle handle = ((uintptr_t)interp) + K for the default build and PyApiHandle handle = ((uintptr_t)tstate) + K for the free-threading build. K could be zero, but a non-zero offset will discourage users from casting directly instead of using the API.
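The encoding described above could look roughly like this. `PyApiHandle` is the typedef given in the text; `demo_interp`, `DEMO_HANDLE_OFFSET` (standing in for K, with an arbitrary value of 1), and the two conversion helpers are hypothetical. The sketch assumes a mainstream platform where a pointer survives a round trip through `uintptr_t` with an offset added and removed.

```c
#include <assert.h>
#include <stdint.h>

/* As given in the text: opaque to C extensions. */
typedef uintptr_t PyApiHandle;

/* Hypothetical value for K; any non-zero offset makes a direct cast wrong. */
#define DEMO_HANDLE_OFFSET ((uintptr_t)1)

/* Hypothetical stand-in for the interpreter (or thread) state. */
typedef struct {
    int id;
} demo_interp;

/* VM side: build the handle that is given to extension code. */
static PyApiHandle demo_handle_from_interp(demo_interp *interp) {
    return (uintptr_t)interp + DEMO_HANDLE_OFFSET;
}

/* VM side: recover the state pointer inside an API function. */
static demo_interp *demo_interp_from_handle(PyApiHandle handle) {
    return (demo_interp *)(handle - DEMO_HANDLE_OFFSET);
}
```

Decoding is a single subtraction, so the handle stays as cheap as a raw pointer. An extension that casts the handle straight to a pointer gets an address that is off by K, which is the discouragement the non-zero offset is meant to provide.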