Pass a context/token as an argument to all C API functions. #703

Open
markshannon opened this issue Nov 8, 2024 · 0 comments

The idea is that all C API functions get an additional parameter that represents a handle to some part of the VM state.
E.g. the PyObject_IsTrue function has the following signature: int PyObject_IsTrue(PyObject *v).
Compare that to the HPy equivalent: int HPy_IsTrue(HPyContext *ctx, HPy h) which has an additional HPyContext parameter.

Why?

Adding a "context" parameter has a number of important advantages:

  • It can act as a capability. For example, normal code would be allowed the full range of API calls, while deallocation functions would only be allowed to decref and free.
  • It can improve performance. Many API calls need access to VM state, which they have to reach through PyInterpreterState_Get() or PyThreadState_Get(). These calls require access to TLS, which is slow for dynamically linked code and on Windows.
  • The new API can use a principled approach to naming, easing porting to an HPy-like API in future.
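
To illustrate the capability idea from the first bullet, here is a minimal sketch. Every name in it (ToyContext, ToyVMState, the TOY_CAP_ flags, and the toy_ functions) is hypothetical and invented for illustration; none of it is part of CPython or HPy. It assumes the context carries a set of permission bits next to a pointer to VM state, so a deallocator can be handed a context that permits releasing objects but not allocating them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits; nothing here is a real CPython API. */
enum {
    TOY_CAP_RELEASE  = 1,  /* holder may decref and free */
    TOY_CAP_ALLOCATE = 2   /* holder may create new objects */
};

/* Stand-in for the VM state that is normally found through TLS. */
typedef struct {
    size_t live_objects;
} ToyVMState;

/* The context pairs a pointer to VM state with the allowed operations. */
typedef struct {
    ToyVMState *vm;
    unsigned caps;
} ToyContext;

/* Context for ordinary extension code: everything is allowed. */
ToyContext toy_full_context(ToyVMState *vm) {
    assert(vm != NULL);
    ToyContext ctx = { vm, TOY_CAP_RELEASE | TOY_CAP_ALLOCATE };
    return ctx;
}

/* Context handed to a deallocator: same VM state, release only. */
ToyContext toy_dealloc_context(const ToyContext *parent) {
    ToyContext ctx = { parent->vm, TOY_CAP_RELEASE };
    return ctx;
}

/* Returns false if the context does not permit allocation. */
bool toy_allocate(ToyContext *ctx) {
    if (!(ctx->caps & TOY_CAP_ALLOCATE)) {
        return false;
    }
    ctx->vm->live_objects++;  /* VM state reached without any TLS read */
    return true;
}

/* Returns false if releasing is not permitted or nothing is live. */
bool toy_release(ToyContext *ctx) {
    if (!(ctx->caps & TOY_CAP_RELEASE) || ctx->vm->live_objects == 0) {
        return false;
    }
    ctx->vm->live_objects--;
    return true;
}
```

With this shape, calling toy_allocate with a context made by toy_dealloc_context fails, while toy_release succeeds. A real design might enforce the restriction at compile time or only in debug builds; the runtime check here is just the simplest way to show the idea.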

How?

Adding a new parameter to every C API function is going to be a lot of work. To make it worthwhile we need to see incremental benefit.
The performance benefit mentioned above is that incremental benefit. Allocating and freeing objects accounts for ~10% of the current runtime. A significant part of that is the indirection caused by needing to read TLS and by the narrow interface of the allocation and deallocation functions. A context parameter gives us fast access to the allocator data structures and thus fast allocation and deallocation.

Here's a possible sequence for implementing this:

  • Allocation functions for tuple, list, and other common classes. The context will give cheap access to the underlying freelists.
  • Py_DECREF. As a placeholder, to be used by:
  • Deallocation functions for common classes. The context will give cheap access to the underlying freelists. If Py_DECREF supports the context, it can pass it through to any deallocation functions, ensuring that freeing collections will not require repeated reads of TLS.
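
The sequence above can be sketched with a toy model. All names here (ToyContext, ToyPair, toy_pair_new, toy_decref, toy_context_clear) are hypothetical, and the freelist layout is an assumption made for illustration rather than CPython's actual one. The point is that the allocator finds its freelist through the context, and the decref passes the same context down to the deallocator, so freeing a nested collection never looks up VM state again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_FREELIST_MAX 8

typedef struct ToyPair ToyPair;

/* Hypothetical state reachable from the context. */
typedef struct {
    ToyPair *freelist;     /* recycled pairs, linked through items[0] */
    size_t freelist_len;
    size_t malloc_calls;   /* counter only, to make reuse observable */
} ToyContext;

/* A toy two-slot collection; each slot holds another pair or NULL. */
struct ToyPair {
    size_t refcnt;
    ToyPair *items[2];
};

/* Allocate a pair, preferring the freelist. Steals references to a and b. */
ToyPair *toy_pair_new(ToyContext *ctx, ToyPair *a, ToyPair *b) {
    ToyPair *p = ctx->freelist;
    if (p != NULL) {
        ctx->freelist = p->items[0];
        ctx->freelist_len--;
    } else {
        p = malloc(sizeof *p);
        if (p == NULL) {
            return NULL;
        }
        ctx->malloc_calls++;
    }
    p->refcnt = 1;
    p->items[0] = a;
    p->items[1] = b;
    return p;
}

void toy_decref(ToyContext *ctx, ToyPair *p);

/* Deallocator: receives the context from toy_decref and hands it on. */
static void toy_pair_dealloc(ToyContext *ctx, ToyPair *p) {
    toy_decref(ctx, p->items[0]);
    toy_decref(ctx, p->items[1]);
    if (ctx->freelist_len < TOY_FREELIST_MAX) {
        p->items[0] = ctx->freelist;  /* push onto the freelist */
        ctx->freelist = p;
        ctx->freelist_len++;
    } else {
        free(p);
    }
}

/* Context-aware decref: no global or thread-local lookup anywhere. */
void toy_decref(ToyContext *ctx, ToyPair *p) {
    if (p != NULL && --p->refcnt == 0) {
        toy_pair_dealloc(ctx, p);
    }
}

/* Return all cached pairs to the system allocator. */
void toy_context_clear(ToyContext *ctx) {
    while (ctx->freelist != NULL) {
        ToyPair *next = ctx->freelist->items[0];
        free(ctx->freelist);
        ctx->freelist = next;
    }
    ctx->freelist_len = 0;
}
```

Dropping the last reference to a pair that owns another pair puts both on the freelist, and the next toy_pair_new call reuses one of them without calling malloc.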

The tricky part

Many C API functions call back into extension-provided code. Those callbacks also need to support the context parameter. Unfortunately we cannot just add more tp_ slots to type objects, but we need to put them somewhere.
For common classes, we can use table lookup. Instead of deallocator = tp->tp_dealloc we would have deallocator_table[tp->tp_index], which does involve an extra memory read. However, simple reads like tp->tp_index are considerably faster than the TLS read they are replacing.
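
A rough sketch of the table lookup follows. The issue names only tp_index and deallocator_table; the Toy types, the index constants, and the fallback to the legacy tp_dealloc slot for types without a table entry are all assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical indices; 0 is assumed to mean "no context-aware entry". */
enum {
    TOY_INDEX_NONE = 0,
    TOY_INDEX_TUPLE = 1,
    TOY_INDEX_LIST = 2,
    TOY_INDEX_COUNT
};

typedef struct {
    size_t ctx_deallocs;  /* counts context-aware deallocations */
} ToyContext;

typedef struct ToyObject ToyObject;

typedef struct {
    size_t tp_index;                   /* small integer read from the type */
    void (*tp_dealloc)(ToyObject *o);  /* legacy slot without a context */
} ToyType;

struct ToyObject {
    const ToyType *type;
};

typedef void (*ToyCtxDealloc)(ToyContext *ctx, ToyObject *o);

/* Counts calls that had to take the legacy, context-free path. */
size_t toy_legacy_deallocs = 0;

static void tuple_dealloc_ctx(ToyContext *ctx, ToyObject *o) {
    (void)o;
    ctx->ctx_deallocs++;
}

static void list_dealloc_ctx(ToyContext *ctx, ToyObject *o) {
    (void)o;
    ctx->ctx_deallocs++;
}

static void legacy_dealloc(ToyObject *o) {
    (void)o;
    toy_legacy_deallocs++;
}

/* Context-aware deallocators live outside the type object. */
static const ToyCtxDealloc deallocator_table[TOY_INDEX_COUNT] = {
    NULL,               /* TOY_INDEX_NONE */
    tuple_dealloc_ctx,  /* TOY_INDEX_TUPLE */
    list_dealloc_ctx    /* TOY_INDEX_LIST */
};

const ToyType ToyTuple_Type = { TOY_INDEX_TUPLE, legacy_dealloc };
const ToyType ToyExt_Type = { TOY_INDEX_NONE, legacy_dealloc };

/* One extra read (tp_index) selects the context-aware deallocator. */
void toy_dealloc(ToyContext *ctx, ToyObject *o) {
    size_t index = o->type->tp_index;
    assert(index < TOY_INDEX_COUNT);
    ToyCtxDealloc dealloc = deallocator_table[index];
    if (dealloc != NULL) {
        dealloc(ctx, o);
    } else {
        o->type->tp_dealloc(o);  /* assumed fallback for other types */
    }
}
```

In this sketch an object of ToyTuple_Type is routed through the table, while an object of ToyExt_Type falls back to its legacy slot.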

What will the context be?

It will be defined something like this: typedef uintptr_t PyApiHandle; and should be considered opaque by C extensions.
It will probably be implemented as something as simple as PyApiHandle handle = ((uintptr_t)interp) + K for the default build and
PyApiHandle handle = ((uintptr_t)tstate) + K for the free-threading build.
K could be zero, but a non-zero offset will discourage users from casting directly instead of using the API.
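
As a minimal sketch of that encoding: only the PyApiHandle typedef comes from the text above. ToyState stands in for the interpreter or thread state, and TOY_HANDLE_OFFSET is a hypothetical value of K chosen for illustration. The conversion functions would be private to the VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t PyApiHandle;  /* opaque to C extensions */

/* Hypothetical stand-in for the interpreter or thread state. */
typedef struct {
    int id;
} ToyState;

/* Hypothetical K; non-zero so a direct cast of the handle is wrong. */
#define TOY_HANDLE_OFFSET ((uintptr_t)1)

/* VM-internal: wrap a state pointer into a handle. */
PyApiHandle toy_handle_from_state(ToyState *state) {
    assert(state != NULL);
    return (uintptr_t)state + TOY_HANDLE_OFFSET;
}

/* VM-internal: recover the state pointer from a handle. */
ToyState *toy_state_from_handle(PyApiHandle handle) {
    return (ToyState *)(handle - TOY_HANDLE_OFFSET);
}
```

Recovering the state costs a single subtraction, which the compiler can usually fold into the offset of whichever field is accessed next.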
