G3PythonContext class for handling the Python GIL #146

arahlin · 2024-02-28T06:11:58Z

This PR creates a new class that simplifies initialization of python threads, as well as acquiriing / releasing the Python global interpreter lock in various contexts.

Use cases include:

Ensuring that Py_Initialize() is properly called at the beginning of a program that is expected to interact with the python interpreter, and also that Py_Finalize() is called when the program is finished.
Ensuring that the current thread state is saved and the GIL released as necessary, e.g. for IO operations, and then the thread state is restored on completion.
Ensuring that the GIL is acquired for one-off interaction with the python interpreter, and released when complete.

A G3PythonContext object is used throughout the library code for cases 2 and 3. Case 1 is handled by a G3PythonInterpreter object constructed at the main program/thread level. If the python interpreter has not been initialized (i.e. the compiled program is expected to be purely in C++), then the library context objects are essentially no-op. If the python interpreter is initialized (e.g. inside a python program or command-line interface), then these library context objects will handle the GIL appropriately.

See the examples/cppexample.cxx C++ program for a simple implementation of the above behavior in a compiled program.

Closes #145.

This PR creates a new class that simplifies initialization of python threads, as well as acquiriing / releasing the Python global interpreter lock in various contexts. Use cases include: 1. Ensuring that Py_Initialize() is properly called at the beginning of a program that is expected to interact with the python interpreter, and also that Py_Finalize() is called when the program is finished. 2. Ensuring that the current thread state is saved and the GIL released as necessary, e.g. for IO operations, and then the thread state is restored on completion. 3. Ensuring that the GIL is acquired for one-off interaction with the python interpreter, and released when complete. A G3PythonContext object is used throughout the library code for cases 2 and 3. If the python interpreter has not been initialized (i.e. the compiled program is expected to be purely in C++), then these context objects are essentially no-op. If the python interpreter is initialized (e.g. inside a python program or command-line interface), then these context objects will handle the GIL appropriately. See the examples/cppexample.cxx C++ program for a simple implementation of the above behavior. This PR also adds logic throughout the G3PipelineInfo and G3ModuleConfig class definitions to enable them to serialize appropriately in a pure-C++ program.

nwhitehorn · 2024-02-28T12:33:25Z

How do you handle lock-order reversals during destructor calls (and GIL acquisition)? This is the rock we've always hit before: shared_ptr deleters can call into Python at unpredictable times and the only way to avoid deadlocks or crashes is to ensure that no shared pointers that Python has even potentially seen ever go out of scope with any non-GIL locks held and -- if any such pointers do go out of scope without the GIL held, Boost is patched to acquire the GIL in every shared pointer deleter when calling into Python. In practice, since it is hard to identify which pointers may have touched Python, this means the only places we have been able to use threading are those where all handled objects originate in C++ (i.e. the DfMux collector code).

cozzyd · 2024-02-28T14:02:29Z

Assuming Sasha adapted my attempt at creating a similar RAII class in the json_output branch without adding any additional magic, I think unfortunately it doesn't solve that problem.

I never got multithreaded code that can touch G3PipelineInfo (which contains arbitrary python things) to never eventually deadlock, presumably as you said, because the shared pointer destructor can happen in an unpredictable context and could not find a good solution.

arahlin · 2024-02-28T14:29:17Z

I basically don't serialize anything that isn't a frameobject in the G3PipelineInfo structure, and just bail onto the next thing. It's not a great solution, and probably doesn't solve the deadlock issue, but it does allow the test program to at least read a PipelineInfo frame without segfaulting.

cozzyd · 2024-02-28T15:14:54Z

I'll merge this branch into my json_output branch and see if the deadlock issues still happen or not :)

nwhitehorn · 2024-02-28T15:40:57Z

I think the best course here is to either initialize the Python interpreter by hand in the C++ example or to detect that one doesn't exist and bail.

arahlin · 2024-02-28T15:55:00Z

I think the best course here is to either initialize the Python interpreter by hand in the C++ example or to detect that one doesn't exist and bail.

Yeah, that's the direction I've taken with this implementation. There's no reason to initialize the interpreter in library code. I can add better words to that effect if that's not clear from the current description.

cozzyd · 2024-02-28T16:32:03Z

Is there any case you would call this with anything other than (...,true, false) or (...,false,true)?

If not, it may be a bit friendlier to separate initialization from the GIL context by introducing something like G3PythonInitializer instead for (...,false,true) case ?

arahlin · 2024-02-28T16:39:26Z

Is there any case you would call this with anything other than (...,true, false) or (...,false,true)?

If not, it may be a bit friendlier to separate initialization from the GIL context by introducing something like G3PythonInitializer instead for (...,false,true) case ?

I guess the way it's implemented, you can initialize the interpreter and either hold onto the GIL (true, true) or immediately release it (false, true). I wouldn't be opposed to creating two (or three?) separate objects for this, though, if that makes things clearer. Library code only ever uses the (false, false) or (true, false) states, and only if python has already been initialized.

cozzyd · 2024-02-28T21:09:03Z

So I merged this into my json_output branch and found I still get segfaults when reading files with G3PipelineInfo in them concurrently.

However, 0eea5ee seems to "fix" it (or at least make seg faults rare enough that I haven't come across them yet in hundreds of thousands of attempts with 8 threads). I can't say it's the only additional place we need to hold the GIL, but I think it makes sense that we need to hold the GIL here, if you want to cherry pick it...

arahlin · 2024-02-28T22:14:36Z

So I merged this into my json_output branch and found I still get segfaults when reading files with G3PipelineInfo in them concurrently.

However, 0eea5ee seems to "fix" it (or at least make seg faults rare enough that I haven't come across them yet in hundreds of thousands of attempts with 8 threads). I can't say it's the only additional place we need to hold the GIL, but I think it makes sense that we need to hold the GIL here, if you want to cherry pick it...

Have you tried running your PR without the interpeter initialized?

These are python objects, and if we allow them to be deleted otherwise, bad things happen. This fixes at least most of the concurrency problems I have with reading files that have G3PipelineInfo in them?

Use a G3MapFrameObject storage structure for the module arguments, rather than a map of python objects. Since the serialization process requires a call to repr() for non-G3FrameObjects anyway, do this step in the python shim that creates the config in the first place. Also ensure that simple scalar values are serialized as frame objects. Adds a new ``spt3g.core.to_g3frameobject`` function for converting python objects to G3FrameObjects.

…text altogether

cozzyd · 2024-02-29T07:09:35Z

The executable in my PR did eventually crash without enabling the interpreter, but I think it was related to the "global" thing you just removed (the crash was in the boost::python::object destructor). After your latest changes, it doesn't crash anymore in 500,000 trials with 32 threads... nicely done!

arahlin · 2024-02-29T07:45:40Z

Great! So I ended up just yanking all of the repr() machinery out of the C++ side of the module config code, and letting the python side handle creating and/or eval'ing the strings if necessary. @nwhitehorn how does this look to you?

nwhitehorn · 2024-03-01T02:02:19Z

This seems like a nice approach. Hats off! Would it be possible to split it into two pieces, one for the G3ModuleConfig stuff and one for the G3PythonContext tidying? They look cleanly separable if you do the G3PythonContext updates first.

arahlin · 2024-03-01T02:07:38Z

This seems like a nice approach. Hats off! Would it be possible to split it into two pieces, one for the G3ModuleConfig stuff and one for the G3PythonContext tidying? They look cleanly separable if you do the G3PythonContext updates first.

Sure, I'll move the ModuleConfig bits to a separate PR.

nwhitehorn

Looks good. Thanks!

arahlin requested a review from cnweaver February 28, 2024 06:12

arahlin self-assigned this Feb 28, 2024

arahlin requested review from nwhitehorn and cozzyd February 28, 2024 06:12

more words

8e949a4

arahlin and others added 10 commits February 28, 2024 18:07

split interpreter initialization and GIL handling into separate classes

2b09b89

Hold GIL while clearing config in G3ModuleInfo

17e90d5

These are python objects, and if we allow them to be deleted otherwise, bad things happen. This fixes at least most of the concurrency problems I have with reading files that have G3PipelineInfo in them?

remove spurious python context

2c4f205

don't bother with globals

f4654ff

pass tests. most pipelines are too complex to exec(repr()) anyway

f89aec8

mark non-G3FrameObject arguments as repr strings and avoid python con…

eefb20d

…text altogether

cleanup

8ae8b39

remove now-pointless to_g3frameobject function

d7b9535

cleanup

eaaff1b

arahlin added 2 commits February 29, 2024 01:40

monkeypatch argument getter/setter methods for G3ModuleConfig

dd682c8

rename

c1fc23c

arahlin added 4 commits February 29, 2024 02:02

variable name

3dd3638

ocd

865c9f9

limit eval namespace, values method

b6a1ed5

no print

48ba193

moving modconfig changes to a separate PR

ae8bd11

nwhitehorn approved these changes Mar 1, 2024

View reviewed changes

arahlin merged commit 31ac2b0 into master Mar 1, 2024
1 check passed

arahlin deleted the gil_context branch March 1, 2024 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

G3PythonContext class for handling the Python GIL #146

G3PythonContext class for handling the Python GIL #146

arahlin commented Feb 28, 2024 •

edited

Loading

nwhitehorn commented Feb 28, 2024

cozzyd commented Feb 28, 2024

arahlin commented Feb 28, 2024

cozzyd commented Feb 28, 2024

nwhitehorn commented Feb 28, 2024

arahlin commented Feb 28, 2024 •

edited

Loading

cozzyd commented Feb 28, 2024

arahlin commented Feb 28, 2024 •

edited

Loading

cozzyd commented Feb 28, 2024

arahlin commented Feb 28, 2024

cozzyd commented Feb 29, 2024

arahlin commented Feb 29, 2024

nwhitehorn commented Mar 1, 2024

arahlin commented Mar 1, 2024

nwhitehorn left a comment

G3PythonContext class for handling the Python GIL #146

G3PythonContext class for handling the Python GIL #146

Conversation

arahlin commented Feb 28, 2024 • edited Loading

nwhitehorn commented Feb 28, 2024

cozzyd commented Feb 28, 2024

arahlin commented Feb 28, 2024

cozzyd commented Feb 28, 2024

nwhitehorn commented Feb 28, 2024

arahlin commented Feb 28, 2024 • edited Loading

cozzyd commented Feb 28, 2024

arahlin commented Feb 28, 2024 • edited Loading

cozzyd commented Feb 28, 2024

arahlin commented Feb 28, 2024

cozzyd commented Feb 29, 2024

arahlin commented Feb 29, 2024

nwhitehorn commented Mar 1, 2024

arahlin commented Mar 1, 2024

nwhitehorn left a comment

Choose a reason for hiding this comment

arahlin commented Feb 28, 2024 •

edited

Loading

arahlin commented Feb 28, 2024 •

edited

Loading

arahlin commented Feb 28, 2024 •

edited

Loading