Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

G3PythonContext class for handling the Python GIL #146

Merged
merged 19 commits into from
Mar 1, 2024
Merged

Conversation

arahlin
Copy link
Member

@arahlin arahlin commented Feb 28, 2024

This PR creates a new class that simplifies initialization of python threads, as well as acquiriing / releasing the Python global interpreter lock in various contexts.

Use cases include:

  1. Ensuring that Py_Initialize() is properly called at the beginning of a program that is expected to interact with the python interpreter, and also that Py_Finalize() is called when the program is finished.
  2. Ensuring that the current thread state is saved and the GIL released as necessary, e.g. for IO operations, and then the thread state is restored on completion.
  3. Ensuring that the GIL is acquired for one-off interaction with the python interpreter, and released when complete.

A G3PythonContext object is used throughout the library code for cases 2 and 3. Case 1 is handled by a G3PythonInterpreter object constructed at the main program/thread level. If the python interpreter has not been initialized (i.e. the compiled program is expected to be purely in C++), then the library context objects are essentially no-op. If the python interpreter is initialized (e.g. inside a python program or command-line interface), then these library context objects will handle the GIL appropriately.

See the examples/cppexample.cxx C++ program for a simple implementation of the above behavior in a compiled program.

Closes #145.

This PR creates a new class that simplifies initialization of python threads, as
well as acquiriing / releasing the Python global interpreter lock in various
contexts.

Use cases include:

1. Ensuring that Py_Initialize() is properly called at the beginning of a
   program that is expected to interact with the python interpreter, and also
   that Py_Finalize() is called when the program is finished.
2. Ensuring that the current thread state is saved and the GIL released as
   necessary, e.g. for IO operations, and then the thread state is restored on
   completion.
3. Ensuring that the GIL is acquired for one-off interaction with the python
   interpreter, and released when complete.

A G3PythonContext object is used throughout the library code for cases 2 and 3.
If the python interpreter has not been initialized (i.e. the compiled program is
expected to be purely in C++), then these context objects are essentially no-op.
If the python interpreter is initialized (e.g. inside a python program or
command-line interface), then these context objects will handle the GIL
appropriately.

See the examples/cppexample.cxx C++ program for a simple implementation of the
above behavior.

This PR also adds logic throughout the G3PipelineInfo and G3ModuleConfig
class definitions to enable them to serialize appropriately in a pure-C++
program.
@arahlin arahlin requested a review from cnweaver February 28, 2024 06:12
@arahlin arahlin self-assigned this Feb 28, 2024
@arahlin arahlin requested review from nwhitehorn and cozzyd February 28, 2024 06:12
@nwhitehorn
Copy link
Member

How do you handle lock-order reversals during destructor calls (and GIL acquisition)? This is the rock we've always hit before: shared_ptr deleters can call into Python at unpredictable times and the only way to avoid deadlocks or crashes is to ensure that no shared pointers that Python has even potentially seen ever go out of scope with any non-GIL locks held and -- if any such pointers do go out of scope without the GIL held, Boost is patched to acquire the GIL in every shared pointer deleter when calling into Python. In practice, since it is hard to identify which pointers may have touched Python, this means the only places we have been able to use threading are those where all handled objects originate in C++ (i.e. the DfMux collector code).

@cozzyd
Copy link
Contributor

cozzyd commented Feb 28, 2024

Assuming Sasha adapted my attempt at creating a similar RAII class in the json_output branch without adding any additional magic, I think unfortunately it doesn't solve that problem.

I never got multithreaded code that can touch G3PipelineInfo (which contains arbitrary python things) to never eventually deadlock, presumably as you said, because the shared pointer destructor can happen in an unpredictable context and could not find a good solution.

@arahlin
Copy link
Member Author

arahlin commented Feb 28, 2024

I basically don't serialize anything that isn't a frameobject in the G3PipelineInfo structure, and just bail onto the next thing. It's not a great solution, and probably doesn't solve the deadlock issue, but it does allow the test program to at least read a PipelineInfo frame without segfaulting.

@cozzyd
Copy link
Contributor

cozzyd commented Feb 28, 2024

I'll merge this branch into my json_output branch and see if the deadlock issues still happen or not :)

@nwhitehorn
Copy link
Member

I think the best course here is to either initialize the Python interpreter by hand in the C++ example or to detect that one doesn't exist and bail.

@arahlin
Copy link
Member Author

arahlin commented Feb 28, 2024

I think the best course here is to either initialize the Python interpreter by hand in the C++ example or to detect that one doesn't exist and bail.

Yeah, that's the direction I've taken with this implementation. There's no reason to initialize the interpreter in library code. I can add better words to that effect if that's not clear from the current description.

@cozzyd
Copy link
Contributor

cozzyd commented Feb 28, 2024

Is there any case you would call this with anything other than (...,true, false) or (...,false,true)?

If not, it may be a bit friendlier to separate initialization from the GIL context by introducing something like G3PythonInitializer instead for (...,false,true) case ?

@arahlin
Copy link
Member Author

arahlin commented Feb 28, 2024

Is there any case you would call this with anything other than (...,true, false) or (...,false,true)?

If not, it may be a bit friendlier to separate initialization from the GIL context by introducing something like G3PythonInitializer instead for (...,false,true) case ?

I guess the way it's implemented, you can initialize the interpreter and either hold onto the GIL (true, true) or immediately release it (false, true). I wouldn't be opposed to creating two (or three?) separate objects for this, though, if that makes things clearer. Library code only ever uses the (false, false) or (true, false) states, and only if python has already been initialized.

@cozzyd
Copy link
Contributor

cozzyd commented Feb 28, 2024

So I merged this into my json_output branch and found I still get segfaults when reading files with G3PipelineInfo in them concurrently.

However, 0eea5ee seems to "fix" it (or at least make seg faults rare enough that I haven't come across them yet in hundreds of thousands of attempts with 8 threads). I can't say it's the only additional place we need to hold the GIL, but I think it makes sense that we need to hold the GIL here, if you want to cherry pick it...

@arahlin
Copy link
Member Author

arahlin commented Feb 28, 2024

So I merged this into my json_output branch and found I still get segfaults when reading files with G3PipelineInfo in them concurrently.

However, 0eea5ee seems to "fix" it (or at least make seg faults rare enough that I haven't come across them yet in hundreds of thousands of attempts with 8 threads). I can't say it's the only additional place we need to hold the GIL, but I think it makes sense that we need to hold the GIL here, if you want to cherry pick it...

Have you tried running your PR without the interpeter initialized?

arahlin and others added 10 commits February 28, 2024 18:07
These are python objects, and if we allow them to be deleted otherwise,
bad things happen. This fixes at least most of the concurrency problems
I have with reading files that have G3PipelineInfo in them?
Use a G3MapFrameObject storage structure for the module arguments, rather than a
map of python objects.  Since the serialization process requires a call to
repr() for non-G3FrameObjects anyway, do this step in the python shim that
creates the config in the first place.

Also ensure that simple scalar values are serialized as frame objects.

Adds a new ``spt3g.core.to_g3frameobject`` function for converting
python objects to G3FrameObjects.
@cozzyd
Copy link
Contributor

cozzyd commented Feb 29, 2024

The executable in my PR did eventually crash without enabling the interpreter, but I think it was related to the "global" thing you just removed (the crash was in the boost::python::object destructor). After your latest changes, it doesn't crash anymore in 500,000 trials with 32 threads... nicely done!

@arahlin
Copy link
Member Author

arahlin commented Feb 29, 2024

Great! So I ended up just yanking all of the repr() machinery out of the C++ side of the module config code, and letting the python side handle creating and/or eval'ing the strings if necessary. @nwhitehorn how does this look to you?

@nwhitehorn
Copy link
Member

This seems like a nice approach. Hats off! Would it be possible to split it into two pieces, one for the G3ModuleConfig stuff and one for the G3PythonContext tidying? They look cleanly separable if you do the G3PythonContext updates first.

@arahlin
Copy link
Member Author

arahlin commented Mar 1, 2024

This seems like a nice approach. Hats off! Would it be possible to split it into two pieces, one for the G3ModuleConfig stuff and one for the G3PythonContext tidying? They look cleanly separable if you do the G3PythonContext updates first.

Sure, I'll move the ModuleConfig bits to a separate PR.

Copy link
Member

@nwhitehorn nwhitehorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@arahlin arahlin merged commit 31ac2b0 into master Mar 1, 2024
1 check passed
@arahlin arahlin deleted the gil_context branch March 1, 2024 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate python GIL / threads context handling
3 participants