Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systematically replace __del__ with weakref.finalize() #246

Merged
merged 9 commits into from
Dec 2, 2024

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Nov 15, 2024

Closes #141

See #141 (comment) for the rationale.

This _MembersNeededForFinalize pattern is applied systematically in cuda_core:

The weakref_finalize_toy_example.py demonstrates conclusively that the finalizer runs immediately when the main object (ShopKeeper in the example) goes out of scope; i.e. the _MembersNeededForFinalize pattern does not create reference cycles:

Start experiment
    Start short term 0 False
        Close 0 -1.0
    End short term
    Start short term 1 False
Exception ignored in: <finalize object at 0x72ecf4f1bf00; dead>
Traceback (most recent call last):
  File "/usr/lib/python3.12/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rgrossekunst/clone/stuff/random_attic/weakref_finalize_toy_example.py", line 20, in close
    print("        Close", self.serno, 1 / (self.serno - 1))
                                       ~~^~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
    End short term
    Start short term 2 False
        Close 2 1.0
    End short term
    Start short term 0 True
        Close 0 -1.0
    End short term
    Start short term 1 True
        PROBLEM 1
    End short term
    Start short term 2 True
        Close 2 1.0
    End short term
End experiment
All done.

For simple cases (e.g. _event.py) the _MembersNeededForFinalize pattern might seem more complex than necessary, but note that it is generally safer. See commit 26ddbf6 for a side-by-side comparison. Note that self._finalizer.Detach() is needed in the slightly simpler alternative. The need for this is likely to be overlooked when the code is extended or refactored in the future and could lead to situations akin to a double-free that may only be discovered in production.

@rwgk rwgk requested a review from leofang November 15, 2024 19:45
@leofang leofang added enhancement Any code-related improvements P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Nov 16, 2024
@leofang leofang added this to the cuda.core beta 2 milestone Nov 16, 2024
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Ralf! A general comment: We can't use self.close as the finalizer because it'd still hold a reference to self (see @wence-'s original #87 (comment)). The most simple thing we could do is to move it out of the class, so instead of

class Something:

    def __init__(self, ...):
        self._handle = ...
        weakref.finalize(self, self.close)

    def close(self, ...):
        if self._handle:
            # do something

we do

# note the 1st arg name is still "self", to mimic a method
def _close_something(self, ...):
    if self._handle:
        # do something

class Something:

    def __init__(self, ...):
        self._handle = ...
        weakref.finalize(self, _close_something)

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 16, 2024

Ah, confirmed, by looking more. Yesterday I was jumping to a wrong conclusion, twisty explanation omitted. I'll try again.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 16, 2024

I think I found a nice pattern to avoid reference cycles. This is my updated toy example:

https://github.com/rwgk/stuff/blob/7d93794069898f245e14fb94d779816291fde1ff/random_attic/weakref_finalize_toy_example.py

The trick is to introduce a level of indirection, self._members, and tie weakref.finalize() to those.

I still need to adopt that pattern in this PR.

@leofang leofang marked this pull request as draft November 27, 2024 02:43
@leofang leofang requested a review from shwina November 28, 2024 03:27
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 30, 2024

@leofang @shwina this PR is ready for review now. I updated the PR description to explain the _MembersNeededForFinalize pattern.

@rwgk rwgk marked this pull request as ready for review November 30, 2024 20:14
@shwina
Copy link
Contributor

shwina commented Dec 2, 2024

@leofang @shwina this PR is ready for review now. I updated the PR description to explain the _MembersNeededForFinalize pattern.

Thanks, Ralf! I might be missing some context here, but I'm not sure how to interpret the output of the weakref_finalize_toy_example.py script. Could you please explain more about what we are studying in this experiment, and more specifically, how does it demonstrate the need for the _MembersNeededForFinalize pattern?

@shwina
Copy link
Contributor

shwina commented Dec 2, 2024

Spoke with Ralf offline and now I have more clarity. Some notes from our call:

  1. The goal of weakref_finalize_toy_example.py is to show that (1) all Shopkeeper objects are destructed immediately while exiting the context of the function short_term_shop_keeper, (2) exceptions thrown by the finalizer behave appropriately (a stack trace is printed, but the exception itself is ignored).
  2. It would be nice if we could abstract away some or all of the logic behind a class decorator (or a metaclass):
    @finalize(...)
    class Stream:
      ...
    This would have a few different advantages:
  • We would eliminate some or all boilerplate

  • We would be able to document the design/rationale for finalize in a central location

    But it's not immediately clear to me how we would write such a class decorator or metaclass, or if that is even possible.

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a clever approach around the limitation of weakref.finalize not being able to accept a bound method as a finalizer.

Approving with one caveat/question. Why don't we just handle the destruction of stream, event, etc., in the Cython, i.e.,:

cdef class CUStream:
    def __cinit__(self, ...):
        # presumably we have a __cinit__

    def __dealloc__(self):
        # why don't we just call cuStreamDestroy in a __dealloc__?

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2024

Approving with one caveat/question. Why don't we just handle the destruction of stream, event, etc., in the Cython, i.e.,:

That didn't cross my mind, sounds interesting, but I don't know enough Cython to know if that'll work. @leofang I'd be happy to play with that, as one way for me to get deeper into Cython, if you think that makes sense.

@leofang
Copy link
Member

leofang commented Dec 2, 2024

Why don't we just handle the destruction of stream, event, etc., in the Cython,

If the subject is the low-level CUStream (from cuda.bindings):

  • since it's a legacy API we need to keep it C like (explicit destruction using cuStreamDestroy), so no change of semantics here unfortunately

If the subject is the pythonic Stream (from cuda.core):

  • yes we did discuss the possibility of lowering to Cython and relying on __dealloc__ for deterministic behavior (it was listed as an option in Need to make class destruction more robust #141), but @rwgk made a good point that we should try to stay in pure Python for now for rapid development; we should revisit this discussion after a few releases

@shwina
Copy link
Contributor

shwina commented Dec 2, 2024

yes we did discuss the possibility of lowering to Cython and relying on dealloc for deterministic behavior (it was listed as an option in #141), but @rwgk made a good point that we should try to stay in pure Python for now for rapid development; we should revisit this discussion after a few releases

We could have both:

  • At the lowest level would be the current CUStream type.
  • We could introduce an intermediate CUStreamRAII type which simply creates a CUStream in its __cinit__ and calls cuStreamDestroy in its __dealloc__.
  • The Python Stream type would just wrap the RAII type and otherwise remain as is.

@leofang
Copy link
Member

leofang commented Dec 2, 2024

Right, it'd be a refactoring to what's done in this PR. I am fine with pursing this at a later time (this PR looks fine to me and gets the job done), but if we lower the entire cuda.core to Cython is it still relevant? Perhaps we can track this discussion in an issue?

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: @rwgk IIRC you mentioned finalizer.detach() is needed if close() is explicitly called (to avoid double free). I assume it's no longer true because the finalizer becomes a no-op due to the pre-conditions such as if self.handle is not None?

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2024

Q: @rwgk IIRC you mentioned finalizer.detach() is needed if close() is explicitly called (to avoid double free). I assume it's no longer true because the finalizer becomes a no-op due to the pre-conditions such as if self.handle is not None?

Yes, the way I think about it: By way of the intermediate object (_mnff), the handling of the state (already closed or not) is centralized, and is thereby intuitive, doesn't need special attention.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2024

/ok to test

@rwgk rwgk merged commit c3077da into NVIDIA:main Dec 2, 2024
1 check passed
@rwgk rwgk deleted the cuda_core_weakref_finalize branch December 2, 2024 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need to make class destruction more robust
3 participants