RFC: Change __cuda_stream__ from class attribute to method #348

Closed
leofang opened this issue Jan 3, 2025 · 3 comments · Fixed by #389
Labels
breaking Breaking changes are introduced cuda.core Everything related to the cuda.core module P0 High priority - Must do! RFC Plans and announcements

Comments

@leofang
Member

leofang commented Jan 3, 2025

One of the reasons people complain about the CUDA Array Interface (CAI) being slow is that it has side effects. Consider this example:

>>> class KKK:
...     @property
...     def __proppp__(self):
...             print("I am called")
... 
>>> 
>>> type(KKK.__proppp__)
<class 'property'>
>>> k = KKK()
>>> hasattr(k, "__proppp__")
I am called
True

One can see that a hasattr call on a class attribute/property actually executes the property's getter. That is not the case if the attribute is a plain method:

>>> class KKK:
...     def __proppp__(self):
...             print("I am called")
... 
>>> 
>>> type(KKK.__proppp__)
<class 'function'>
>>> k = KKK()
>>> hasattr(k, "__proppp__")
True

I only now realized that I had forgotten about this during the design stage. This was also pointed out by @gmarkall back when CAI was being designed, IIRC.

I propose changing __cuda_stream__ (and any future protocols) from a class property/attribute to a callable method, so as to avoid side effects (thereby reducing overhead) and to establish a future-proof convention, following DLPack and other protocols.
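
For illustration, a minimal sketch of the method-based form. The class name is hypothetical, and the (version, handle) return value is only an assumption about what the protocol would carry; the actual contents would be defined by the protocol spec.

# Hypothetical sketch of the method-based protocol; names and return contents
# are illustrative assumptions, not the final spec.
class MyStreamWrapper:
    def __init__(self, handle: int):
        self._handle = handle  # e.g. the integer address of a cudaStream_t

    def __cuda_stream__(self):
        # A plain method: merely looking up the name (e.g. via hasattr) no
        # longer runs this body; only an explicit call does.
        return (0, self._handle)  # assumed (version, handle) pair

A consumer would then call obj.__cuda_stream__() explicitly rather than reading an attribute.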

@leofang leofang added breaking Breaking changes are introduced cuda.core Everything related to the cuda.core module RFC Plans and announcements triage Needs the team's attention labels Jan 3, 2025
@leofang leofang self-assigned this Jan 3, 2025
@bdice

bdice commented Jan 4, 2025

To add some context: this is the intended design of hasattr. https://docs.python.org/3/library/functions.html#hasattr

The arguments are an object and a string. The result is True if the string is the name of one of the object’s attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.)

@property decorators create descriptors, which are one of many ways to add dynamic attributes to an object. Unfortunately, the only way to resolve such dynamic attribute lookups is by actually trying to get the attribute, so we are left paying this cost when using properties.

Relevant StackOverflow post: https://stackoverflow.com/questions/30143957/why-does-hasattr-execute-the-property-decorator-code-block
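
To make the mechanism concrete, here is a minimal descriptor sketch (mirroring the property example above, and not taken from this issue) showing why the getter runs during hasattr:

# property is a descriptor whose __get__ runs on every attribute access,
# including the access that hasattr performs internally.
class LoudGetter:
    def __get__(self, obj, objtype=None):
        print("I am called")
        return True

class KKK:
    __proppp__ = LoudGetter()  # behaves like @property for reads

k = KKK()
hasattr(k, "__proppp__")  # prints "I am called", then evaluates to True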

I ran some microbenchmarks and I agree that using a method instead of a property is a good way to reduce this overhead, regardless of whether the consumer takes an EAFP (try: ... except AttributeError:) or LBYL (if hasattr(...):) approach.
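
For reference, a consumer-side sketch of the two styles with a method-based protocol (illustrative only; obj stands for any object that may implement it):

# EAFP: the attribute lookup raises AttributeError if the protocol is absent;
# the method body runs only on the explicit call.
try:
    stream_method = obj.__cuda_stream__
except AttributeError:
    stream_info = None
else:
    stream_info = stream_method()

# LBYL: hasattr resolves the name without executing anything.
if hasattr(obj, "__cuda_stream__"):
    stream_info = obj.__cuda_stream__()
else:
    stream_info = None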

@leofang leofang added P0 High priority - Must do! and removed triage Needs the team's attention labels Jan 9, 2025
@leofang leofang added this to the cuda.core beta 3 milestone Jan 9, 2025
@leofang leofang assigned NaderAlAwar and unassigned leofang Jan 11, 2025
@leofang
Member Author

leofang commented Jan 11, 2025

@NaderAlAwar, this should be a quick fix. Could you help make the change and propagate it to the cuda.par PR, please? 🙂

@shwina
Contributor

shwina commented Jan 11, 2025

@bdice Just out of curiosity, I added a couple more tests to your benchmark (cached properties, getattr_static):

Benchmark script:
import time
import timeit

setups = {
    "cached_property": """
from functools import cached_property
class MyClass:
    @cached_property
    def __protocol__(self):
        time.sleep(0.00001)
        return True
obj = MyClass()""",
    "property": """
class MyClass:
    @property
    def __protocol__(self):
        time.sleep(0.00001)
        return True

obj = MyClass()""",
    "method": """
class MyClass:
    def __protocol__(self):
        time.sleep(0.00001)
        return True

obj = MyClass()""",
    "none": """
class MyClass:
    pass

obj = MyClass()""",
}

tests = {
    "eafp": """
try:
    obj.__protocol__
except AttributeError:
    pass
    """,
    "lbyl": """hasattr(obj, "__protocol__")""",
    "gentle-lbyl": (
        """"__protocol__" in obj.__dir__() or hasattr(obj, "__protocol__")"""
    ),
    "getattr_static": """
import inspect
inspect.getattr_static(obj, "__protocol__", None)
""",
}

results = {}
for setup_name, setup_stmt in setups.items():
    for test_name, test_stmt in tests.items():
        test_name = f"{setup_name}, {test_name}"
        times = timeit.repeat(
            setup=setup_stmt,
            stmt=test_stmt,
            repeat=10,
            number=1000,
            globals={"time": time},
        )
        avg_time = sum(times) / len(times)
        results[test_name] = avg_time

for test_name, result in sorted(results.items(), key=lambda x: x[1]):
    print(f"{test_name:>18}: {result:03g}")

Method access is still the fastest (although the cached property is a very close second).
