
Remove cudf._lib.scalar in favor of pylibcudf #17701

Merged
20 commits merged into rapidsai:branch-25.02 from rm/scalar/devicescalar on Jan 24, 2025


Contributor

@mroeschke commented Jan 9, 2025

Description

This PR changes cudf.Scalar.device_scalar to be a pylibcudf.Scalar object instead of a cudf._lib.scalar.DeviceScalar.

Most of the conversion logic previously in cudf._lib.scalar.DeviceScalar now lives in python/cudf/cudf/core/scalar.py.

Some tests that exercised behaviors of cudf.Scalar.device_scalar when it was a cudf._lib.scalar.DeviceScalar were modified or removed.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke added the labels Python (Affects Python cuDF API.), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) Jan 9, 2025
@mroeschke self-assigned this Jan 9, 2025
@mroeschke requested review from a team as code owners January 9, 2025 02:14
@github-actions bot added the CMake (CMake build issue) label Jan 9, 2025
Contributor

@Matt711 left a comment


Thanks, overall this looks good. I had some questions, then I'll approve.

python/cudf/cudf/core/scalar.py (resolved)
python/cudf/cudf/core/scalar.py (outdated, resolved)
python/cudf/cudf/tests/test_scalar.py (resolved)
python/cudf/cudf/tests/test_struct.py (resolved)
python/cudf/cudf/core/scalar.py (resolved)
Contributor

@vyasr left a comment


Ugh, sorry, I accidentally left this review from last week finished but unsubmitted. LGTM though, with some small suggestions for improvement and notes for future work.

value, dtype
)

@classmethod
def from_device_scalar(cls, device_scalar):
if not isinstance(device_scalar, cudf._lib.scalar.DeviceScalar):
def from_device_scalar(cls, device_scalar: plc.Scalar) -> Self:
Contributor


We don't have to do it in this PR if you don't want to (especially since there is additional refactoring happening and discussions around whether we keep cudf.Scalar at all), but maybe add a "TODO:" noting that this function should be renamed from_pylibcudf if we do keep it?

Contributor Author


Sure, I can fold this into #17760.

from cudf._typing import Dtype, ScalarLike


def _preprocess_host_value(value, dtype) -> tuple[ScalarLike, Dtype]:
Contributor


I feel like this function can definitely be simplified, or at least optimized to reduce the complexity of the conditional cascades, but given that this is not a new function we can do that later and keep this PR focused on the DeviceScalar removal.

Contributor Author


Definitely. I think this can be replaced with _to_plc_scalar, introduced in this PR, but I'll tackle that in a follow-up.
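As a rough illustration of the kind of simplification discussed in this thread, a chain of `isinstance` checks can often be flattened into type-based dispatch with `functools.singledispatch`. The function name `infer_host_dtype`, its `(value, dtype_name)` return shape, and the per-type rules below are hypothetical; this sketch does not reproduce cudf's `_preprocess_host_value` or `_to_plc_scalar`.

```python
import decimal
from functools import singledispatch


@singledispatch
def infer_host_dtype(value):
    """Hypothetical helper: return (value, dtype_name) for a host scalar.

    The base implementation is only reached for types with no registered rule.
    """
    raise TypeError(f"unsupported host value type: {type(value).__name__}")


@infer_host_dtype.register
def _(value: bool):
    # bool is a subclass of int; singledispatch resolves by MRO, so this more
    # specific rule wins over the int rule without any ordering-sensitive checks.
    return value, "bool"


@infer_host_dtype.register
def _(value: int):
    return value, "int64"


@infer_host_dtype.register
def _(value: float):
    return value, "float64"


@infer_host_dtype.register
def _(value: str):
    return value, "str"


@infer_host_dtype.register
def _(value: decimal.Decimal):
    # The scale is the number of digits after the decimal point.
    exponent = value.as_tuple().exponent
    if not isinstance(exponent, int):
        raise ValueError("NaN/infinite decimals are not supported in this sketch")
    return value, f"decimal(scale={max(-exponent, 0)})"
```

Supporting a new host type then means registering one more function rather than inserting another branch at the right position in a cascade.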

Parameters
----------
value: Scalarlike
dtype: dtypelike
Contributor


At some point we should really audit functions like this that are mostly internal to see if we can guarantee that we only get actual dtypes rather than something that duck-types as a dtype. The pd.api.types functions are shockingly expensive, especially when they are used in deeper parts of the code like this that get called many times.

Contributor Author


Sure thing. I think this ties into #12494 too.
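A minimal sketch of the pattern suggested in this thread: normalize a dtype-like argument once at the public boundary, so internal helpers can rely on a cheap `isinstance` check instead of repeatedly running generic `pd.api.types`-style inspection. `Dtype`, `as_dtype`, `public_api`, and `_internal_helper` are hypothetical stand-ins written with the standard library only; they are not cudf or pandas APIs.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Dtype:
    """Hypothetical canonical dtype object (stand-in for a real dtype class)."""

    name: str


# Hypothetical aliases accepted at the public boundary.
_ALIASES = {"int": "int64", "i8": "int64", "float": "float64", "f8": "float64"}


@lru_cache(maxsize=None)
def _normalize_spec(spec: str) -> Dtype:
    # The potentially expensive parsing of loose specs happens only here,
    # and the result is cached per distinct spec string.
    return Dtype(_ALIASES.get(spec, spec))


def as_dtype(spec) -> Dtype:
    """Public boundary: accept a Dtype or a string spec and return a Dtype."""
    if isinstance(spec, Dtype):  # cheap fast path for already-canonical input
        return spec
    if isinstance(spec, str):
        return _normalize_spec(spec)
    raise TypeError(f"cannot interpret {spec!r} as a dtype")


def _internal_helper(value, dtype: Dtype):
    # Internal code trusts the boundary: a single isinstance check (here a
    # debug assertion) instead of re-running "is this dtype-like?" logic.
    assert isinstance(dtype, Dtype)
    return value, dtype.name


def public_api(value, dtype):
    return _internal_helper(value, as_dtype(dtype))
```

The design choice is to pay the normalization cost once per call at the entry point, so the hot internal paths never see anything but canonical dtype objects.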

Returns
-------
plc.Scalar
Contributor


Not critical to fix if you see this (moved) function going away soon, but in a docstring we typically care more about describing what the returned value is than about its type. This isn't a public docstring though, so very minor.

Contributor Author


Gotcha. I can fold this into #17760


pa_scalar = pa.scalar(value, type=pa_type)
plc_scalar = plc.interop.from_arrow(pa_scalar)
if isinstance(dtype, (Decimal32Dtype, Decimal64Dtype)):
Contributor


Keep an eye on #17422, since that will necessitate corresponding changes on the Python side. I don't think anything will break if we don't change anything; we'll just be able to avoid these types of conversions afterwards.

Contributor Author


Nice! Yes, it would be great to avoid this.
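The code under discussion builds the scalar through Arrow (`pa.scalar` followed by `plc.interop.from_arrow`) and special-cases `Decimal32Dtype` and `Decimal64Dtype`. Any conversion between decimal widths has to respect their precision limits: cudf's `Decimal32Dtype`, `Decimal64Dtype`, and `Decimal128Dtype` allow at most 9, 18, and 38 digits respectively. The standard-library sketch below computes a `decimal.Decimal`'s precision and scale and picks the narrowest width that can hold it; `precision_and_scale`, `narrowest_decimal_width`, and `MAX_PRECISION` are hypothetical helpers, not part of cudf.

```python
import decimal

# Maximum precision (total number of digits) of cudf's fixed-point decimal dtypes.
MAX_PRECISION = {"decimal32": 9, "decimal64": 18, "decimal128": 38}


def precision_and_scale(value):
    """Return (precision, scale) of a finite decimal.Decimal."""
    _, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and infinity have no fixed-point representation")
    if exponent >= 0:
        # e.g. Decimal("1E+2") == 100: one digit plus two trailing zeros, scale 0
        return len(digits) + exponent, 0
    scale = -exponent
    # e.g. Decimal("0.001"): one significant digit but scale 3, so precision 3
    return max(len(digits), scale), scale


def narrowest_decimal_width(value):
    """Hypothetical helper: smallest decimal width whose precision fits `value`."""
    precision, _ = precision_and_scale(value)
    for name in ("decimal32", "decimal64", "decimal128"):
        if precision <= MAX_PRECISION[name]:
            return name
    raise OverflowError(f"{value} needs {precision} digits; decimal128 allows 38")
```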

@mroeschke
Contributor Author

/merge

@rapids-bot bot merged commit 9f3cb65 into rapidsai:branch-25.02 Jan 24, 2025
106 checks passed
@mroeschke deleted the rm/scalar/devicescalar branch January 24, 2025 19:10
Labels
CMake (CMake build issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
Projects
Status: Done

3 participants