-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove cudf._lib.scalar in favor of pylibcudf #17701
Remove cudf._lib.scalar in favor of pylibcudf #17701
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, overall this looks good. I had some questions, then I'll approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh sorry I accidentally left this review from last week finished but unsubmitted. LGTM though, with some small suggestions for improvement and notes for future work.
python/cudf/cudf/core/scalar.py
Outdated
value, dtype | ||
) | ||
|
||
@classmethod | ||
def from_device_scalar(cls, device_scalar): | ||
if not isinstance(device_scalar, cudf._lib.scalar.DeviceScalar): | ||
def from_device_scalar(cls, device_scalar: plc.Scalar) -> Self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to do it in this PR if you don't want to (especially since there is additional refactoring happening and discussions around whether we keep cudf.Scalar at all) but maybe add a "TODO:" that this function should be named from_pylibcudf
if we do keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can fold this into #17760
from cudf._typing import Dtype, ScalarLike | ||
|
||
|
||
def _preprocess_host_value(value, dtype) -> tuple[ScalarLike, Dtype]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this function can definitely be simplified, or at least optimized to reduce the complexity of the conditional cascades, but given that this is not a new function we can do that later and keep this PR focused on the DeviceScalar removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. I think this can be replace with _to_plc_scalar
introduced in this PR but I'll tackle that in a follow up
Parameters | ||
---------- | ||
value: Scalarlike | ||
dtype: dtypelike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point we should really audit functions like this that are mostly internal to see if we can guarantee that we only get actual dtypes rather than something that ducktypes as a dtype. The pd.api.types
functions are shockingly expensive, especially if they get used in deeper parts of the code like this that get called many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. I think this follows #12494 too
Returns | ||
------- | ||
plc.Scalar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical to fix if you see this (moved) function going away soon, but typically we care more about what the returned value is than its type in the docstring. This isn't a public docstring though, so very minor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I can fold this into #17760
|
||
pa_scalar = pa.scalar(value, type=pa_type) | ||
plc_scalar = plc.interop.from_arrow(pa_scalar) | ||
if isinstance(dtype, (Decimal32Dtype, Decimal64Dtype)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep an eye #17422 since that will necessitate corresponding changes on the Python side. I don't think anything will break if we don't change anything we'll just be able to avoid these types of conversions afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Yes it would be great to avoid this
/merge |
Description
This PR changes
cudf.Scalar.device_scalar
to be apylibcudf.Scalar
object instead of acudf._lib.scalar.DeviceScalar
.Most of the conversion logic previously in
cudf._lib.scalar.DeviceScalar
now lives inpython/cudf/cudf/core/scalar.py
Some tests that exercised behaviors of
cudf.Scalar.device_scalar
when it was acudf._lib.scalar.DeviceScalar
were modified/removed.Checklist