Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SNOW-1429199]: Provide clearer error message for groupby aggregations #2308

Merged
merged 9 commits into from
Sep 19, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

- Improved `to_pandas` to persist the original timezone offset for TIMESTAMP_TZ type.
- Improved `dtype` results for TIMESTAMP_TZ type to show correct timezone offset.
- Improved error message when passing non-bool value to `numeric_only` for groupby aggregations.

#### New Features

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1173,9 +1173,20 @@ def _wrap_aggregation(
Returns the same type as `self._df`.
"""
# TODO: SNOW-1063349: Modin upgrade - modin.pandas.groupby.DataFrameGroupBy functions
numeric_only = validate_bool_kwarg(
numeric_only, "numeric_only", none_allowed=True
)
try:
numeric_only = validate_bool_kwarg(
numeric_only, "numeric_only", none_allowed=True
)
except ValueError:
# SNOW-1429199: Snowpark users expect to be able to pass in the column to aggregate
# on in the aggregation method, e.g. df.groupby("COL0").sum("COL1"), but the pandas
# API's only accept the numeric_only argument, so users get an error complaining that
# the numeric_only kwarg expects a bool argument, but a string was passed in. Instead
# of that error, we can throw this error instead that will make it more clear to users
# what went wrong.
raise ValueError(
f"GroupBy aggregations like 'sum' take a 'numeric_only' argument that needs to be a bool, but a {type(numeric_only).__name__} value was passed in."
)

agg_args = tuple() if agg_args is None else agg_args
agg_kwargs = dict() if agg_kwargs is None else agg_kwargs
Expand Down
5 changes: 4 additions & 1 deletion tests/integ/modin/groupby/test_groupby_negative.py
Original file line number Diff line number Diff line change
Expand Up @@ -536,7 +536,10 @@ def test_groupby_agg_invalid_numeric_only(
# treated as True. This behavior is confusing to customer, in Snowpark pandas, we do an
# explicit type check, an errors it out if an invalid value is given.
with pytest.raises(
ValueError, match=re.escape('For argument "numeric_only" expected type bool')
ValueError,
match=re.escape(
f"GroupBy aggregations like 'sum' take a 'numeric_only' argument that needs to be a bool, but a {type(numeric_only).__name__} value was passed in."
),
):
getattr(basic_snowpark_pandas_df.groupby("col1"), agg_method_name)(
numeric_only=numeric_only
Expand Down
Loading