-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1654730: Refactor aggregation_utils to reduce duplication and clarify interfaces. #2270
Merged
sfc-gh-mvashishtha
merged 10 commits into
main
from
mvashishtha/SNOW-1654730/refactor-aggregation-utils
Sep 12, 2024
Merged
SNOW-1654730: Refactor aggregation_utils to reduce duplication and clarify interfaces. #2270
sfc-gh-mvashishtha
merged 10 commits into
main
from
mvashishtha/SNOW-1654730/refactor-aggregation-utils
Sep 12, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
d1e9cb0
to
5486008
Compare
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
5486008
to
2fc1f15
Compare
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py
Outdated
Show resolved
Hide resolved
src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py
Outdated
Show resolved
Hide resolved
src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py
Outdated
Show resolved
Hide resolved
src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py
Outdated
Show resolved
Hide resolved
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
sfc-gh-nkrishna
approved these changes
Sep 12, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor @sfc-gh-mvashishtha , just left one suggestion for an additional test
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py
Outdated
Show resolved
Hide resolved
sfc-gh-azhan
approved these changes
Sep 12, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion left and looks good for me!
…pandas aggregation. Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
NO-CHANGELOG-UPDATES
This pull request does not need to update CHANGELOG.md
NO-PANDAS-CHANGEDOC-UPDATES
This PR does not update Snowpark pandas docs
snowpark-pandas
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes SNOW-1654730
Changes to the interface of aggregation_utils
get_snowflake_agg_func
now returns aNamedTuple
,SnowflakeAggFunc
, which contains a snowpark aggregation method, along with the boolpreserves_snowpark_pandas_types
. Formerly,get_snowflake_agg_func
would return a Snowpark aggregation and the caller would then use separate dictionaries to deduce how the aggregation affects Snowpark pandas types._array_agg_keepna
and_PANDAS_AGGREGATION_TO_SNOWPARK_PANDAS_AGGREGATION
generate_rowwise_aggregation_function
. Instead, makeget_snowflake_agg_func
the common interface to getSnowflakeAggFunc
even for axis=1, and makegenerate_rowwise_aggregation_function
an internal method called_generate_rowwise_aggregation_function
.Changes to the internals of aggregation_utils
SNOWFLAKE_BUILTIN_AGG_FUNC_MAP
mapped from Snowpark pandas aggregations to the axis=0 aggregation;GROUPBY_AGG_PRESERVES_SNOWPARK_PANDAS_TYPE
andGROUPBY_AGG_WITH_NONE_SNOWPARK_PANDAS_TYPES
told whether the axis=0 aggregations would preserve Snowpark pandas types;SNOWFLAKE_COLUMNS_AGG_FUNC_MAP
would tell how to aggregate on axis=1 when skipna=True; andSNOWFLAKE_COLUMNS_KEEPNA_AGG_FUNC_MAP
would tell how to aggregate on axis=1 when skipna=False. All of these maps repeated the mapping of pairs like"sum"
andnp.sum
to the same aggregation function. In this commit, keep a single mapping,_PANDAS_AGGREGATION_TO_SNOWPARK_PANDAS_AGGREGATION
, from pandas aggregations to instances of the internal tuple_SnowparkPandasAggregation
._SnowparkPandasAggregation
includespreserves_snowpark_pandas_type
, as well as optionally the aggregation functions foraxis=0
;axis=1, skipna=False
; andaxis=1, skipna=True
.New feature
groupby().var()
no longer raisesNotImplementedError
, but it's invalid in pandas, so we correctly raiseTypeError
.