Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1063347: Remove series.py #2205

Merged
merged 38 commits into from
Sep 5, 2024
Merged

SNOW-1063347: Remove series.py #2205

merged 38 commits into from
Sep 5, 2024

Conversation

sfc-gh-joshi
Copy link
Contributor

@sfc-gh-joshi sfc-gh-joshi commented Aug 30, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1063347

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

This PR removes series.py (following up #2167). As before, any preserved overrides are given a reason in code comments. The following methods have been added to series_overrides.py:

  • init
  • sub/subtract, and most other binary operators
  • aggregate/agg
  • apply
  • map
  • argmax/argmin
  • case_when
  • dropna
  • duplicated
  • sum/std/var
  • cat/str/dt
  • name
  • empty
  • fillna
  • groupby
  • info
  • _qcut
  • replace
  • reset_index
  • set_axis
  • rename
  • sort_values
  • unstack
  • where
  • value_counts
  • shift
  • squeeze
  • _prepare_inter_op
  • _to_datetime
  • to_list
  • _create_or_update_from_compiler
  • to_frame
  • _to_pandas
  • to_numpy
  • setitem
  • repr

Because of differences in __getattr__ logic, we need to add dt, str, and columns to _ATTRS_NO_LOOKUP in this PR to prevent additional queries from being incurred. This will create slightly different behavior for attempting to access a named index element directly:

>>> s = pd.Series(["a", "b"], index=["str", "columns"])
>>> s.columns
AttributeError: 'Series' object has no attribute 'columns'
>>> s.to_pandas().columns  # DIFFERENT: treats as index access
'b'
>>> s.str
<modin.pandas.series_utils.StringMethods object at 0x135f57940>
>>> s.to_pandas().str  # SAME: gets accessor
<pandas.core.strings.accessor.StringMethods object at 0x135f93b50>

@sfc-gh-joshi sfc-gh-joshi added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs DO-NOT-MERGE labels Aug 30, 2024
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi/rip-series-2 branch 2 times, most recently from 293f95a to e18d646 Compare September 3, 2024 17:27
@sfc-gh-joshi sfc-gh-joshi changed the title [DRAFT] Remove series.py SNOW-1063347: Remove series.py Sep 3, 2024
@sfc-gh-joshi sfc-gh-joshi marked this pull request as ready for review September 3, 2024 22:48
@sfc-gh-joshi sfc-gh-joshi requested a review from a team as a code owner September 3, 2024 22:48
Copy link
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!!

tests/integ/modin/series/test_take.py Show resolved Hide resolved
tests/integ/modin/series/test_getattr.py Outdated Show resolved Hide resolved
src/snowflake/snowpark/modin/plugin/__init__.py Outdated Show resolved Hide resolved
@@ -1491,7 +1491,7 @@ def iloc():
With a scalar integer.

>>> type(df.iloc[0])
<class 'snowflake.snowpark.modin.pandas.series.Series'>
<class 'modin.pandas.series.Series'>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i felt little bit wired to have test shown the whole class path in the example, is that the only place we have this? we can fix it like this for now, but maybe later we can have a separate pr to remove those in all examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few other tests and docstrings (describe, info) do this; I'll do a more thorough pass in a later PR.

@@ -12352,6 +12382,18 @@ def _quantiles_single_col(

return SnowflakeQueryCompiler(internal_frame)

def mode(
self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can those method not implemented error be at the overwritten part? instead of query compiler like how other apis works today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it to the frontend overrides, but if it is possible to push the error to the query compiler without changing existing behavior I think we should do so.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API not implemented errors today are all thrown at frontend api layer, so let's keep it the same everywhere, if we decide to do those not implemented error at query_compiler layer, let's do it the same for all API also


For `Series` axis parameter is unused and defaults to 0.

>>> ser.take([0, 3], axis=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is removed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently accepting axis=1 didn't match pandas behavior. If we want to preserve the behavior of ignoring axis=1, we'd have to add back an override stub for it, but I don't think it makes sense do keep.

>>> native_pd.Series([1]).take([0], axis=1)
Traceback (most recent call last):
  File "/Users/joshi/miniconda3/envs/snowpandas-39/lib/python3.9/site-packages/pandas/core/generic.py", line 575, in _get_axis_number
    return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joshi/miniconda3/envs/snowpandas-39/lib/python3.9/site-packages/pandas/core/generic.py", line 4132, in take
    axis=self._get_block_manager_axis(axis),
  File "/Users/joshi/miniconda3/envs/snowpandas-39/lib/python3.9/site-packages/pandas/core/generic.py", line 595, in _get_block_manager_axis
    axis = cls._get_axis_number(axis)
  File "/Users/joshi/miniconda3/envs/snowpandas-39/lib/python3.9/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, do you know if it is a new behavior introduced in 2.2.* compare with 1.5.3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, seems like it works in pandas 2.0.3 and fails in pandas 2.1.4.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a entry in the change log to indicate this difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/snowflake/snowpark/modin/pandas/indexing.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@sfc-gh-yzou sfc-gh-yzou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-joshi Thanks for taking care of the comment! Can you add one description in the changelog to indicate the change for take?


For `Series` axis parameter is unused and defaults to 0.

>>> ser.take([0, 3], axis=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a entry in the change log to indicate this difference?

@sfc-gh-joshi sfc-gh-joshi enabled auto-merge (squash) September 5, 2024 18:46
@sfc-gh-joshi sfc-gh-joshi merged commit 20be5fb into main Sep 5, 2024
32 of 35 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi/rip-series-2 branch September 5, 2024 21:55
@github-actions github-actions bot locked and limited conversation to collaborators Sep 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants