SNOW-1063346: Remove modin/pandas/dataframe.py (#2223)
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
   --->

   Fixes SNOW-1063346

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
      - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

This PR removes `modin/pandas/dataframe.py` (the Snowpark pandas one, not the
Snowpark Python one), following #2167 and #2205. As in those PRs, each preserved
override carries a code comment explaining why it is kept. The following
implemented methods have been added to `dataframe_overrides.py`:
- __init__
- __dataframe__
- __and__, __rand__, __or__, __ror__
- apply, applymap
- columns
- corr
- dropna
- fillna
- groupby
- info
- insert
- isin
- join
- mask
- melt
- merge
- replace
- rename
- pivot_table
- pow
- rpow
- select_dtypes
- set_axis
- set_index
- shape
- squeeze
- sum
- stack
- transpose
- unstack
- value_counts
- where
- iterrows
- itertuples
- __repr__
- _repr_html_
- _to_datetime
- _to_pandas
- __setitem__
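
For illustration, the override pattern described above might look roughly like the sketch below. This is not the code from `dataframe_overrides.py`: the `DataFrame` class is a toy stand-in for the upstream Modin class, the body of `register_dataframe_accessor` is a simplified assumption, and `sum_override` is a hypothetical name.

```python
from typing import Any, Callable


class DataFrame:
    """Toy stand-in for the upstream DataFrame class (assumption, not Modin)."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def sum(self) -> float:
        # Pretend the upstream default is unusable for this backend.
        raise NotImplementedError("upstream default")


def register_dataframe_accessor(name: str) -> Callable[[Any], Any]:
    """Simplified, assumed behavior: attach `new_attr` to DataFrame as `name`."""

    def decorator(new_attr: Any) -> Any:
        setattr(DataFrame, name, new_attr)
        return new_attr

    return decorator


# Override kept because the toy upstream `sum` is not implemented.
# (Each preserved override carries a reason comment like this one.)
@register_dataframe_accessor("sum")
def sum_override(self: DataFrame) -> float:
    # `sum` here is the builtin; the module-level name is `sum_override`.
    return float(sum(sum(row) for row in self.rows))


df = DataFrame([[1.0, 2.0], [3.0, 4.0]])
total = df.sum()  # dispatches to sum_override
```

Registering under the upstream method's name replaces the class attribute, so callers keep writing `df.sum()` and never see the override's own function name.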
sfc-gh-joshi authored Sep 14, 2024
1 parent 7a5e6bb commit 64ced96
Showing 29 changed files with 2,339 additions and 3,778 deletions.
14 changes: 5 additions & 9 deletions src/snowflake/snowpark/modin/pandas/__init__.py
@@ -88,13 +88,13 @@

 # TODO: SNOW-851745 make sure add all Snowpark pandas API general functions
 from modin.pandas import plotting  # type: ignore[import]
+from modin.pandas.dataframe import DataFrame
 from modin.pandas.series import Series

 from snowflake.snowpark.modin.pandas.api.extensions import (
     register_dataframe_accessor,
     register_series_accessor,
 )
-from snowflake.snowpark.modin.pandas.dataframe import DataFrame
 from snowflake.snowpark.modin.pandas.general import (
     bdate_range,
     concat,
@@ -185,10 +185,8 @@
 modin.pandas.base._ATTRS_NO_LOOKUP.update(_ATTRS_NO_LOOKUP)


-# For any method defined on Series/DF, add telemetry to it if it:
-# 1. Is defined directly on an upstream class
-# 2. The method name does not start with an _, or is in TELEMETRY_PRIVATE_METHODS
-
+# For any method defined on Series/DF, add telemetry to it if the method name does not start with an
+# _, or the method is in TELEMETRY_PRIVATE_METHODS. This includes methods defined as an extension/override.
 for attr_name in dir(Series):
     # Since Series is defined in upstream Modin, all of its members were either defined upstream
     # or overridden by extension.
@@ -197,11 +195,9 @@
             try_add_telemetry_to_attribute(attr_name, getattr(Series, attr_name))
         )

-
-# TODO: SNOW-1063346
-# Since we still use the vendored version of DataFrame and the overrides for the top-level
-# namespace haven't been performed yet, we need to set properties on the vendored version
 for attr_name in dir(DataFrame):
+    # Since DataFrame is defined in upstream Modin, all of its members were either defined upstream
+    # or overridden by extension.
     if not attr_name.startswith("_") or attr_name in TELEMETRY_PRIVATE_METHODS:
         register_dataframe_accessor(attr_name)(
             try_add_telemetry_to_attribute(attr_name, getattr(DataFrame, attr_name))
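
The telemetry loop in this file can be approximated with the self-contained sketch below. It is only an illustration under stated assumptions: `Frame` is a toy class, the contents of `TELEMETRY_PRIVATE_METHODS` are invented for the example, and this `try_add_telemetry_to_attribute` is a hypothetical stand-in that merely counts calls. The sketch also uses plain `setattr` where the real loop goes through `register_dataframe_accessor`.

```python
import functools
from collections import Counter
from typing import Any

# Assumed contents, chosen only for this example.
TELEMETRY_PRIVATE_METHODS = {"__getitem__"}
call_counts: Counter = Counter()


def try_add_telemetry_to_attribute(attr_name: str, attr_value: Any) -> Any:
    """Hypothetical stand-in: wrap callables so each call is counted."""
    if not callable(attr_value) or isinstance(attr_value, type):
        return attr_value

    @functools.wraps(attr_value)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        call_counts[attr_name] += 1
        return attr_value(*args, **kwargs)

    return wrapper


class Frame:
    """Toy class standing in for DataFrame."""

    def __init__(self, data):
        self.data = list(data)

    def head(self, n: int = 2):
        return self.data[:n]

    def _internal(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]


# Same filter as the loop above: public names, plus allow-listed private ones.
for attr_name in dir(Frame):
    if not attr_name.startswith("_") or attr_name in TELEMETRY_PRIVATE_METHODS:
        setattr(
            Frame,
            attr_name,
            try_add_telemetry_to_attribute(attr_name, getattr(Frame, attr_name)),
        )

frame = Frame([10, 20, 30])
frame.head()       # counted: public method
frame[0]           # counted: allow-listed dunder
frame._internal()  # not counted: private and not allow-listed
```

Because the filter looks only at the attribute name, it treats upstream members and extension overrides identically, which matches the updated comment in the hunk.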
@@ -19,9 +19,12 @@
 # existing code originally distributed by the Modin project, under the Apache License,
 # Version 2.0.

-from modin.pandas.api.extensions import register_series_accessor
+from modin.pandas.api.extensions import (
+    register_dataframe_accessor,
+    register_series_accessor,
+)

-from .extensions import register_dataframe_accessor, register_pd_accessor
+from .extensions import register_pd_accessor

 __all__ = [
     "register_dataframe_accessor",
43 changes: 0 additions & 43 deletions src/snowflake/snowpark/modin/pandas/api/extensions/extensions.py
@@ -86,49 +86,6 @@ def decorator(new_attr: Any):
     return decorator


-def register_dataframe_accessor(name: str):
-    """
-    Registers a dataframe attribute with the name provided.
-    This is a decorator that assigns a new attribute to DataFrame. It can be used
-    with the following syntax:
-    ```
-    @register_dataframe_accessor("new_method")
-    def my_new_dataframe_method(*args, **kwargs):
-        # logic goes here
-        return
-    ```
-    The new attribute can then be accessed with the name provided:
-    ```
-    df.new_method(*my_args, **my_kwargs)
-    ```
-    If you want a property accessor, you must annotate with @property
-    after the call to this function:
-    ```
-    @register_dataframe_accessor("new_prop")
-    @property
-    def my_new_dataframe_property(*args, **kwargs):
-        return _prop
-    ```
-    Parameters
-    ----------
-    name : str
-        The name of the attribute to assign to DataFrame.
-    Returns
-    -------
-    decorator
-        Returns the decorator function.
-    """
-    import snowflake.snowpark.modin.pandas as pd
-
-    return _set_attribute_on_obj(
-        name,
-        pd.dataframe._DATAFRAME_EXTENSIONS_,
-        pd.dataframe.DataFrame,
-    )
-
-
 def register_pd_accessor(name: str):
     """
     Registers a pd namespace attribute with the name provided.
