SNOW-1642293 Add support for lazy index labels in reindex and fix reindex name bug #2175

Merged: 18 commits, Sep 4, 2024

Contributor

@sfc-gh-vbudati sfc-gh-vbudati commented Aug 28, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1642293

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

    1. Added support for using lazy index labels with reindex.

>>> ser = pd.Series([1, 2, 3], index=["A", "B", "C"])
>>> idx = pd.Index(["X", "Y", "Z"])
>>> ser.reindex(idx)
X   NaN
Y   NaN
Z   NaN
dtype: float64
    2. Fixed a bug in reindex: when it is called with an index idx that has a name, the index name of the resulting Series/DataFrame is not updated accordingly and remains None.

For instance,

>>> ser = pd.Series([0, 1, 2], index=list("ABC"), name="test")
>>> idx = native_pd.Index(list("CAB"), name="weewoo")
>>> ser.reindex(index=idx)
C    2
A    0
B    1
Name: test, dtype: int64

# Expected output, with the index name taken from idx:
weewoo
C      2
A      0
B      1
Name: test, dtype: int64
    3. Fixed a bug where an Index object's name was not set correctly during binary operations.
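
The two naming behaviors described above can be checked against native pandas, which Snowpark pandas aims to match. The following is a minimal sketch that uses native pandas only (no Snowflake session), so it shows the reference behavior rather than the fixed Snowpark pandas code path. The PR description does not say which binary operations were affected, so the Index cases below are illustrative assumptions, and the variable names `left`, `same`, and `other` are hypothetical.

```python
import pandas as pd  # native pandas, used here only as the reference behavior

# reindex with a named Index: the result adopts the target index's name,
# while the Series' own name is left untouched.
ser = pd.Series([0, 1, 2], index=list("ABC"), name="test")
idx = pd.Index(list("CAB"), name="weewoo")
reindexed = ser.reindex(index=idx)
print(reindexed.index.name)  # weewoo
print(reindexed.name)        # test

# Binary operations on Index objects (illustrative cases, not taken from the PR):
left = pd.Index([1, 2, 3], name="a")
same = pd.Index([10, 20, 30], name="a")
other = pd.Index([10, 20, 30], name="b")

print((left + 1).name)      # a     (scalar operand keeps the name)
print((left + same).name)   # a     (matching names are kept)
print((left + other).name)  # None  (conflicting names are dropped)
```

A Snowpark pandas test would typically compare these same attributes between the Snowpark result and the native result.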

@sfc-gh-vbudati sfc-gh-vbudati added the NO-PANDAS-CHANGEDOC-UPDATES label (This PR does not update Snowpark pandas docs) Aug 28, 2024
Collaborator

@sfc-gh-azhan sfc-gh-azhan left a comment

Next, let's make reindex accept a lazy index as labels.

@sfc-gh-vbudati sfc-gh-vbudati changed the title from "SNOW-1642293 Fix reindex name bug" to "SNOW-1642293 Add support for lazy index labels in reindex and fix reindex name bug" Aug 28, 2024
Contributor

@sfc-gh-nkrishna sfc-gh-nkrishna left a comment

LGTM -- would it be possible to set the name field in the Index class as well when initialized in this PR (since you're already fixing the name bug)? To work around this: https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/extensions/index.py#L269-L271

Collaborator

@sfc-gh-azhan sfc-gh-azhan left a comment

I'll take a pass on the combined version.

tests/integ/modin/frame/test_rename.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
query_count = 2
join_count = 2
with SqlCounter(query_count=query_count, join_count=join_count):
with SqlCounter(query_count=2 if limit is None else 3, join_count=2):
Collaborator

Add a TODO for fixing limit once @sfc-gh-rdurrani's is-monotonic PR is done.

Contributor Author

The query count is still the same, so no changes are required.

@sfc-gh-vbudati sfc-gh-vbudati merged commit 71a2182 into main Sep 4, 2024
36 of 37 checks passed
@sfc-gh-vbudati sfc-gh-vbudati deleted the vbudati/SNOW-1642293-fix-reindex-bug branch September 4, 2024 19:07
@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2024
Labels: NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs), snowpark-pandas
3 participants