SNOW-1661142 Fix index name behavior #2274

sfc-gh-vbudati · 2024-09-11T22:27:35Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1661142
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
Please describe how your code solves the related issue.
Fixed a bug where updating an index's name updates the parent's index name when it is not supposed to. This is done by verifying that the query_compiler recorded during the index's creation matches that of its parent object when the parent object must be updated.

>>> df = pd.DataFrame(
...     {
...         "A": [0, 1, 2, 3, 4, 4],
...         "B": ['a', 'b', 'c', 'd', 'e', 'f'],
...     },
...     index = pd.Index([1, 2, 3, 4, 5, 6], name = "test"),
... )
>>> index = df.index
>>> df
      A  B
test      
1     0  a
2     1  b
3     2  c
4     3  d
5     4  e
6     4  f
>>> index.name = "test2"
>>> 
>>> df
       A  B
test2      
1      0  a
2      1  b
3      2  c
4      3  d
5      4  e
6      4  f
>>> df.dropna(inplace=True)
>>> index.name = "test3"
>>> df
       A  B
test2        # <--- name should not update      
1      0  a
2      1  b
3      2  c
4      3  d
5      4  e
6      4  f

For the full discussion, see thread: https://docs.google.com/document/d/1vdllzNgeUHMiffFNpm9SD1HOYUk8lkMVp14HQDoqr7s/edit?disco=AAABVbKjFJ0

sfc-gh-yzou · 2024-09-12T17:15:51Z

src/snowflake/snowpark/modin/plugin/extensions/datetime_index.py

@@ -163,6 +163,7 @@ def __new__(
        index._query_compiler = query_compiler
        # `_parent` keeps track of any Series or DataFrame that this Index is a part of.
        index._parent = None


how about get a class like IndexParent, and it records the necessary parent information, the constructor can take the df and record the information property, and it can also provide an interface like no_change_on_parent (you can use other name if you have other ones in mind) to help decide wether a replacement should happen, this could help wrap all parent logic at one place and simplify the code handling in the future

also, i have a quick question, shouldn't datetime_index an extension of index class, and it can call super().init to init the common part, right? how come we are initialization the same thing at different place?

__init__ in Index.py is used only for docstring generation, the actual initialization is happening in __new__. I'm not sure why it was designed this way.
But I created a new class like you suggested to abstract away the common logic.

i think the new is implemented because the returned object could be different based on the type. i don't think we have to fix that in this pr, just curious why we didn't call super class for the common part? cc @sfc-gh-azhan do you know?

sfc-gh-yzou · 2024-09-12T17:18:12Z

src/snowflake/snowpark/modin/plugin/extensions/index.py

-        if self._parent is not None:
+        # Update the name of the parent's index only if the parent's current query compiler
+        # matches the recorded query compiler (_parent_qc).
+        if self._parent is not None and self._parent_qc is self._parent._query_compiler:


also now after updating, you should make sure the parent_qc is updated to the new one, you can also wrap that logic in the IndexParent class

sfc-gh-azhan · 2024-09-12T19:01:41Z

src/snowflake/snowpark/modin/plugin/extensions/base_overrides.py

@@ -2262,7 +2262,7 @@ def _get_index(self):
        return self._query_compiler.index

    idx = Index(query_compiler=self._query_compiler)
-    idx._set_parent(self)
+    idx._parent.set_parent(self)


can we update _set_parent instead?

actually do we need that one line wrapper? the current place seems the only place calls it

I reverted it to _set_parent() since that seems like a "safer" way to set the parent details without exposing too much

sfc-gh-azhan · 2024-09-12T19:04:09Z

src/snowflake/snowpark/modin/plugin/extensions/index.py

@@ -71,6 +71,41 @@
 }


+class IndexParent:
+    def __init__(self) -> None:


Can we always init with values not None? if index._parent is None, then no parent, if it is not, it should mean parent exists.

I think the init can always take a df as input, index._parent can remain optional, which makes the relationship more clear

done! The init should now take parent and be the only way to set the parent details.

….com:snowflakedb/snowpark-python into vbudati/SNOW-1661142-fix-index-name-behavior

sfc-gh-yzou

@sfc-gh-vbudati Left some comment about docs and comments, please take care of it before merging!

sfc-gh-yzou · 2024-09-16T21:43:08Z

CHANGELOG.md

@@ -12,6 +12,10 @@

 - Added support for `TimedeltaIndex.mean` method.

+#### Bug Fixes
+
+- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name when it is not supposed to.


"index name when it is not supposed to." -> index name after an inplace updates have been applied to the original series/dataframe

sfc-gh-yzou · 2024-09-16T21:45:40Z

src/snowflake/snowpark/modin/plugin/extensions/index.py

+    def check_and_update_parent_qc_index_names(self, names: list) -> None:
+        """
+        Update the Index and its parent's index names if the parent's current query compiler matches
+        the recorded query compiler (`_parent_qc`).


add a comment here that "if the query compiler associated with parent is not the same as the original recorded one that means an inplace updates have been applied to the parent"

sfc-gh-yzou · 2024-09-16T22:10:11Z

src/snowflake/snowpark/modin/plugin/extensions/index.py

@@ -726,10 +760,10 @@ def name(self, value: Hashable) -> None:
        if not is_hashable(value):
            raise TypeError(f"{type(self).__name__}.name must be a hashable type")
        self._query_compiler = self._query_compiler.set_index_names([value])
+        # Update the name of the parent's index only if the parent's current query compiler


can you update the doc for the name function to make it clear that the inplace replacement only works when no inplace update has been applied to parent

… of parents

fix index name behavior

125f86d

sfc-gh-vbudati requested a review from a team as a code owner September 11, 2024 22:27

sfc-gh-vbudati requested review from sfc-gh-lspiegelberg and sfc-gh-nkumar September 11, 2024 22:27

github-actions bot added the snowpark-pandas label Sep 11, 2024

add changelog entry

299f33a

sfc-gh-vbudati added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Sep 11, 2024

Merge branch 'main' into vbudati/SNOW-1661142-fix-index-name-behavior

41e69e2

sfc-gh-yzou requested review from sfc-gh-azhan and sfc-gh-yzou and removed request for sfc-gh-lspiegelberg and sfc-gh-nkumar September 12, 2024 17:11

sfc-gh-yzou reviewed Sep 12, 2024

View reviewed changes

sfc-gh-vbudati added 2 commits September 12, 2024 11:38

create new class for parent

6e00c73

fix test

21dbccb

sfc-gh-vbudati requested a review from sfc-gh-yzou September 12, 2024 18:41

sfc-gh-vbudati added 2 commits September 12, 2024 11:41

Merge branch 'main' into vbudati/SNOW-1661142-fix-index-name-behavior

34f0e50

Update CHANGELOG.md

ab5656f

sfc-gh-azhan requested changes Sep 12, 2024

View reviewed changes

sfc-gh-vbudati added 2 commits September 12, 2024 18:58

initialize parent details when creating parent index

86c7310

Merge branch 'vbudati/SNOW-1661142-fix-index-name-behavior' of github…

bb6242b

….com:snowflakedb/snowpark-python into vbudati/SNOW-1661142-fix-index-name-behavior

sfc-gh-vbudati requested a review from sfc-gh-azhan September 13, 2024 01:59

sfc-gh-vbudati added 2 commits September 12, 2024 19:00

add none checking

9a1bce4

fix test

807b769

sfc-gh-azhan approved these changes Sep 13, 2024

View reviewed changes

sfc-gh-yzou approved these changes Sep 16, 2024

View reviewed changes

sfc-gh-vbudati added 2 commits September 16, 2024 15:53

update comments and changelog to specifically mention inplace updates…

a70f20c

… of parents

Merge branch 'main' into vbudati/SNOW-1661142-fix-index-name-behavior

2f6dcf3

sfc-gh-vbudati merged commit e93cd68 into main Sep 17, 2024
35 checks passed

sfc-gh-vbudati deleted the vbudati/SNOW-1661142-fix-index-name-behavior branch September 17, 2024 01:09

github-actions bot locked and limited conversation to collaborators Sep 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SNOW-1661142 Fix index name behavior #2274

SNOW-1661142 Fix index name behavior #2274

sfc-gh-vbudati commented Sep 11, 2024

sfc-gh-yzou Sep 12, 2024

sfc-gh-yzou Sep 12, 2024

sfc-gh-vbudati Sep 12, 2024

sfc-gh-yzou Sep 12, 2024

sfc-gh-yzou Sep 12, 2024

sfc-gh-vbudati Sep 12, 2024

sfc-gh-azhan Sep 12, 2024

sfc-gh-yzou Sep 12, 2024

sfc-gh-vbudati Sep 13, 2024

sfc-gh-azhan Sep 12, 2024

sfc-gh-yzou Sep 12, 2024

sfc-gh-vbudati Sep 13, 2024

sfc-gh-yzou left a comment

sfc-gh-yzou Sep 16, 2024

sfc-gh-vbudati Sep 16, 2024

sfc-gh-yzou Sep 16, 2024

sfc-gh-vbudati Sep 16, 2024

sfc-gh-yzou Sep 16, 2024

sfc-gh-vbudati Sep 16, 2024

SNOW-1661142 Fix index name behavior #2274

SNOW-1661142 Fix index name behavior #2274

Conversation

sfc-gh-vbudati commented Sep 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sfc-gh-yzou left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment