-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1661142 Fix index name behavior #2274
SNOW-1661142 Fix index name behavior #2274
Conversation
@@ -163,6 +163,7 @@ def __new__( | |||
index._query_compiler = query_compiler | |||
# `_parent` keeps track of any Series or DataFrame that this Index is a part of. | |||
index._parent = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about get a class like IndexParent, and it records the necessary parent information, the constructor can take the df and record the information property, and it can also provide an interface like no_change_on_parent (you can use other name if you have other ones in mind) to help decide wether a replacement should happen, this could help wrap all parent logic at one place and simplify the code handling in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, i have a quick question, shouldn't datetime_index an extension of index class, and it can call super().init to init the common part, right? how come we are initialization the same thing at different place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__init__
in Index.py is used only for docstring generation, the actual initialization is happening in __new__
. I'm not sure why it was designed this way.
But I created a new class like you suggested to abstract away the common logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the new is implemented because the returned object could be different based on the type. i don't think we have to fix that in this pr, just curious why we didn't call super class for the common part? cc @sfc-gh-azhan do you know?
if self._parent is not None: | ||
# Update the name of the parent's index only if the parent's current query compiler | ||
# matches the recorded query compiler (_parent_qc). | ||
if self._parent is not None and self._parent_qc is self._parent._query_compiler: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also now after updating, you should make sure the parent_qc is updated to the new one, you can also wrap that logic in the IndexParent class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
@@ -2262,7 +2262,7 @@ def _get_index(self): | |||
return self._query_compiler.index | |||
|
|||
idx = Index(query_compiler=self._query_compiler) | |||
idx._set_parent(self) | |||
idx._parent.set_parent(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we update _set_parent
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually do we need that one line wrapper? the current place seems the only place calls it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted it to _set_parent()
since that seems like a "safer" way to set the parent details without exposing too much
@@ -71,6 +71,41 @@ | |||
} | |||
|
|||
|
|||
class IndexParent: | |||
def __init__(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we always init with values not None? if index._parent is None, then no parent, if it is not, it should mean parent exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the init can always take a df as input, index._parent can remain optional, which makes the relationship more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! The init should now take parent
and be the only way to set the parent details.
….com:snowflakedb/snowpark-python into vbudati/SNOW-1661142-fix-index-name-behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sfc-gh-vbudati Left some comment about docs and comments, please take care of it before merging!
CHANGELOG.md
Outdated
@@ -12,6 +12,10 @@ | |||
|
|||
- Added support for `TimedeltaIndex.mean` method. | |||
|
|||
#### Bug Fixes | |||
|
|||
- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name when it is not supposed to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"index name when it is not supposed to." -> index name after an inplace updates have been applied to the original series/dataframe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
def check_and_update_parent_qc_index_names(self, names: list) -> None: | ||
""" | ||
Update the Index and its parent's index names if the parent's current query compiler matches | ||
the recorded query compiler (`_parent_qc`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here that "if the query compiler associated with parent is not the same as the original recorded one that means an inplace updates have been applied to the parent"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
@@ -726,10 +760,10 @@ def name(self, value: Hashable) -> None: | |||
if not is_hashable(value): | |||
raise TypeError(f"{type(self).__name__}.name must be a hashable type") | |||
self._query_compiler = self._query_compiler.set_index_names([value]) | |||
# Update the name of the parent's index only if the parent's current query compiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update the doc for the name function to make it clear that the inplace replacement only works when no inplace update has been applied to parent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1661142
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Fixed a bug where updating an index's name updates the parent's index name when it is not supposed to. This is done by verifying that the query_compiler recorded during the index's creation matches that of its parent object when the parent object must be updated.
For the full discussion, see thread: https://docs.google.com/document/d/1vdllzNgeUHMiffFNpm9SD1HOYUk8lkMVp14HQDoqr7s/edit?disco=AAABVbKjFJ0