SNOW-869536 Fix buggy behavior in DataFrame.to_local_iterator #1226

Merged
sfc-gh-stan merged 8 commits into main from fix-SNOW-869536 on Feb 1, 2024

sfc-gh-stan
Collaborator

@sfc-gh-stan sfc-gh-stan commented Jan 30, 2024

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-869536: Iterators from to_local_iterator stop returning results after another query occurs #945

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Issue: All queries issued from a Snowpark Session object are executed using the same SnowflakeCursor instance. This causes incorrect behavior when we fetch results from an iterator while other code executes a different query partway through the iteration, e.g.

     df = session.create_dataframe([[1,2,3],[4,5,6]], schema=["a", "b", "c"])
     my_iter = df.to_local_iterator()
     row_counter = 0
     for row in my_iter:
         len(df.schema.fields)  # this executes a schema query, which overwrites properties of the cursor object
         row_counter += 1
     print(row_counter)  # this prints 1 since schema query only returns 1 row, but should be 2
    

    Fix: This PR changes src/snowflake/snowpark/_internal/server_connection.py to create a new cursor object locally that reads data from the last executed query. The iterator keeps using this cursor, which isolates it from any queries executed later within the session.
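To make the failure mode and the fix concrete, here is a toy model of the idea. It is only a sketch: `ToyConnection`, `ToyCursor`, `read_result_of`, and `collect_rows` are hypothetical stand-ins invented for illustration, not the Snowflake connector API and not the code in this PR. The one assumption carried over from the description is that executing a query on a cursor overwrites that cursor's result state.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Row = Tuple[object, ...]


class ToyConnection:
    """Hypothetical connection that stores each query's rows by query id."""

    def __init__(self) -> None:
        self._results: Dict[int, List[Row]] = {}
        self._next_id = 0

    def cursor(self) -> "ToyCursor":
        return ToyCursor(self)

    def run(self, rows: List[Row]) -> int:
        # "Run" a query by storing its rows under a fresh query id.
        self._next_id += 1
        self._results[self._next_id] = list(rows)
        return self._next_id

    def stored(self, query_id: int) -> List[Row]:
        return self._results[query_id]


class ToyCursor:
    """Hypothetical cursor: every execute() replaces its buffer and position."""

    def __init__(self, connection: ToyConnection) -> None:
        self.connection = connection
        self.query_id = 0
        self._buffer: List[Row] = []
        self._pos = 0

    def execute(self, rows: List[Row]) -> "ToyCursor":
        self.query_id = self.connection.run(rows)
        return self.read_result_of(self.query_id)

    def read_result_of(self, query_id: int) -> "ToyCursor":
        # Load an earlier query's stored result without re-running the query.
        self._buffer = self.connection.stored(query_id)
        self._pos = 0
        return self

    def fetchone(self) -> Optional[Row]:
        if self._pos >= len(self._buffer):
            return None
        row = self._buffer[self._pos]
        self._pos += 1
        return row


def rows_from(cursor: ToyCursor) -> Iterator[Row]:
    # Lazy iterator: rows are pulled from the cursor only on demand.
    while (row := cursor.fetchone()) is not None:
        yield row


def collect_rows(isolate: bool) -> List[Row]:
    conn = ToyConnection()
    shared = conn.cursor()  # the single cursor the session reuses
    shared.execute([(1, 2, 3), (4, 5, 6)])  # the DataFrame query
    # With isolation, the iterator gets its own cursor over the same result.
    source = conn.cursor().read_result_of(shared.query_id) if isolate else shared
    seen: List[Row] = []
    for row in rows_from(source):
        if not seen:
            # One unrelated query on the shared cursor, partway through iteration.
            shared.execute([("schema",)])
        seen.append(row)
    return seen


print(collect_rows(isolate=False))  # a row from the unrelated query leaks in
print(collect_rows(isolate=True))   # only the DataFrame's own rows
```

In this model the shared cursor returns the first DataFrame row and then a row belonging to the unrelated query, while the isolated cursor returns both DataFrame rows. The exact symptom in the real connector differs (the description reports a short count rather than a foreign row), but the cause being modeled is the same: the iterator and later queries share mutable cursor state.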

@sfc-gh-stan sfc-gh-stan changed the title from "SNOW-869536 Fix buggy behavior in result_set_to_iter by using a new cursor" to "SNOW-869536 Fix buggy behavior in DataFrame.to_local_iterator and DataFrame.to_pandas_batches by using a new cursor" Jan 30, 2024
@sfc-gh-stan sfc-gh-stan marked this pull request as ready for review January 31, 2024 00:40
@sfc-gh-stan sfc-gh-stan requested a review from a team as a code owner January 31, 2024 00:40
@sfc-gh-stan sfc-gh-stan changed the title from "SNOW-869536 Fix buggy behavior in DataFrame.to_local_iterator and DataFrame.to_pandas_batches by using a new cursor" to "SNOW-869536 Fix buggy behavior in DataFrame.to_local_iterator" Jan 31, 2024
) -> Dict[str, Any]:
if to_iter and not to_pandas: # Fix for SNOW-869536
Collaborator Author

to_pandas doesn't have this issue, SnowflakeCursor.fetch_pandas_batches already handles the isolation.

Collaborator

Can we put this into comment instead? This seems more informative than "fix for" :)

@@ -434,8 +434,14 @@ def _to_data_or_iter(
results_cursor: SnowflakeCursor,
to_pandas: bool = False,
to_iter: bool = False,
num_statements: Optional[int] = None,
Collaborator

Did we remove this because the param is unused?

Collaborator Author

Yes

@sfc-gh-stan sfc-gh-stan enabled auto-merge (squash) February 1, 2024 19:56
@sfc-gh-stan sfc-gh-stan merged commit 234b026 into main Feb 1, 2024
57 checks passed
@sfc-gh-stan sfc-gh-stan deleted the fix-SNOW-869536 branch February 1, 2024 22:24
@github-actions github-actions bot locked and limited conversation to collaborators Feb 1, 2024
Development

Successfully merging this pull request may close these issues.

SNOW-869536: Iterators from to_local_iterator stop returning results after another query occurs
3 participants