SNOW-1434962: Make groupby.apply sort result deterministic. #1625

sfc-gh-mvashishtha
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1434962

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    sort_values is not stable by default because it uses kind="quicksort". Instead, use nlargest, which is stable with its default keep="first", so that when multiple rows have the same price, they come back in the order in which they appear in the input dataframe.

    Unfortunately, I couldn't reproduce the flake locally by running the test repeatedly. I could only trigger it by manually reordering the input dataframe before the sort_values call to simulate an unstable sort_values. Still, I think we can now count on the stability of nlargest.
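
A minimal plain-pandas sketch of the tie-breaking behavior described above. The frames, the `item`/`category`/`price` columns, and the `top_two` helper are hypothetical stand-ins, not the test's actual data or code, and the sketch runs against pandas rather than Snowpark pandas. It shows `nlargest(..., keep="first")` returning tied rows in input order, an explicitly stable `sort_values(kind="stable")` as an alternative (not what this PR uses), and the same rule applied per group through `groupby.apply`.

```python
import pandas as pd

# Hypothetical data: rows "b", "c" and "e" tie on the highest price.
df = pd.DataFrame(
    {"item": ["a", "b", "c", "d", "e"], "price": [10, 30, 30, 20, 30]}
)

# nlargest with keep="first": tied rows are taken in input order.
top_nlargest = df.nlargest(2, "price", keep="first")

# Alternative: an explicitly stable descending sort, then head().
top_stable_sort = df.sort_values("price", ascending=False, kind="stable").head(2)

print(top_nlargest["item"].tolist())     # ['b', 'c']
print(top_stable_sort["item"].tolist())  # ['b', 'c']

# The same rule applied per group, as in a groupby.apply test.
grouped = pd.DataFrame(
    {
        "category": ["x", "x", "x", "y", "y", "y"],
        "item": ["a", "b", "c", "d", "e", "f"],
        "price": [3, 3, 1, 2, 5, 5],
    }
)


def top_two(group: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: ties within a group resolve to the earliest rows.
    return group.nlargest(2, "price", keep="first")


# Select the non-key columns so apply does not operate on the grouping column.
per_group = grouped.groupby("category")[["item", "price"]].apply(top_two)
print(per_group["item"].tolist())  # ['a', 'b', 'e', 'f']
```

Note that with keep="first" the chosen rows still depend on the input row order, so the result is only deterministic if the input frame's order is fixed.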

sfc-gh-mvashishtha added the NO-CHANGELOG-UPDATES ("This pull request does not need to update CHANGELOG.md"), NO-PANDAS-CHANGELOG-UPDATES, and NO-PANDAS-CHANGEDOC-UPDATES ("This PR does not update Snowpark pandas docs") labels May 17, 2024
sfc-gh-mvashishtha marked this pull request as ready for review May 17, 2024 21:31
sfc-gh-mvashishtha requested a review from a team as a code owner May 17, 2024 21:31
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
@sfc-gh-azhan
Collaborator

@sfc-gh-mvashishtha
Contributor Author

Try it now: ci-dev-142.int.snowflakecomputing.com/job/SnowparkPythonSnowPandasDailyRegressRunner/432

Looks like it passed that run, at least.

Contributor

sfc-gh-nkrishna left a comment

Thanks for investigating!

sfc-gh-mvashishtha merged commit 7b454bc into main May 20, 2024
33 checks passed
sfc-gh-mvashishtha deleted the mvashishtha/SNOW-1434962/make-groupby-apply-test-order-deterministic branch May 20, 2024 18:08
github-actions bot locked and limited conversation to collaborators May 20, 2024
4 participants