Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1491306 Phase 0 AST for DataFrame first, sample, and random_split #1813

Conversation

sfc-gh-vbudati
Copy link
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1491306

  2. Please describe how your code solves the related issue.

This PR adds API coverage for DataFrame first, sample, and random_split. I added tests for the same.

@sfc-gh-vbudati sfc-gh-vbudati added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-server Server-side snowpark work labels Jun 20, 2024
@sfc-gh-vbudati sfc-gh-vbudati requested a review from a team as a code owner June 20, 2024 22:52
@sfc-gh-vbudati sfc-gh-vbudati requested review from sfc-gh-sfan, sfc-gh-aalam and sfc-gh-jrose and removed request for a team, sfc-gh-sfan, sfc-gh-aalam and sfc-gh-jrose June 20, 2024 22:52
tests/ast/data/df_first.test Outdated Show resolved Hide resolved
tests/ast/data/df_random_split.test Outdated Show resolved Hide resolved
@@ -0,0 +1,17 @@
## TEST CASE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test will not pass right now since the Local History code has not implemented any "random" functionality yet. A NotImplementedError is raised.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true? One way or another, we should make sure that all (not disabled) checked in tests pass.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run into this in a number of recent APIs as well. I've been able to get around this by checking session._conn._suppress_not_implemented_error, which is a new connection property I've added. In many places, I just return None if suppression is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-azwiegincew I'm still unable to get this test to pass. Now, the NotImplementedError is not raised but instead a KeyError is raised:

def handle_function_expression(    
        exp: FunctionExpression,
        input_data: Union[TableEmulator, ColumnEmulator],
        analyzer: "MockAnalyzer",
        expr_to_alias: Dict[str, str],
        current_row=None,
    ):
            . . .
try:
>           result = _MOCK_FUNCTION_IMPLEMENTATION_MAP[func_name](*to_pass_args)
E           KeyError: 'random'

../../src/snowflake/snowpark/mock/_plan.py:454: KeyError

What should I do to get around this? Should I try to catch the error and check if it's a KeyError with this message?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In random_split(), right after creating the AST, say something along the lines of if self._conn._suppress_not_implemented_error: return None

tests/ast/data/df_sample.test Outdated Show resolved Hide resolved
@@ -0,0 +1,17 @@
## TEST CASE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true? One way or another, we should make sure that all (not disabled) checked in tests pass.

-df-first-sample-random-split

# Conflicts:
#	src/snowflake/snowpark/_internal/proto/ast_pb2.py
@sfc-gh-azwiegincew
Copy link
Collaborator

Ping.

@sfc-gh-vbudati sfc-gh-vbudati requested a review from a team as a code owner July 10, 2024 21:13
@sfc-gh-vbudati sfc-gh-vbudati removed the request for review from a team July 10, 2024 21:19
@sfc-gh-vbudati sfc-gh-vbudati marked this pull request as draft July 10, 2024 21:19
@sfc-gh-vbudati sfc-gh-vbudati marked this pull request as ready for review July 16, 2024 01:20
@@ -0,0 +1,17 @@
## TEST CASE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In random_split(), right after creating the AST, say something along the lines of if self._conn._suppress_not_implemented_error: return None

@sfc-gh-vbudati sfc-gh-vbudati merged commit 9708e0f into server-side-snowpark Jul 16, 2024
12 of 14 checks passed
@sfc-gh-vbudati sfc-gh-vbudati deleted the vbudati/SNOW-1491306-df-first-sample-random-split branch July 16, 2024 17:10
@github-actions github-actions bot locked and limited conversation to collaborators Jul 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-server Server-side snowpark work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants