SNOW-948486 Support specifying input column names for vectorized UDTF #1124

Merged
merged 5 commits into from
Nov 3, 2023

sfc-gh-stan
Collaborator

@sfc-gh-stan sfc-gh-stan commented Nov 1, 2023

SNOW-948486

This PR adds a new optional parameter, input_names, to UDTFRegistration.register/register_file and to pandas_udtf in functions.py, so that callers can specify input column names for UDTFs. This is essentially a no-op for regular UDTFs, because their input columns are mapped to function parameters; the feature is only useful for vectorized UDTFs, where we rely on names to access the columns in the pandas DataFrame. The parameter is optional for backwards compatibility: if it is unspecified, the default column names are ARG1, ARG2, etc.

As a result, we now infer the column names from the schema in RelationalGroupedDataFrame.applyInPandas.
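
The fallback naming described above can be illustrated with a minimal sketch. The helper name resolve_input_names is hypothetical and is not part of Snowpark; the length check is also an assumption about reasonable behavior, not something this PR is confirmed to do.

```python
from typing import List, Optional, Sequence


def resolve_input_names(
    num_inputs: int, input_names: Optional[Sequence[str]] = None
) -> List[str]:
    """Hypothetical helper: pick the column names a vectorized handler would see.

    When input_names is omitted, fall back to ARG1, ARG2, ... as described
    in the PR. The mismatch check below is an assumption for illustration.
    """
    if input_names is None:
        # Backwards-compatible default: positional placeholder names.
        return [f"ARG{i}" for i in range(1, num_inputs + 1)]
    if len(input_names) != num_inputs:
        raise ValueError(
            f"expected {num_inputs} input names, got {len(input_names)}"
        )
    return list(input_names)


# Without input_names the handler would have to address columns as ARG1, ARG2, ...
print(resolve_input_names(3))
# With input_names the handler can use meaningful column names directly.
print(resolve_input_names(2, ["id", "price"]))
```

Keeping the parameter optional means existing registrations that never passed names continue to behave as before.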

Collaborator

@sfc-gh-sfan sfc-gh-sfan left a comment

Could you elaborate on how this API change can work together with the behavior change? i.e. What happens if we release before the BCR server change, and what happens if we release after the BCR server change.

@sfc-gh-stan
Collaborator Author

> Could you elaborate on how this API change can work together with the behavior change? i.e. What happens if we release before the BCR server change, and what happens if we release after the BCR server change.

The server change has already been released and is enabled by default in all deployments. It allows users to specify the input column names in the function definition command (via the signature). Previously, the function signature was ignored, and users had to set the columns in the function body, e.g. pdf.columns = ["col1", "col2"].
Note that specifying column names is essentially a no-op for regular UDTFs, since we just use function parameters to access the input columns, so the client never supported specifying column names for UDTF input. That's why this change is required to accommodate the BCR.
Please see the changed doc tests for how the BCR impacts vectorized UDTF registration.
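
To make the before/after difference concrete, here is a rough local sketch using plain pandas rather than Snowpark. The handler classes, the sample rows, and the way the incoming frames are built are all assumptions for illustration; in particular, the integer column labels on the unnamed frame are a stand-in, not what the server actually sent before the BCR.

```python
import pandas as pd


class LegacyHandler:
    """Pre-BCR style: the handler assigns column names itself."""

    def end_partition(self, pdf: pd.DataFrame) -> pd.DataFrame:
        pdf = pdf.copy()
        # Names had to be set in the function body before use.
        pdf.columns = ["col1", "col2"]
        pdf["total"] = pdf["col1"] + pdf["col2"]
        return pdf


class NamedHandler:
    """Post-BCR style: the frame arrives with the declared column names."""

    def end_partition(self, pdf: pd.DataFrame) -> pd.DataFrame:
        out = pdf.copy()
        out["total"] = out["col1"] + out["col2"]
        return out


rows = [(1, 10), (2, 20)]

# Simulated inputs (assumption): one frame without meaningful names,
# one frame carrying the names declared at registration time.
unnamed = pd.DataFrame(rows)
named = pd.DataFrame(rows, columns=["col1", "col2"])

legacy_out = LegacyHandler().end_partition(unnamed)
named_out = NamedHandler().end_partition(named)

print(legacy_out)
print(named_out)
```

Both handlers compute the same result; the difference is only where the column names come from.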

Collaborator

@sfc-gh-mkeller sfc-gh-mkeller left a comment

🚢 🚢

@sfc-gh-stan sfc-gh-stan enabled auto-merge (squash) November 2, 2023 23:57
@sfc-gh-mkeller sfc-gh-mkeller merged commit df85e76 into main Nov 3, 2023
50 of 51 checks passed
@sfc-gh-mkeller sfc-gh-mkeller deleted the accomodate-vectorized-UDTF-BCR branch November 3, 2023 00:19
@github-actions github-actions bot locked and limited conversation to collaborators Nov 3, 2023
5 participants