SNOW-948486 Support specifying input column names for vectorized UDTF #1124
Could you elaborate on how this API change works together with the behavior change? That is, what happens if we release before the BCR server change, and what happens if we release after it?
The server change has already been released and is enabled by default in all deployments. It allows users to specify the input column names in the function definition command (using the signature). Previously the function signature was ignored and users had to set columns like |
🚢 🚢
SNOW-948486
The PR introduces a new optional parameter `input_names` to `UDTFRegistration.register/register_file` and `functions.py/pandas_udtf`, where we now support specifying input column names for UDTFs. Note that this is essentially a no-op for regular UDTFs, since the columns are mapped to function parameters; the feature is only useful for vectorized UDTFs, where we rely on names to access the columns in the pandas DataFrame. The parameter is optional for backwards compatibility; if unspecified, the default column names will be `ARG1`, `ARG2`, etc. As a result, we now infer the column names from the schema in `RelationalGroupedDataFrame.applyInPandas`.
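For reviewers, here is a minimal sketch of the fallback rule described above. It is not the code in this PR: `resolve_input_names` and `label_columns` are hypothetical helpers, and a plain dict of lists stands in for the pandas DataFrame a vectorized UDTF would receive. It only shows how the default `ARG1`, `ARG2`, ... names differ from caller-supplied `input_names`.

```python
from typing import Dict, List, Optional, Sequence


def resolve_input_names(
    num_columns: int, input_names: Optional[Sequence[str]] = None
) -> List[str]:
    """Hypothetical helper: pick the column names for a vectorized UDTF input.

    Falls back to ARG1, ARG2, ... when no names are given, mirroring the
    default described in the PR text.
    """
    if input_names is None:
        return [f"ARG{i}" for i in range(1, num_columns + 1)]
    if len(input_names) != num_columns:
        # Assumption for this sketch: a length mismatch is treated as an error.
        raise ValueError(
            f"expected {num_columns} input names, got {len(input_names)}"
        )
    return list(input_names)


def label_columns(
    columns: Sequence[list], input_names: Optional[Sequence[str]] = None
) -> Dict[str, list]:
    """Hypothetical helper: attach names to positional columns.

    The returned dict stands in for the DataFrame whose columns the
    handler looks up by name.
    """
    names = resolve_input_names(len(columns), input_names)
    return dict(zip(names, columns))


columns = [[1, 2, 3], [10.0, 20.0, 30.0]]

# No input_names: the handler would have to use ARG1 / ARG2.
default_batch = label_columns(columns)

# With input_names: the handler can use meaningful column names.
named_batch = label_columns(columns, ["ID", "PRICE"])
```

With the defaults, a handler has to index the batch by `ARG1` and `ARG2`; with `input_names` it can refer to `ID` and `PRICE` directly, which is the case the new parameter targets.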