SNOW-1803811: Allow mixed-case field names for struct type columns #2640

Merged
merged 9 commits into main from jrose_snow_1803811_maintain_struct_field_casing
Dec 16, 2024

Conversation

sfc-gh-jrose
Contributor

@sfc-gh-jrose sfc-gh-jrose commented Nov 15, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1803811

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    This PR addresses an issue with StructType columns where names for StructFields are always uppercased. Snowflake always uppercases column names, but fields that are internal to StructType columns do not have casing enforced. Ideally we would not treat StructFields as column objects, but separating out that responsibility would require a BCR.

This change gets around the BCR requirement by using a feature flag. When enabled, it should not change much, since it assumes that only the top level of StructFields in a struct object are columns. Any StructFields nested deeper in the schema are instead treated as OBJECT field names. This hierarchy should only exist in structured columns.
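
The top-level versus nested distinction described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Snowpark implementation: `Field`, `Struct`, `normalize_column_name`, and `resolve_names` are hypothetical names, and the uppercasing rule only approximates how unquoted column identifiers are treated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a StructField: a name plus an optional nested struct."""
    name: str
    children: Optional["Struct"] = None


@dataclass
class Struct:
    """Hypothetical stand-in for a StructType."""
    fields: List[Field] = field(default_factory=list)


def normalize_column_name(name: str) -> str:
    # Assumption: unquoted column identifiers are uppercased, quoted ones are kept as-is.
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name
    return name.upper()


def resolve_names(schema: Struct, structured_semantics: bool, _top_level: bool = True) -> Struct:
    """Return a copy of the schema with field names resolved.

    With structured_semantics off, every field is treated as a column (old behavior).
    With it on, only top-level fields are columns; nested names keep their casing.
    """
    resolved = []
    for f in schema.fields:
        is_column = _top_level or not structured_semantics
        name = normalize_column_name(f.name) if is_column else f.name
        children = (
            resolve_names(f.children, structured_semantics, _top_level=False)
            if f.children is not None
            else None
        )
        resolved.append(Field(name, children))
    return Struct(resolved)
```

With the flag on, a schema such as `obj -> mixedCase` resolves to a column named `OBJ` whose inner field keeps the name `mixedCase`; with the flag off, both names are uppercased.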

@sfc-gh-jrose sfc-gh-jrose marked this pull request as ready for review November 16, 2024 00:32
@sfc-gh-jrose sfc-gh-jrose requested review from a team as code owners November 16, 2024 00:32
CHANGELOG.md (outdated, resolved)
@sfc-gh-jrose sfc-gh-jrose force-pushed the jrose_snow_1803811_maintain_struct_field_casing branch from 125f1df to ed051f5 Compare December 5, 2024 23:01
@sfc-gh-jrose sfc-gh-jrose force-pushed the jrose_snow_1803811_maintain_struct_field_casing branch from ed051f5 to f05c058 Compare December 5, 2024 23:02
@sfc-gh-jrose
Contributor Author

@sfc-gh-aling @sfc-gh-aalam
I've completely refactored the approach to better avoid BCR concerns. I've reworked the logic to more closely match the server side. Top-level StructFields are now assumed to describe columns, as before. Anything nested inside the schema is instead considered an actual StructField and has the correct naming logic applied. This approach also avoids the annoying quotation marks that would otherwise be added to inner fields.

@sfc-gh-lspiegelberg
I've added you as a reviewer because this change breaks a few AST tests, and I'm hoping you can explain the process for updating those with the new expectations.

Contributor

@sfc-gh-aalam sfc-gh-aalam left a comment

Looks good. Left 2 minor comments.

CHANGELOG.md (outdated, resolved)
Comment on lines 576 to 577
return self.column_identifier.name if self.is_column else self._name

Contributor

Does is_column=True guarantee that self.column_identifier is a ColumnIdentifier? Can we add an assert for that in __init__?

Contributor Author

__init__ uses the setter for self.name which ensures that self.column_identifier is always ColumnIdentifier.
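
The invariant described in this reply can be sketched as follows. This is a minimal illustration, not the actual Snowpark classes: the `ColumnIdentifier` here is a simplified stand-in whose normalization rule is an assumption, and the real `StructField` also carries a datatype and nullability.

```python
class ColumnIdentifier:
    """Hypothetical, simplified stand-in for the real ColumnIdentifier."""

    def __init__(self, name: str) -> None:
        # Assumption: unquoted names are uppercased; quoted names are kept verbatim.
        quoted = len(name) >= 2 and name.startswith('"') and name.endswith('"')
        self.name = name if quoted else name.upper()


class StructField:
    """Simplified sketch of a field whose name depends on is_column."""

    def __init__(self, name: str, is_column: bool = True) -> None:
        self.is_column = is_column
        self.name = name  # goes through the setter below

    @property
    def name(self) -> str:
        # Columns report the normalized identifier; nested fields report the raw name.
        return self.column_identifier.name if self.is_column else self._name

    @name.setter
    def name(self, value: str) -> None:
        self._name = value
        # Always build a ColumnIdentifier, so the getter never sees another type.
        self.column_identifier = ColumnIdentifier(value)
```

Because `__init__` assigns through the property setter, `column_identifier` is populated for every instance regardless of `is_column`, which is why a separate assert would be redundant in this sketch.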

Comment on lines +24 to +25
# Global flag that determines if structured type semantics should be used
_should_use_structured_type_semantics = False
Contributor

Who is responsible for setting this parameter? Also, do we need a server-side parameter for this?

Contributor Author

This should not be enabled until structured type support is rolled out. There will eventually be a server-side parameter that we can track instead of this flag.

Contributor

cool, if this is not rolled out yet then I'm good with this module var
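
Since the flag is a module variable that stays off by default, tests that exercise the new behavior need a way to flip it and restore it afterwards. A minimal sketch of one way to do that is below; the `structured_type_semantics` context manager and the `field_name` consumer are hypothetical helpers introduced for illustration and are not part of this PR.

```python
from contextlib import contextmanager

# Global flag that determines if structured type semantics should be used
_should_use_structured_type_semantics = False


@contextmanager
def structured_type_semantics(enabled: bool = True):
    """Hypothetical helper: temporarily override the module flag, then restore it."""
    global _should_use_structured_type_semantics
    previous = _should_use_structured_type_semantics
    _should_use_structured_type_semantics = enabled
    try:
        yield
    finally:
        # Restore the previous value even if the body raises.
        _should_use_structured_type_semantics = previous


def field_name(name: str, nested: bool) -> str:
    """Hypothetical consumer: nested names keep their casing only when the flag is on."""
    if nested and _should_use_structured_type_semantics:
        return name
    return name.upper()
```

Scoping the override to a `with` block keeps the default behavior unchanged for everything else in the process, which matters while the feature is not yet rolled out.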

@sfc-gh-jrose sfc-gh-jrose merged commit 0362c46 into main Dec 16, 2024
39 checks passed
@sfc-gh-jrose sfc-gh-jrose deleted the jrose_snow_1803811_maintain_struct_field_casing branch December 16, 2024 23:26
@github-actions github-actions bot locked and limited conversation to collaborators Dec 16, 2024
3 participants