Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-966485 Fix reader schema with metadata columns #1143

Merged
merged 11 commits into from
Dec 1, 2023

Conversation

sfc-gh-sfan
Copy link
Collaborator

Description

Dataframe Reader does not change the schema query when metadata column is involved, so further actions like save_as_table could fail because the created table schema does not match the query.

This fix adds the metadata columns to the beginning of the schema, to be consistent with the query.

Testing

integ test

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-966485

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Dataframe Reader does not change the schema query when metadata column is involved, so further actions like save_as_table could fail because the created table schema does not match the query.

This fix adds the metadata columns to the beginning of the schema, to be consistent with the query.

Description

Dataframe Reader does not change the schema query when metadata
column is involved, so further actions like save_as_table could
fail because the created table schema does not match the query.

This fix adds the metadata columns to the beginning of the schema,
to be consistent with the query.

Testing

integ test
@sfc-gh-sfan sfc-gh-sfan requested a review from a team as a code owner November 22, 2023 20:37
Comment on lines 922 to 923
for metadata_col in metadata_columns
] + schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for metadata_col in metadata_columns
] + schema
for metadata_col in metadata_columns or []
] + schema

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is could be why some tests are failing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I also realized the same after observing the test failure lol

@sfc-gh-sfan
Copy link
Collaborator Author

sfc-gh-sfan commented Nov 22, 2023

I don't understand the type checking failure. It does not fail on my laptop:

sfan@C02FQ10YMD6R snowpark-python % pyright src/snowflake/snowpark/_internal/analyzer 
0 errors, 0 warnings, 0 informations 
sfan@C02FQ10YMD6R snowpark-python % pyright --version
pyright 1.1.337

Specifically, I moved some code block from a file that is not covered by pyright to a file that is covered. But the error message is very confusing.

  /home/runner/work/snowpark-python/snowpark-python/src/snowflake/snowpark/_internal/analyzer/snowflake_plan.py:891:69 - error: Argument of type "defaultdict[Unknown, dict[Unknown, Unknown]]" cannot be assigned to parameter "expr_to_alias" of type "Dict[str, str] | None" in function "analyze"
    Type "defaultdict[Unknown, dict[Unknown, Unknown]]" cannot be assigned to type "Dict[str, str] | None"
      "defaultdict[Unknown, dict[Unknown, Unknown]]" is incompatible with "Dict[str, str]"
        Type parameter "_VT@dict" is invariant, but "dict[Unknown, Unknown]" is not the same as "str"
        Consider switching from "dict" to "Mapping" which is covariant in the value type
      "defaultdict[Unknown, dict[Unknown, Unknown]]" is incompatible with "None" (reportGeneralTypeIssues)

self.session._analyzer.analyze(col._expression, defaultdict(dict))

I don't understand how expr_to_alias can be impacted.

  /home/runner/work/snowpark-python/snowpark-python/src/snowflake/snowpark/_internal/analyzer/snowflake_plan.py:890:47 - error: Expression of type "list[str | List[str]]" cannot be assigned to declared type "List[str]" (reportGeneralTypeIssues)

                metadata_project: List[str] = [
                    self.session._analyzer.analyze(col._expression, defaultdict(dict))
                    for col in metadata_columns
                ]

If self.session._analyzer.analyze returns str, then why would it thought the rhs is a list[str | List[str]]?

@sfc-gh-aalam
Copy link
Contributor

Details

try running tox -e pyright

@sfc-gh-sfan
Copy link
Collaborator Author

sfan@C02FQ10YMD6R snowpark-python % tox -e pyright
.pkg: _optional_hooks> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_editable> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_sdist> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: build_wheel> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: build_sdist> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
pyright: install_package> bash ./scripts/tox_install_cmd.sh --force-reinstall --no-deps /Users/sfan/snowpark-python/.tox/.tmp/package/2/snowflake-snowpark-python-1.10.0.tar.gz
pyright: commands[0]> pyright src/snowflake/snowpark/_internal/analyzer
0 errors, 0 warnings, 0 informations 
.pkg: _exit> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  pyright: OK (5.71=setup[2.71]+cmd[3.00] seconds)
  congratulations :) (5.85 seconds)

🤔

Copy link
Collaborator

@sfc-gh-mkeller sfc-gh-mkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 🚢

@sfc-gh-aalam sfc-gh-aalam enabled auto-merge (squash) December 1, 2023 01:28
@sfc-gh-aalam sfc-gh-aalam merged commit 765c086 into main Dec 1, 2023
55 checks passed
@sfc-gh-aalam sfc-gh-aalam deleted the shixuan_metadata branch December 1, 2023 01:49
@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants