[FEATURE] SnowflakeDatasource update #10005

Merged: 28 commits merged on Jun 10, 2024


@Kilo59 (Contributor) commented Jun 5, 2024

SnowflakeDatasource changes

  1. Make the database and schema fields required.
  2. Use the datasource-level schema when creating table assets. A SnowflakeDatasource can no longer span multiple schemas or databases.
  3. Only allow config substitution in the user:password section of the Snowflake connection string.
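Change 3 can be illustrated with a minimal sketch that resolves `${NAME}` placeholders only inside the `user:password` part of a URL and rejects them anywhere else. The helper name `substitute_credentials`, the `${NAME}` syntax, and the plain-dict config source are assumptions for illustration; this is not the Great Expectations implementation, and it does not URL-encode substituted values.

```python
import re
from typing import Mapping

# Assumed placeholder syntax: ${NAME}
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute_credentials(url: str, config: Mapping[str, str]) -> str:
    """Hypothetical helper: resolve ${NAME} placeholders, but only inside the
    user:password section of a URL such as scheme://user:pw@host/path?query."""
    scheme, sep, remainder = url.partition("://")
    if not sep:
        raise ValueError("Expected a URL of the form scheme://...")

    # The netloc (user:password@host) ends at the first '/', '?' or '#'.
    netloc_end = min(
        (i for i in (remainder.find(c) for c in "/?#") if i != -1),
        default=len(remainder),
    )
    netloc, tail = remainder[:netloc_end], remainder[netloc_end:]
    userinfo, at, host = netloc.rpartition("@")

    # Reject placeholders anywhere outside the user:password section.
    for section in (scheme, host, tail):
        if _PLACEHOLDER.search(section):
            raise ValueError(
                "Config substitution is only allowed in the user:password section"
            )

    # A missing config key surfaces as a KeyError naming the variable.
    resolved = _PLACEHOLDER.sub(lambda m: config[m.group(1)], userinfo)
    return f"{scheme}://{resolved}{at}{host}{tail}"
```

For example, `substitute_credentials("snowflake://${USER}:${PW}@my_account/my_db/my_schema", {...})` fills in the credentials, while a `${DB}` placeholder in the path raises `ValueError`.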

Why

a. Being more opinionated about which fields are required lets us raise clearer error messages to the user and avoid opaque connection errors.
b. This unlocks future work that requires a SnowflakeDatasource to always map to a single schema.
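Both points can be sketched with a small stand-in object that fails fast when database or schema is missing and resolves every table against one datasource-level database and schema. The class `SnowflakeConnectionSketch` and its fields are hypothetical; the real datasource is a pydantic model, not a dataclass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnowflakeConnectionSketch:
    """Hypothetical stand-in for a datasource config with required fields."""

    account: str
    user: str
    database: Optional[str] = None
    schema: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail fast with a message that names the missing fields, instead of
        # letting the problem surface later as an opaque connection error.
        missing = [name for name in ("database", "schema") if not getattr(self, name)]
        if missing:
            raise ValueError(
                f"Missing required Snowflake field(s): {', '.join(missing)}"
            )

    def qualified_table_name(self, table_name: str) -> str:
        # Every table resolves against the single datasource-level database + schema.
        return f"{self.database}.{self.schema}.{table_name}"
```

Creating the object without `schema` raises `ValueError` naming `schema`, rather than deferring the failure to the first query.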

Related 0.18.x changes

All of the changes in this pull request were originally applied to the 0.18.x branch.

Upcoming changes

  1. Make role and warehouse required.
  2. Add a dedicated SnowflakeTableAsset with read-only schema and database attributes.
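The second upcoming change could look roughly like the sketch below, where a property without a setter is one common Python way to make an attribute read-only. The class `SnowflakeTableAssetSketch` is hypothetical and is not the planned SnowflakeTableAsset implementation.

```python
class SnowflakeTableAssetSketch:
    """Hypothetical table asset whose database and schema are fixed at
    creation time (read-only), while other attributes stay mutable."""

    def __init__(self, name: str, table_name: str, *, database: str, schema: str) -> None:
        self.name = name
        self.table_name = table_name
        self._database = database
        self._schema = schema

    @property
    def database(self) -> str:
        # No setter is defined, so assigning to .database raises AttributeError.
        return self._database

    @property
    def schema(self) -> str:
        # No setter is defined, so assigning to .schema raises AttributeError.
        return self._schema
```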

netlify bot commented Jun 5, 2024:

Deploy Preview for niobium-lead-7998 canceled.

🔨 Latest commit: ad6cfee
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/66638ec05e9dd70008084171

@Kilo59 force-pushed the f/lak-975/sf-require-fields branch from cf5dee6 to 7ff0989 on June 5, 2024 21:01

codecov bot commented Jun 5, 2024:

Codecov Report

Attention: Patch coverage is 96.55172% with 6 lines in your changes missing coverage. Please review.

Project coverage is 79.41%. Comparing base (1554c4a) to head (ad6cfee).

Files with missing lines (patch %, lines):
...ctations/datasource/fluent/snowflake_datasource.py: 95.04%, 6 missing ⚠️
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           develop   #10005      +/-   ##
===========================================
+ Coverage    79.35%   79.41%   +0.06%     
===========================================
  Files          456      456              
  Lines        38989    39151     +162     
===========================================
+ Hits         30938    31093     +155     
- Misses        8051     8058       +7     
```
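The project-coverage percentages in the diff can be cross-checked from its Hits and Lines rows. This sketch assumes Codecov's default reporting settings (two decimal places, rounded down); `codecov_percent` is a hypothetical helper, not part of any Codecov API.

```python
def codecov_percent(hits: int, lines: int) -> str:
    """Coverage percentage truncated to two decimals (assumed Codecov
    default: precision 2, round down), using integer arithmetic."""
    hundredths_of_percent = hits * 10_000 // lines
    whole, frac = divmod(hundredths_of_percent, 100)
    return f"{whole}.{frac:02d}"


base = codecov_percent(30_938, 38_989)  # develop (1554c4a)
head = codecov_percent(31_093, 39_151)  # this PR (ad6cfee)
print(base, head)  # 79.35 79.41
```

Both values match the base and head percentages above, and the difference of 6 hundredths of a percent is the reported +0.06%. Hits plus Misses equals Lines in both columns, so no partially covered lines affect the result.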
Flag | Coverage Δ (legend: absolute <relative> (impact); ø = not affected; ? = missing data)
3.10 65.79% <81.03%> (+0.05%) ⬆️
3.10 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.10 aws_deps ?
3.10 big ?
3.10 databricks ?
3.10 filesystem ?
3.10 mssql ?
3.10 mysql ?
3.10 postgresql ?
3.10 snowflake ?
3.10 spark ?
3.10 trino ?
3.11 65.77% <81.03%> (+0.03%) ⬆️
3.11 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds 54.41% <59.77%> (+<0.01%) ⬆️
3.11 aws_deps 45.62% <46.55%> (-0.01%) ⬇️
3.11 big 54.45% <46.55%> (-0.05%) ⬇️
3.11 databricks 46.76% <46.55%> (-0.01%) ⬇️
3.11 filesystem 60.28% <59.77%> (-0.02%) ⬇️
3.11 mssql 49.78% <46.55%> (-0.03%) ⬇️
3.11 mysql 49.84% <46.55%> (-0.03%) ⬇️
3.11 postgresql 54.03% <46.55%> (-0.04%) ⬇️
3.11 snowflake 47.59% <87.35%> (+0.14%) ⬆️
3.11 spark 57.28% <46.55%> (-0.06%) ⬇️
3.11 trino 51.91% <46.55%> (-0.04%) ⬇️
3.8 65.80% <81.03%> (+0.03%) ⬆️
3.8 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds 54.42% <59.77%> (+<0.01%) ⬆️
3.8 aws_deps 45.64% <46.55%> (-0.01%) ⬇️
3.8 big 54.46% <46.55%> (-0.05%) ⬇️
3.8 databricks 46.78% <46.55%> (-0.01%) ⬇️
3.8 filesystem 60.29% <59.77%> (-0.02%) ⬇️
3.8 mssql 49.76% <46.55%> (-0.03%) ⬇️
3.8 mysql 49.83% <46.55%> (-0.03%) ⬇️
3.8 postgresql 54.02% <46.55%> (-0.04%) ⬇️
3.8 snowflake 47.61% <87.35%> (+0.14%) ⬆️
3.8 spark 57.25% <46.55%> (-0.06%) ⬇️
3.8 trino 51.91% <46.55%> (-0.04%) ⬇️
3.9 65.79% <81.03%> (+0.04%) ⬆️
3.9 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.9 aws_deps ?
3.9 big ?
3.9 databricks ?
3.9 filesystem ?
3.9 mssql ?
3.9 mysql ?
3.9 postgresql ?
3.9 snowflake ?
3.9 spark ?
3.9 trino ?
cloud 0.00% <0.00%> (ø)
docs-basic 48.64% <46.55%> (-0.01%) ⬇️
docs-creds-needed 49.78% <64.94%> (+0.06%) ⬆️
docs-spark 48.50% <46.55%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@Kilo59 self-assigned this Jun 5, 2024
@Kilo59 marked this pull request as ready for review June 5, 2024 23:35
@Kilo59 changed the title from "[FEATURE] SnowflakeDatasource required fields update" to "[FEATURE] SnowflakeDatasource update" Jun 5, 2024
@Kilo59 enabled auto-merge June 7, 2024 16:50
@Kilo59 requested a review from tyler-hoffman June 7, 2024 16:50
Comment on lines +144 to +145:

```python
if template_str:  # may have already been set in __new__
    self.template_str: str = template_str
```
Member:

A bit confused by this. Under what circumstances would self.template_str not be set by the time we get here?

@Kilo59 (Contributor Author) commented Jun 10, 2024:

Yeah, this confused me too.

It happens when the ConfigUri is created with parse_object_as() or as part of standard pydantic model validation.

In that case ConfigUri.__new__ is called, sometimes without the full URL (template_str), and builds the full URL itself. We still need that full URL to be stored as the template_str attribute.
https://github.com/pydantic/pydantic/blob/8333bd59dd3424811f4e73cdd6e9414fd15bb6d7/pydantic/v1/networks.py#L183-L185

So ConfigUri.__init__ can be called in two ways, and we need to account for both:

  1. With URL part keywords, without template_str
  2. With template_str, without URL part keywords

We do this by overriding AnyUrl.__new__ and ConfigStr.__init__. Without this added logic the tests fail.
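The two construction paths can be illustrated with a simplified, dependency-free sketch. It assumes a plain `str` subclass named `TemplateUrl` (hypothetical) in place of `ConfigUri`, and invented `scheme`/`host`/`path` keywords in place of pydantic's URL parts; it is not the actual ConfigUri code.

```python
from typing import Optional


class TemplateUrl(str):
    """Hypothetical stand-in for ConfigUri: a str subclass that can be built
    either from a full template string or from URL part keywords."""

    template_str: str

    def __new__(
        cls,
        template_str: Optional[str] = None,
        *,
        scheme: str = "",
        host: str = "",
        path: str = "",
    ) -> "TemplateUrl":
        if template_str is None:
            # Way 1: only URL parts were given, so build the full URL here
            # (loosely mirroring how pydantic v1's AnyUrl.__new__ builds a URL).
            template_str = f"{scheme}://{host}{path}"
        obj = super().__new__(cls, template_str)
        obj.template_str = template_str  # set early; __init__ may not see it
        return obj

    def __init__(
        self,
        template_str: Optional[str] = None,
        *,
        scheme: str = "",
        host: str = "",
        path: str = "",
    ) -> None:
        super().__init__()
        # Way 2 passes template_str here; way 1 passes None, and an
        # unconditional assignment would overwrite the URL built in __new__.
        if template_str:  # may have already been set in __new__
            self.template_str = template_str
```

Without the `if template_str:` guard, `TemplateUrl(scheme="snowflake", host="my_account", path="/my_db/my_schema")` would reach `__init__` with `template_str=None` and overwrite the URL that `__new__` had just built.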

@roblim (Member) left a comment:

LGTM. Just a few comments.

great_expectations/datasource/fluent/interfaces.py (outdated, resolved)
Comment on lines +54 to +56:

```json
"minLength": 1,
"maxLength": 65536,
"format": "uri"
```
Member:

This appears to be a duplicate of the following item?

@Kilo59 (Contributor Author):

One represents the ConfigUri type and the other a standard SnowflakeDsn connection string.
I'll update this in a follow-up, either to distinguish these two items in the generated schema or to remove the duplicate.
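If the follow-up takes the "remove the duplicate" route, one option is a small post-processing step over a property's generated JSON schema. This is only a sketch: the function name `dedupe_any_of` and the assumption that the duplicates sit in that property's `anyOf` list are illustrative, and it does not recurse into nested schemas.

```python
from __future__ import annotations

import json
from typing import Any


def dedupe_any_of(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical post-processing step: drop structurally identical entries
    from a schema's "anyOf" list, keeping the first occurrence."""
    options = schema.get("anyOf")
    if not options:
        return schema
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for option in options:
        key = json.dumps(option, sort_keys=True)  # canonical form for comparison
        if key not in seen:
            seen.add(key)
            unique.append(option)
    # Return a new dict so the input schema is left unchanged.
    return {**schema, "anyOf": unique}
```

The alternative named in the comment, keeping both entries but distinguishing them, would instead give each entry its own `title` or `description`.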

@Kilo59 (Contributor Author) commented Jun 14, 2024.



```python
@functools.lru_cache(maxsize=4)
def _extract_path_sections(url_path: str) -> dict[str, str]:
```
Member:

Can we make this name more descriptive, since it does something more specific? For example, _extract_database_and_schema_path_sections.

@Kilo59 (Contributor Author):

I'll do this as an immediate follow-up.

@Kilo59 added this pull request to the merge queue Jun 7, 2024
@Kilo59 removed this pull request from the merge queue due to a manual request Jun 7, 2024
@Kilo59 added this pull request to the merge queue Jun 10, 2024
Merged via the queue into develop with commit 297c4cd on Jun 10, 2024
68 checks passed
@Kilo59 deleted the f/lak-975/sf-require-fields branch June 10, 2024 13:13