fix(ingest/snowflake): always ingest view and external table ddl lineage #12191

Merged
merged 8 commits into from
Dec 24, 2024

Conversation

mayurinehate
Collaborator

@mayurinehate mayurinehate commented Dec 20, 2024

Previously, disabling table lineage via include_table_lineage also required disabling view lineage via include_view_lineage. This made it impossible to split schema ingestion and lineage/usage ingestion into separate recipes without losing view lineage entirely.
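
As a rough illustration of the recipe split described above, the two recipes might look like the following, written here as Python dicts. Only include_table_lineage comes from this PR; the other flag names (include_technical_schema, include_usage_stats) and the exact split are assumptions for illustration, and connection settings are omitted.

```python
# Illustrative sketch only: two Snowflake recipes expressed as Python dicts.
# Flag names other than include_table_lineage are assumptions, not taken
# from this PR. Connection settings (account, credentials) are omitted.

schema_recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            "include_technical_schema": True,  # assumed flag name
            "include_table_lineage": False,
            "include_usage_stats": False,  # assumed flag name
            # Per this PR, view and external table DDL lineage is still
            # ingested even though table lineage is disabled here.
        },
    }
}

lineage_usage_recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            "include_technical_schema": False,  # assumed flag name
            "include_table_lineage": True,
            "include_usage_stats": True,  # assumed flag name
        },
    }
}
```

With the old behavior, the first recipe would have dropped view lineage along with table lineage.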

Stacked on top of #12179

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and docs (Issues and Improvements to docs) labels Dec 20, 2024
@datahub-cyborg datahub-cyborg bot added the needs-review label (for PRs that need review from a maintainer) Dec 20, 2024
@mayurinehate mayurinehate enabled auto-merge (squash) December 20, 2024 11:42
Contributor

@sgomezvillamor sgomezvillamor left a comment

LGTM
Minor question in include_table_lineage param description

@@ -163,26 +163,13 @@ class SnowflakeConfig(
default=True,
description="If enabled, populates the snowflake table-to-table and s3-to-snowflake table lineage. Requires appropriate grants given to the role and Snowflake Enterprise Edition or above.",
Contributor

should we update this include_table_lineage description here to emphasize that this enables view lineage too?

Collaborator Author

View upstream lineage is now always ingested, irrespective of this flag.

@datahub-cyborg datahub-cyborg bot added the pending-submitter-merge label and removed the needs-review label Dec 23, 2024
)

_include_view_lineage = pydantic_removed_field("include_view_lineage")
_include_view_column_lineage = pydantic_removed_field("include_view_column_lineage")
Collaborator

imo we should make this change everywhere
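
For readers unfamiliar with the removed-field pattern in the lines under review: the general idea is that a config option that no longer has any effect is dropped with a warning instead of failing validation. The sketch below is a hypothetical stand-in written in plain Python; drop_removed_fields and REMOVED_FIELDS are illustrative names, and this is not DataHub's pydantic_removed_field implementation.

```python
import warnings
from typing import Any, Dict, Iterable

# Field names taken from the diff above; everything else is hypothetical.
REMOVED_FIELDS = ("include_view_lineage", "include_view_column_lineage")


def drop_removed_fields(
    config: Dict[str, Any], removed: Iterable[str] = REMOVED_FIELDS
) -> Dict[str, Any]:
    """Return a copy of config without removed fields, warning for each one found."""
    cleaned = dict(config)
    for name in removed:
        if name in cleaned:
            warnings.warn(
                f"Config option '{name}' was removed and is ignored.",
                stacklevel=2,
            )
            del cleaned[name]
    return cleaned
```

Under this sketch, an older recipe that still sets include_view_lineage keeps loading, and the stale key is ignored.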

@@ -512,15 +513,14 @@ def get_workunits_internal(self) -> Iterable[MetadataWorkUnit]:
discovered_datasets = discovered_tables + discovered_views

if self.config.use_queries_v2:
self.report.set_ingestion_stage("*", "View Parsing")
assert self.aggregator is not None
self.report.set_ingestion_stage("*", VIEW_PARSING)
yield from auto_workunit(self.aggregator.gen_metadata())
Collaborator

mostly for my own understanding - self.aggregator is only used for view parsing? and then there's another aggregator within SnowflakeQueriesExtractor?

if so, should the self.aggregator.gen_metadata() call happen first and then lineage generation happen afterwards?

Collaborator Author

> mostly for my own understanding - self.aggregator is only used for view parsing? and then there's another aggregator within SnowflakeQueriesExtractor?

Yes, when use_queries_v2 is true.

When use_queries_v2 is false:

  • if include_table_lineage is false, self.aggregator is used only for view lineage
  • if include_table_lineage is true, self.aggregator is used for both view and table lineage
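
The three cases in this reply can be summarized as a small decision function. This is only a sketch of the behavior as described in the comment; aggregator_roles is a hypothetical name and does not exist in the codebase.

```python
from typing import Set


def aggregator_roles(use_queries_v2: bool, include_table_lineage: bool) -> Set[str]:
    """Kinds of lineage handled by self.aggregator, per the reply above."""
    if use_queries_v2:
        # Per the reply, a separate aggregator inside SnowflakeQueriesExtractor
        # exists in this mode, so self.aggregator only handles views.
        return {"view"}
    if include_table_lineage:
        return {"view", "table"}
    return {"view"}
```

In every case the result includes "view", which matches the PR's goal of always ingesting view lineage.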

@mayurinehate mayurinehate merged commit 4d990b0 into datahub-project:master Dec 24, 2024
76 checks passed
Labels
docs (Issues and Improvements to docs), ingestion (PR or Issue related to the ingestion of metadata), pending-submitter-merge
3 participants