feat(ingestion/hive): Add lineage functionality for hive tables from/to file storage #11841
@deepgarg-visa This adds the underlying file-system storage (e.g. S3, ABS, HDFS) as lineage of the Hive table, with the option of emitting it as either upstream or downstream lineage. This mirrors the `glue_s3_lineage_direction` and `emit_s3_lineage` options in the AWS Glue connector. As with the Glue connector, it is disabled by default.
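A minimal sketch of the behaviour described above. It assumes a Hive table's storage location is a URI whose scheme identifies the platform. The scheme-to-platform mapping, the `(platform, name)` node tuples, and the `storage_platform`/`lineage_edge` helpers are hypothetical illustrations, not the connector's code or DataHub's URN format.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to storage platform; the connector's
# real mapping may differ or cover more schemes.
SCHEME_TO_PLATFORM = {
    "s3": "s3", "s3a": "s3", "s3n": "s3",
    "abfs": "abs", "abfss": "abs", "wasb": "abs", "wasbs": "abs",
    "hdfs": "hdfs",
}

# A lineage node as a (platform, name) pair; real DataHub URNs look different.
Node = tuple[str, str]


def storage_platform(location: str) -> str:
    """Infer the storage platform from a Hive table LOCATION URI."""
    scheme = urlparse(location).scheme.lower()
    if scheme not in SCHEME_TO_PLATFORM:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return SCHEME_TO_PLATFORM[scheme]


def lineage_edge(hive_table: str, location: str, direction: str) -> tuple[Node, Node]:
    """Return an (upstream, downstream) pair linking a Hive table to its storage."""
    storage_node: Node = (storage_platform(location), location)
    hive_node: Node = ("hive", hive_table)
    if direction == "upstream":  # storage -> Hive
        return storage_node, hive_node
    if direction == "downstream":  # Hive -> storage
        return hive_node, storage_node
    raise ValueError("direction must be 'upstream' or 'downstream'")
```

For example, `lineage_edge("db.sales", "s3://bucket/warehouse/sales", "upstream")` returns `(("s3", "s3://bucket/warehouse/sales"), ("hive", "db.sales"))`; passing `"downstream"` swaps the two nodes.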
Reopening this PR, as it was closed in error.
```python
        default=False,
        description="Whether to emit storage-to-Hive lineage",
    )
    hive_storage_lineage_direction: str = Field(
```
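A sketch of how the two options might be declared and validated. It uses a standard-library dataclass instead of the pydantic `Field` shown in the diff so that it runs without dependencies. Only `hive_storage_lineage_direction` comes from the PR; the flag name `emit_storage_lineage` and the `"upstream"` default are assumptions.

```python
from dataclasses import dataclass

VALID_DIRECTIONS = ("upstream", "downstream")


@dataclass
class HiveStorageLineageConfig:
    # Assumed name for the on/off flag; the diff excerpt does not show it.
    # Disabled by default, as described in the PR.
    emit_storage_lineage: bool = False
    # Field name taken from the PR; the "upstream" default is an assumption.
    hive_storage_lineage_direction: str = "upstream"

    def __post_init__(self) -> None:
        # Normalise case and reject anything other than the two supported values.
        direction = self.hive_storage_lineage_direction.lower()
        if direction not in VALID_DIRECTIONS:
            raise ValueError(
                f"hive_storage_lineage_direction must be one of {VALID_DIRECTIONS}, "
                f"got {self.hive_storage_lineage_direction!r}"
            )
        self.hive_storage_lineage_direction = direction
```

Validating at construction time means a typo such as `"downsteam"` fails when the recipe is loaded rather than silently producing lineage in the wrong direction.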
Isn't storage always upstream of the Hive dataset?
Or a sibling?
This is for parity with the `glue_s3_lineage_direction` parameter in the Glue config; it was added to keep the two connectors consistent. If it should be removed here, we should probably revisit the Glue connector as well.
Checklist