-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion): Adding config option to auto lowercase dataset urns #8928
feat(ingestion): Adding config option to auto lowercase dataset urns #8928
Conversation
@@ -57,6 +57,11 @@ class FlagsConfig(ConfigModel): | |||
), | |||
) | |||
|
|||
auto_lowercase_urns: bool = Field( | |||
default=False, | |||
description="Wether to lowercase dataset entity urns.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Wether
-> whether
metadata-ingestion/src/datahub/ingestion/run/pipeline_config.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/run/pipeline_config.py
Outdated
Show resolved
Hide resolved
urn = Urn.create_from_string(wu.get_urn()) | ||
if urn.get_type() == DatasetUrn.ENTITY_TYPE: | ||
dataset_urn = DatasetUrn.create_from_string(str(urn)) | ||
lowercased_urn = DatasetUrn.create_from_ids( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only fixes the urn, but won't fix lineage edges or things
Let's use the lowercase_dataset_urns
helper method instead
def lowercase_dataset_urns(model: DictWrapper) -> None: |
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of lowercase_dataset_urns looks good, but we do want to move the config to be per-source
yield wu | ||
except Exception: | ||
logger.warning( | ||
f"Failed to lowercase urns for {wu} the exception was: {traceback.format_exc()}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"Failed to lowercase urns for {wu} the exception was: {traceback.format_exc()}" | |
f"Failed to lowercase urns for {wu}: {e}", exc_info=True |
@@ -57,6 +57,11 @@ class FlagsConfig(ConfigModel): | |||
), | |||
) | |||
|
|||
auto_lowercase_urns: bool = Field( | |||
default=False, | |||
description="Whether to lowercase dataset entity urns.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
@@ -32,7 +36,7 @@ def list_urns_with_path( | |||
|
|||
if isinstance(model, MetadataChangeProposalWrapper): | |||
if model.entityUrn: | |||
urns.append((model.entityUrn, ["urn"])) | |||
urns.append((model.entityUrn, ["entityUrn"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch
class LowerCaseDatasetUrnConfigMixin(ConfigModel): | ||
convert_urns_to_lowercase: bool = Field( | ||
default=False, | ||
description="Whether to convert dataset urns to lowercase.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: This property isn't specific to just dataset urns, is it?
We should ideally have a consistent property across all sources that lower cases any urns it generates or references to ensure we have clean lineage across all sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently, it is only lowercase dataset urns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a design decision? What is the reasoning behind it?
@treff7es looks like CI is failing here |
shoot, sorry, I fixed it and now it is green |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is green, but there's a merge conflict now :(
and self.ctx.pipeline_config.source.config | ||
and hasattr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that I'm looking at this again, it's actually kinda tricky - pipeline_config.source.config
could be either the pydantic config object or the raw config dict - we need to handle both cases here, so we might need to restore that .get(...)
logic you had earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now, I handle both case
Checklist