materializes table schemas for empty tables #1122
schema_updates.append(partial_update)
logger.debug(f"Processed {line_no} lines from file {extracted_items_file}")
if line is None and root_table_name in self.schema.tables:
    # write only if table seen data before
    # TODO: we should push the truncate jobs via package state
yes, this is probably a good idea
Looks good, I'm assuming there are existing tests in the code that test that no table will be created in the destination for empty resources if this marker is not emitted, right?
PS: What about docs?
@sh-rp where in the docs should we add this trick? I want to rewrite the Pipeline doc, which is outdated, and there I can mention all the possible markers for items.
@@ -74,7 +74,7 @@ def _filter_columns(
         return row

     def _normalize_chunk(
-        self, root_table_name: str, items: List[TDataItem], may_have_pua: bool
+        self, root_table_name: str, items: List[TDataItem], may_have_pua: bool, skip_write: bool
Should `skip_write: bool` default to `True`?
you wrote these tests when we were adding empty files to replace resources
Description
fixes #1116
This PR allows creating empty tables / files at the destination. The standard `dlt` behavior is to delay creation of tables until the first data arrives. This is beneficial when the schema is determined from the data (fully or partially). Specifically, yielding an empty list will not create empty tables / files. We do not want to change this behavior.
In some cases, i.e. when the full schema is known upfront or generated at runtime, empty tables and files should be created without waiting for data.
This PR introduces a new marker in `dlt.mark` that, when yielded, will materialize empty tables.
Example (dynamic schema)
Example (defined schema)
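The behavior described above can be illustrated with a small stand-alone model. This is only a sketch and does not use `dlt` itself: `MaterializeTableSchema` and `load` are hypothetical names invented for illustration, standing in for the new marker and for the extract / normalize / load steps. It shows that yielding an empty list leaves the destination untouched, while yielding the marker creates an empty table.

```python
from typing import Any, Dict, Iterable, Iterator, List


class MaterializeTableSchema:
    """Hypothetical sentinel: carries no rows, only requests table creation."""


def load(resources: Dict[str, Iterable[Any]]) -> Dict[str, List[dict]]:
    """Toy loader (an assumption, not dlt code): maps table name -> loaded rows."""
    destination: Dict[str, List[dict]] = {}
    for table_name, items in resources.items():
        for item in items:
            if isinstance(item, MaterializeTableSchema):
                # Marker seen: create the table even though no data arrived.
                destination.setdefault(table_name, [])
                continue
            rows = item if isinstance(item, list) else [item]
            if rows:
                # Default behavior: a table is created only when data arrives.
                destination.setdefault(table_name, []).extend(rows)
    return destination


def empty_resource() -> Iterator[Any]:
    yield []  # empty list: no table is created


def marked_resource() -> Iterator[Any]:
    yield MaterializeTableSchema()  # marker: empty table is created


def data_resource() -> Iterator[Any]:
    yield [{"id": 1}]


tables = load(
    {
        "empty": empty_resource(),
        "marked": marked_resource(),
        "data": data_resource(),
    }
)
print(sorted(tables))
```

In this model the default path is unchanged, which matches the intent stated in the description: only resources that explicitly yield the marker get an empty table.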