materializes table schemas for empty tables #1122

Merged: @rudolfix merged 5 commits into devel from rfix/materializes-tables-without-data on Mar 21, 2024

@rudolfix (Collaborator) commented Mar 21, 2024

Description

fixes #1116

This PR allows creating empty tables / files at the destination. The standard dlt behavior is to delay table creation until the first data arrives. This is beneficial when the schema is determined (fully or partially) from the data. Specifically, yielding an empty list will not create empty tables / files. We do not want to change this behavior.

In some cases, e.g. when the full schema is known upfront or generated at runtime, empty tables and files should be created without waiting for data.

This PR introduces a new marker in dlt.mark that, when yielded, materializes empty tables. Example (dynamic schema):

    import dlt

    @dlt.resource
    def users():
        yield dlt.mark.with_hints(
            # this is a special empty item which will materialize table schema
            dlt.mark.materialize_table_schema(),
            # emit table schema with the item
            dlt.mark.make_hints(
                columns=[
                    {"name": "id", "data_type": "bigint", "precision": 4, "nullable": False},
                    {"name": "name", "data_type": "text", "nullable": False},
                ]
            ),
        )

Example (defined schema):

    @dlt.resource(columns=UsersPydanticModel)
    def users():
        yield dlt.mark.materialize_table_schema()

netlify bot commented Mar 21, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 88f0029
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/65fb802bf16b7a0008eba98b
😎 Deploy Preview https://deploy-preview-1122--dlt-hub-docs.netlify.app

@rudolfix rudolfix self-assigned this Mar 21, 2024
@rudolfix rudolfix requested a review from sh-rp March 21, 2024 14:16
    schema_updates.append(partial_update)
    logger.debug(f"Processed {line_no} lines from file {extracted_items_file}")
    if line is None and root_table_name in self.schema.tables:
        # write only if table seen data before
        # TODO: we should push the truncate jobs via package state

Collaborator

yes, this is probably a good idea

@sh-rp (Collaborator) left a comment

Looks good, I'm assuming there are existing tests in the code that test that no table will be created in the destination for empty resources if this marker is not emitted, right?

@sh-rp (Collaborator) commented Mar 21, 2024

PS: What about docs?

@rudolfix (Collaborator, Author) commented

@sh-rp where in the docs should we add this trick? I want to rewrite the Pipeline doc, which is outdated, and there I can mention all the possible markers for items.

    @@ -74,7 +74,7 @@ def _filter_columns(
             return row

         def _normalize_chunk(
    -        self, root_table_name: str, items: List[TDataItem], may_have_pua: bool
    +        self, root_table_name: str, items: List[TDataItem], may_have_pua: bool, skip_write: bool

Contributor

Should skip_write: bool default to True?
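
The thread does not record an answer to this question. As a sketch of what is at stake, the toy function below assumes that `skip_write` means "register the schema but write no rows". That meaning, the function body, and the `written` / `schema` dictionaries are all assumptions for illustration, not the PR's code. Under that assumption, a default of `False` would leave existing call sites writing rows as before, while a default of `True` would make any caller that omits the argument stop writing.

```python
from typing import Any, Dict, List


def normalize_chunk(
    table_name: str,
    items: List[Dict[str, Any]],
    written: Dict[str, List[Dict[str, Any]]],
    schema: Dict[str, List[str]],
    skip_write: bool = False,  # assumed default; the PR adds the parameter without one
) -> None:
    """Hypothetical stand-in: always update the schema, write rows unless skip_write."""
    columns = schema.setdefault(table_name, [])
    for item in items:
        for key in item:
            if key not in columns:
                columns.append(key)
        if not skip_write:
            written.setdefault(table_name, []).append(item)


written: Dict[str, List[Dict[str, Any]]] = {}
schema: Dict[str, List[str]] = {}

# Caller that omits the argument keeps writing rows.
normalize_chunk("users", [{"id": 1}], written, schema)
# Caller that opts in only registers the schema.
normalize_chunk("audit", [{"id": 2}], written, schema, skip_write=True)
```

After these two calls, both tables are present in `schema`, but only "users" has rows in `written`.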

@rudolfix (Collaborator, Author) commented

> Looks good, I'm assuming there are existing tests in the code that test that no table will be created in the destination for empty resources if this marker is not emitted, right?

You wrote these tests when we were adding empty files to replace resources.

@rudolfix rudolfix merged commit 92bf3a0 into devel Mar 21, 2024
43 of 53 checks passed
@rudolfix rudolfix deleted the rfix/materializes-tables-without-data branch March 21, 2024 18:51
Successfully merging this pull request may close these issues.

Generate data with just table structure in the destination if there was nothing in the extract phase.
3 participants