-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support incremental loading for scd2 #1789
Comments
@gnilrets thanks for feedback! Could you tell us how those merges should work? Currently all primary keys that are not present in the input datasets are retired. How would you limit that? We have |
Yeah, in this case, just don't do hard deletes. |
@jorritsandbrink do you have a good idea on how to implement it in terms of user interface? add |
Not fitting because updates are involved.
Not very intuitive. My suggestion: @dlt.resource(
write_disposition={"disposition": "merge", "strategy": "scd2", "retire_if_absent": False}
)
def my_incremental_dim_data():
... where This is consistent with other |
Yep, @jorritsandbrink 's suggestion is what I was thinking too. |
Feature description
Currently, the scd2 write disposition requires running a full extract of the source data in order to work; in other words, it does not support incremental loads. The main justification for this seems to be that doing a full extract is the only way to detect hard deletes. While true, there are plenty of situations where we would like to perform incremental scd2 loading where hard deletes are not a concern. At the moment, the workaround we have is to do a regular merge incremental load, followed by a dbt snapshots. But that effectively doubles the amount of storage needed in our warehouse and introduces an extra computational step.
Are you a dlt user?
Yes, I run dlt in production.
Use case
We have data in a SaaS product that contains large, mutable data and the vendor enforces rate limiting. Doing daily or more frequent regular full extracts is not feasible or cost effective. Having an incremental scd2 solution (even if it didn't manage hard deletes) would be very valuable to us.
Proposed solution
The scd2 write disposition supports incremental merge, with the caveat that hard deletes are not supported when doing so.
Related issues
No response
The text was updated successfully, but these errors were encountered: