abort load package and raise exception on terminal errors in jobs #1749

Closed
rudolfix opened this issue Aug 27, 2024 · 1 comment · Fixed by #1781
Assignees
Labels
breaking This issue introduces breaking change sprint Marks group of tasks with core team focus at this moment

Comments

@rudolfix
Collaborator

rudolfix commented Aug 27, 2024

Background
By default, dlt moves terminally failed jobs into the failed_jobs folder and does not raise any exception. Users can set the load.raise_on_failed_jobs=true config option to abort the package and raise an exception afterwards.

This PR makes that behavior the default.

background: https://dlthub.com/docs/running-in-production/running#handle-exceptions-failed-jobs-and-retry-the-pipeline
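For readers who want the stricter behavior before the default changes, here is a minimal sketch. It assumes dlt's usual environment-variable naming (config sections joined with a double underscore) and uses `load_info.raise_on_failed_jobs()` as described in the linked docs. The helper name `run_and_check` is hypothetical and not part of dlt.

```python
import os

# Assumption: dlt maps the load.raise_on_failed_jobs option to this variable name.
os.environ["LOAD__RAISE_ON_FAILED_JOBS"] = "true"


def run_and_check(pipeline, data, **run_kwargs):
    """Hypothetical helper: run the pipeline and fail loudly on failed jobs.

    `pipeline` is expected to behave like a dlt pipeline: `run()` returns a
    load info object that exposes `raise_on_failed_jobs()`.
    """
    load_info = pipeline.run(data, **run_kwargs)
    # Explicit check, useful when the config flag is not set.
    load_info.raise_on_failed_jobs()
    return load_info
```

The explicit call is redundant once the flag is on, but it keeps the script safe if the environment variable is dropped later.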

Requirements
PR 1:

  1. Switch raise_on_failed_jobs to true
  2. Update run in production docs
  3. Update failing tests; there may be a lot of them. Add unit tests to test_dummy_client (I hope this config flag is already tested somewhere in it)

PR 2:

  4. Add a new CLI/pipeline method to retry an aborted package (it moves failed jobs back to new). Extend this command:
     https://dlthub.com/docs/reference/command-line-interface#get-the-load-package-information
     and add a custom parser with "retry-aborted", which will:

  • remove the "aborted" flag and move the package from "completed" to "normalized"
  • move all failed jobs back to new jobs and delete the failed messages.
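The two "retry-aborted" steps could be sketched roughly as below. This is only an illustration on a plain directory tree, not dlt's actual storage code: every folder name, the `aborted` marker file, the `.exception` suffix for failed messages, and the function `retry_aborted` itself are assumptions.

```python
import shutil
from pathlib import Path

# All names below are assumptions for illustration, not dlt's real layout.
COMPLETED = "completed"
NORMALIZED = "normalized"
NEW_JOBS = "new_jobs"
FAILED_JOBS = "failed_jobs"
ABORTED_FLAG = "aborted"               # hypothetical marker file
FAILED_MESSAGE_SUFFIX = ".exception"   # hypothetical failed-message suffix


def retry_aborted(storage_root: Path, load_id: str) -> Path:
    """Make an aborted package loadable again and return its new path."""
    package = storage_root / COMPLETED / load_id
    flag = package / ABORTED_FLAG
    if not flag.exists():
        raise ValueError(f"package {load_id} is not aborted")

    # Move failed jobs back to new jobs and delete the failed messages.
    new_jobs = package / NEW_JOBS
    new_jobs.mkdir(exist_ok=True)
    failed_jobs = package / FAILED_JOBS
    if failed_jobs.is_dir():
        for job in sorted(failed_jobs.iterdir()):
            if job.name.endswith(FAILED_MESSAGE_SUFFIX):
                job.unlink()
            else:
                shutil.move(str(job), str(new_jobs / job.name))

    # Remove the "aborted" flag, then move the package back to "normalized".
    flag.unlink()
    target = storage_root / NORMALIZED / load_id
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(package), str(target))
    return target
```

Jobs are moved before the package itself so that an error part-way through leaves the package in "completed", where the command can be run again.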
@willi-mueller
Collaborator

willi-mueller commented Sep 9, 2024

@rudolfix
The only downside of this change might be that, with write_disposition = "replace", the default strategy (truncate-and-insert) would truncate the table and then add no data because the load package failed.

Could we change the default strategy to 'insert-from-staging' to minimize the chance of users deleting their data on error?

Background: https://dlthub.com/docs/general-usage/full-loading#the-truncate-and-insert-strategy
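Until any default changes, a user could opt into the staging-based strategy themselves. The sketch below assumes the strategy is read from the `replace_strategy` option in the `destination` config section, and that dlt's usual environment-variable naming applies; please check the linked docs for the exact key.

```python
import os

# Assumption: destination.replace_strategy maps to this environment variable.
# "insert-from-staging" is the strategy name proposed in the comment above.
os.environ["DESTINATION__REPLACE_STRATEGY"] = "insert-from-staging"
```

This has to be set before the pipeline is created and run, so that the value is picked up when the configuration is resolved.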

@github-project-automation github-project-automation bot moved this from In Progress to Done in dlt core library Sep 10, 2024