soft_fail "IntegrityError duplicate entry" loads? #45

Open
7yl4r opened this issue Apr 29, 2018 · 0 comments


7yl4r commented Apr 29, 2018

imars-etl-powered DAGs can fail when they try to load a product that already exists in the database. Is this really what we want, though? It might make more sense for the operator to be marked as "skipped".

This might be possible using AirflowSkipException, but I am not sure. It also raises the question of whether we actually want to overwrite the file when this happens. That complexity is perhaps best handled by adding options to the imars-etl tool. Some possible usage scenarios:

  1. overwrite the file; I don't care about the old one.
  2. overwrite the file, but maybe keep the old version somewhere too.
  3. don't overwrite the file; I don't know what I am doing.
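As a rough sketch of the "skipped" idea, assuming the load runs inside a Python callable: the wrapper below catches a duplicate-entry error and re-raises it as `AirflowSkipException` (from `airflow.exceptions`), which Airflow records as a skipped task rather than a failed one. `DuplicateEntryError`, `load_or_skip`, and `load_func` are hypothetical names, not part of imars-etl; the exception actually caught would be whatever IntegrityError the database driver raises.

```python
try:
    from airflow.exceptions import AirflowSkipException
except ImportError:
    # Stand-in so this sketch also runs where Airflow is not installed.
    class AirflowSkipException(Exception):
        pass


class DuplicateEntryError(Exception):
    """Hypothetical stand-in for the database driver's IntegrityError."""


def load_or_skip(load_func, **load_kwargs):
    """Run load_func; convert a duplicate-entry error into a skipped task."""
    try:
        return load_func(**load_kwargs)
    except DuplicateEntryError as err:
        # Raising this inside a task makes Airflow mark it "skipped".
        raise AirflowSkipException(f"product already loaded: {err}") from err
```

This only covers scenario (3); whether to overwrite would still need to be decided by the load itself.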

More advanced checking could compare hashes of the two files and raise an error only if they differ. That would probably resolve most of the cases where I am seeing this right now, actually.
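A minimal sketch of that comparison, assuming both the existing file and the incoming file are readable on disk. SHA-256 is just one possible choice of hash, and the function names are hypothetical, not imars-etl functions.

```python
import hashlib
import logging


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_duplicate(existing_path, incoming_path):
    """Warn if the duplicate is byte-identical; raise if the contents differ."""
    if file_sha256(existing_path) == file_sha256(incoming_path):
        logging.warning("duplicate load of identical file: %s", incoming_path)
        return True
    raise ValueError(
        f"{incoming_path} differs from already-loaded {existing_path}"
    )
```

With a stored hash column, the first call to `file_sha256` could be replaced by a lookup instead of re-reading the existing file.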

A rough plan to start:

  1. implement (2)
    1. add new status for overwritten versions... or maybe even a new table? oof.
    2. mv overwritten files to holding tank with a cronjob running periodic cleanups
  2. add hash-checking
    1. add hash column to table (this may come in handy later anyway)
    2. check hash on duplicate: error if the hashes differ, only warn if they are the same.
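The "holding tank" step of the plan could look roughly like the sketch below, which a cron job might call for the periodic cleanup. The directory layout, the 30-day default, and both function names are assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path


def move_to_holding(path, holding_dir):
    """Move an about-to-be-overwritten file into the holding directory."""
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    src = Path(path)
    # Nanosecond prefix makes name collisions between overwrites unlikely.
    dest = holding / f"{time.time_ns()}_{src.name}"
    shutil.move(str(src), str(dest))
    # Reset mtime so the file's age counts from when it entered the tank.
    dest.touch()
    return dest


def clean_holding(holding_dir, max_age_days=30, now=None):
    """Delete held files older than max_age_days; return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for held in Path(holding_dir).iterdir():
        if held.is_file() and held.stat().st_mtime < cutoff:
            held.unlink()
            removed.append(held)
    return removed
```

Keeping the old versions in a directory avoids the new status or new table mentioned in step 1.1, at the cost of the database no longer knowing about them.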