Rework "update freshness" to not error with library -> ingest #1339
We are going to have a failing run this weekend without these changes.
#1097 added specific logic for handling an `ingest` run when we've already archived that version of a dataset. However, it didn't account for the case where the previous archived "vintage" of that version had come from `library`. Given that we're in the middle of this migration, and there are some slight data changes occurring (which would cause the validation from #1097 to error), I decided to clean up the code a little and simply "pass" (no overwrite, no updating timestamps) when the existing version came from `library`.
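In effect, the fix is an early return when the prior archive was produced by `library`. A minimal sketch, assuming a hypothetical `existing_vintage` record with a `source` field (these are illustrative names, not the actual dcpy identifiers):

```python
# Minimal sketch of the pass-through behavior; names here are assumptions,
# not the actual dcpy code.
def update_freshness(existing_vintage, new_dataset) -> None:
    if existing_vintage.source == "library":
        # Mid-migration, library-archived data can differ slightly from what
        # ingest produces, so the #1097 validation would error spuriously.
        # No overwrite, no timestamp updates -- just pass.
        return
    # ...the existing #1097 comparison/timestamp logic would continue here...
```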
Commits are quite atomic. I moved around some testing code, hence the several commits; it should make commit-by-commit review easier. I pulled the logic of "validating" (@sf-dcp's favorite word) the new dataset against existing versions out into a function that makes no changes, and then, based on the enum it returns, `run` decides what to do (see the sketch below).
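Roughly the shape of the refactor, as a sketch: the enum values, function names, and helpers below are all assumptions for illustration, not the actual code.

```python
from enum import Enum, auto

# All names below are illustrative assumptions, not actual dcpy identifiers.
class VersionComparison(Enum):
    NEW_VERSION = auto()          # nothing archived yet: archive normally
    MATCHES_EXISTING = auto()     # same data re-pulled: just bump timestamps
    ARCHIVED_BY_LIBRARY = auto()  # prior vintage came from library: pass
    CONFLICT = auto()             # same version, different data: error


def validate_against_existing(existing_vintage, new_dataset) -> VersionComparison:
    """Pure comparison: inspects existing vs. new state, changes nothing."""
    if existing_vintage is None:
        return VersionComparison.NEW_VERSION
    if existing_vintage.source == "library":
        return VersionComparison.ARCHIVED_BY_LIBRARY
    if existing_vintage.data_matches(new_dataset):  # hypothetical method
        return VersionComparison.MATCHES_EXISTING
    return VersionComparison.CONFLICT


def archive(dataset): ...                 # stub for illustration
def update_timestamps(vintage): ...       # stub for illustration


def run(existing_vintage, new_dataset) -> None:
    match validate_against_existing(existing_vintage, new_dataset):
        case VersionComparison.NEW_VERSION:
            archive(new_dataset)
        case VersionComparison.MATCHES_EXISTING:
            update_timestamps(existing_vintage)
        case VersionComparison.ARCHIVED_BY_LIBRARY:
            pass  # no overwrite, no timestamp updates
        case VersionComparison.CONFLICT:
            raise ValueError("archived data for this version differs from new pull")
```

The point of the split is that the comparison itself is side-effect free, so the decision logic can be tested on its own while `run` stays a thin dispatch.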
## Integration Tests
1. Running in my dev bucket, first run a job via `library`, using a branch of mine without latest main so the opendata job still uses `library`. This should get us the latest versions of datasets, archived by `library`. I cancelled it so as to save runtime. There are also the DOT ones, which went private and failed.
2. Then, run a job on main with my dev bucket. See failures (except for the datasets that weren't actually archived in the previous step). There are also some internal server errors happening (500s); it seems like we might be getting rate limited for real. Anyways, see this job here for the "successful" failure.
3. Then, run a job on this branch with my dev bucket. Getting one bizarre S3 error that I'm not going to try to troubleshoot this second.