Raise/warn on incomplete columns in normalize #1504
I'm wondering whether we should do this in the extraction step already. All non-nullable columns (and merge and primary keys should be non-nullable) should raise if not populated. Extraction spends most of its time on I/O rather than on Python code, unlike the normalizer, so in my opinion the check would not make a big difference in performance.
I agree it would be much better to fail early if possible. Ideally we could tell right after the first data item is extracted.
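A minimal sketch of the fail-early idea discussed above: inspect the first extracted item and report non-nullable columns that are not populated. The `find_unpopulated` helper and the layout of `columns` are hypothetical and only illustrate the check; they are not dlt's API.

```python
from typing import Any, Dict, List


def find_unpopulated(item: Dict[str, Any], columns: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return names of non-nullable columns that are missing or None in item."""
    return [
        name
        for name, hints in columns.items()
        if hints.get("nullable", True) is False and item.get(name) is None
    ]


# Hypothetical hints: "id" is a primary key and therefore non-nullable.
columns = {"id": {"nullable": False, "primary_key": True}, "note": {"nullable": True}}

# The data uses the misspelled key "idd", so "id" is never populated.
first_item = {"idd": 1, "note": "misspelled key"}
missing = find_unpopulated(first_item, columns)
if missing:
    print(f"non-nullable columns not populated: {missing}")
```

Checking only the first item keeps the cost low, but it would not catch a key that goes missing later in the stream.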
@@ -989,3 +989,24 @@ def r():
     with pytest.raises(PipelineStepFailed) as pip_ex:
         p.run(r())
     assert isinstance(pip_ex.value.__context__, SchemaException)


 @pytest.mark.parametrize(
Can you write a test (or check if one exists) to see what happens when we do a merge on merge keys but some rows have null in the merge key? It's not super important right now, but it would be interesting to know what happens :)
I couldn't find a test, so I added one. This was already raising an exception through schema.coerce_row in normalize.
Looks good, small requests
d16217f to a855f32
I was also looking into whether this was possible, but I don't think it is without moving a lot of normalize logic into extract. I wasn't sure how much schema inference is done in extract; it seems there is none.
1db75e1 to 0d0afa5
@steinitzu this needs a merge from devel; we did a lot of updates. @sh-rp otherwise, is this PR good to go?
Raise on not-nullable columns to catch e.g. misspelled merge/primary key
7f36f97 to c1e2c85
The branch is up to date now and tests are passing.
LGTM!
Description
Turns the "unbound column" warning into an exception for not-null columns and moves it to normalize.
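The behavior described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: it assumes a column counts as incomplete when no `data_type` was ever inferred for it, and `IncompleteColumnError` and `check_incomplete_columns` are hypothetical names.

```python
import warnings
from typing import Any, Dict


class IncompleteColumnError(Exception):
    """Hypothetical exception; stands in for whatever the pipeline raises."""


def check_incomplete_columns(table_name: str, columns: Dict[str, Dict[str, Any]]) -> None:
    """Raise for non-nullable columns that never received data, warn for the rest."""
    for name, hints in columns.items():
        if "data_type" in hints:
            continue  # assumption: a data type means the column was seen in the data
        if hints.get("nullable", True) is False:
            raise IncompleteColumnError(
                f"column {name!r} in table {table_name!r} is not nullable "
                "but no data was seen for it"
            )
        warnings.warn(f"column {name!r} in table {table_name!r} did not receive any data")


# "idd" stands for a misspelled merge key hint that never matches a field in the data.
columns = {
    "id": {"data_type": "bigint", "nullable": False},
    "idd": {"nullable": False, "merge_key": True},
}
try:
    check_incomplete_columns("items", columns)
except IncompleteColumnError as exc:
    print(exc)
```

Nullable columns keep the softer warning in this sketch, so only key-like, not-null hints turn into hard failures.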