-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce schema for partial tables in multi-source multi-batch JSON reader #17708
base: branch-25.02
Are you sure you want to change the base?
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
/ok to test |
/ok to test |
std::string json_string_b1 = R"( | ||
{ "a": { "y" : 6}, "b" : [1, 2, 3], "c": 11 } | ||
{ "a": { "y" : 6}, "b" : [4, 5 ], "c": 12 } | ||
{ "a": { "y" : 6}, "b" : [6 ], "c": 13 } | ||
{ "a": { "y" : 6}, "b" : [7 ], "c": 14 })"; | ||
std::string json_string_b2 = R"( | ||
{ "a": { "y" : 6}, "c": 11 } | ||
{ "a": { "y" : 6}, "b" : [4, 5 ], "c": 12 } | ||
{ "a": { "y" : 6}, "b" : [6 ], "c": 13 } | ||
{ "a": { "y" : 6}, "b" : [7 ], "c": 14 })"; | ||
std::string json_string_b3 = R"( | ||
{ "a": { "y" : 6}} | ||
{ "a": { "y" : 6}, "b" : [4, 5 ], "c": 12 } | ||
{ "a": { "y" : 6}, "b" : [6 ], "c": 13 } | ||
{ "a": { "y" : 6}, "b" : [7 ], "c": 14 })"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are the schemas different between these three?
I suspect that json_string_b1 is "a","b","c" json_string_b2 is "a","c","b", but I can't tell what json_string_b3 leads to.
A comment on what the test case here is would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column ordering is different across batches - we set the batch size to the size of first string json_string_b1
. And we enforce the same ordering on batches 2 and 3.
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just one small style suggestion
Co-authored-by: Vukasin Milovanovic <[email protected]>
/ok to test |
/ok to test |
/ok to test |
Description
Closes #17689
This PR resolves a bug in the multi-batch JSON reader, wherein the reader was throwing an error when the column schema for any two partial tables from different batches did not match. We now enforce the column ordering in the first partial table i.e. the table returned by the first batch in all succeeding batches.
The test added passes three string as three separate batches to the reader by setting the batch size to that of the first string.
Checklist