Shouldn't stop just because a step returned no files #27
This was fun :). This fixes delta-io/delta-kernel-rs#233
Basically, if you push down a predicate, you can have a situation where a batch of files that does include an `Add` file doesn't actually return any files to scan, because they are all filtered out. The kernel can't know this for sure because we don't introspect the data until the engine asks us to extract it for them. So in the failing case, the first batch included one file, but it was filtered out by the predicate, so nothing actually came out, `resolved_files.size() == size_before` was true, and duckdb just stopped looking for more files. But there is one more file to scan, the one with the data we want! :)

The simple fix is to keep iterating until the kernel tells you you can be sure there's no more data.
There's a chance the kernel could optimize more and not have returned the first batch, but in general I think engines should assume they should keep iterating until `scan_data_next` returns `false`.
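
To make the difference concrete, here is a minimal sketch of the two loop shapes. It is not the actual duckdb or kernel code: `MockScanIterator`, its `Next` method, and both `Resolve...` helpers are hypothetical stand-ins, and the only assumption carried over from the description above is that a successful step may add zero files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the kernel's scan-data iterator. Each batch holds
// the files that survive the pushed-down predicate, so a batch may be empty.
class MockScanIterator {
public:
    explicit MockScanIterator(std::vector<std::vector<std::string>> batches)
        : batches_(std::move(batches)) {}

    // Returns false only once the iterator is exhausted. A true return may
    // still append zero files to `out`.
    bool Next(std::vector<std::string> &out) {
        if (pos_ >= batches_.size()) {
            return false;
        }
        const auto &batch = batches_[pos_++];
        out.insert(out.end(), batch.begin(), batch.end());
        return true;
    }

private:
    std::vector<std::vector<std::string>> batches_;
    std::size_t pos_ = 0;
};

// Buggy shape: treats "this step added no files" as "there is no more data".
std::vector<std::string> ResolveStoppingOnEmptyBatch(MockScanIterator it) {
    std::vector<std::string> resolved_files;
    while (true) {
        const std::size_t size_before = resolved_files.size();
        if (!it.Next(resolved_files) || resolved_files.size() == size_before) {
            break;
        }
    }
    return resolved_files;
}

// Fixed shape: keeps going until the iterator itself reports exhaustion.
std::vector<std::string> ResolveUntilExhausted(MockScanIterator it) {
    std::vector<std::string> resolved_files;
    while (it.Next(resolved_files)) {
        // An empty batch is not a stop signal, so there is nothing to check here.
    }
    // Once exhausted, the mock keeps returning false and appends nothing.
    assert(!it.Next(resolved_files));
    return resolved_files;
}
```

With an empty first batch followed by a batch containing one file, `ResolveStoppingOnEmptyBatch` gives up after the first step and returns nothing, while `ResolveUntilExhausted` returns the one file.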