Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter pushdown issue #233

Closed
samansmink opened this issue May 31, 2024 · 2 comments · Fixed by duckdb/duckdb-delta#27
Closed

Filter pushdown issue #233

samansmink opened this issue May 31, 2024 · 2 comments · Fixed by duckdb/duckdb-delta#27

Comments

@samansmink
Copy link

Using duckdb_delta I'm seeing some weird behaviour with filter pushdown.

When running the following query on one of the DAT tables:

SELECT letter, number
FROM delta_scan('${DAT_PATH}/out/reader_tests/generated/basic_append/delta')
WHERE number < 2

This results in DuckDB trying to push down the Predicate number < 2 AND number is not null.

Currently in the main branch of DuckDB delta, we will not push down filters that are not completely supported by kernel. So this filter is not pushed down at all.

However, to me it seems like that even though IS NOT NULL is not yet supported, we should still be able to do file skipping here because kernel could handle the children of the conjunction AND that is does support while ignoring the children that are not yet supported.

In the code from the DuckDB delta demo, this code was already there so I kindof assumed it to be working, but maybe thats not the case.

However when I try this do this, i get incorrect results. So what I try is call ffi::visit_expression_and with an iterator that will iterate over the childeren. When DuckDB is then called by kernel to handle the IS NOT NULL filter, we return ~0 to the kernel to indicate that this filter type is not yet supported.

However when I do that, the above query actually returns an incorrect result: All files are now skipped, which is clearly wrong because:

FROM delta_scan('${DAT_PATH}/out/reader_tests/generated/basic_append/delta')

returns

4
5
1
2
3

I'm not really sure if I'm misunderstanding the expression visitor API or there's something actually wrong here. Like returning ~0 seems to be the convention to indicate an unsupported filter, but it seems to not always be working.

TLDR; not sure if there's a bug in kernel, but I think we need some clarity on how to use the expression vistors when parts of an AND filter are unsupported @nicklan

@nicklan
Copy link
Collaborator

nicklan commented May 31, 2024

This is fixed by duckdb/duckdb-delta#27

Closing this, as it's not actually an issue in kernel.

I have opened #234 to see if we can be more efficient here.

@nicklan nicklan closed this as completed May 31, 2024
@nicklan
Copy link
Collaborator

nicklan commented May 31, 2024

Also, just to answer your question, returning ~0 is the right thing to do when an expression isn't supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants