Query on multiple tables using DuckDB #827

Closed
ArijitSinghEDA opened this issue Jun 18, 2024 · 3 comments

Comments

@ArijitSinghEDA

Feature Request / Improvement

A way to load and scan multiple tables (possibly in different namespaces) and query them together with DuckDB, for example to run select and conditional join operations across them.

@kevinjqliu
Contributor

Hi @ArijitSinghEDA could you provide an example?

@ArijitSinghEDA
Author

Suppose we have a table table1 in namespace ns1 with columns employee_id, employee_name, department_id, and we have a table table2 in namespace ns2 with columns department_id, department_name. What I am thinking of is something like this:

from pyiceberg.catalog import load_catalog

catalog = load_catalog("example")
# Proposed API (part of this request): load several tables at once
tables = catalog.load_tables(("ns1", "table1"), ("ns2", "table2"))
# Proposed API (part of this request): one DuckDB connection exposing all of them
con = tables.scan().to_duck_db()
df = con.execute(
    """select * from ns1.table1 join ns2.table2 using (department_id)"""
).fetchdf()
print(df)
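
For comparison, something close to the proposal above can probably be approximated with the existing per-table API, since `to_duckdb()` on a scan accepts an optional `connection` argument. The following is only a sketch: `register_scans` and the view names `ns1_table1` / `ns2_table2` are hypothetical, and flat view names are used on the assumption that DuckDB would not treat a registered name like `ns1.table1` as schema-qualified.

```python
from typing import Any, Mapping, Optional


def register_scans(catalog: Any, tables: Mapping[str, str]) -> Optional[Any]:
    """Register one full scan per table on a single shared DuckDB connection.

    `tables` maps a DuckDB view name (hypothetical aliases such as
    "ns1_table1") to an Iceberg identifier such as "ns1.table1".
    Returns the shared connection, or None if `tables` is empty.
    """
    con = None
    for alias, identifier in tables.items():
        scan = catalog.load_table(identifier).scan()
        # The first call passes connection=None, so PyIceberg creates an
        # in-memory connection; later calls reuse that same connection.
        con = scan.to_duckdb(table_name=alias, connection=con)
    return con
```

With a real catalog this might be used as `con = register_scans(load_catalog("example"), {"ns1_table1": "ns1.table1", "ns2_table2": "ns2.table2"})`, followed by `con.execute("select * from ns1_table1 join ns2_table2 using (department_id)").fetchdf()`. Note that `to_duckdb()` reads each scan into Arrow before registering it, so this is best suited to tables (or filtered scans) that fit in memory.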

@mike-luabase

@ArijitSinghEDA you can use the Iceberg extension in DuckDB.

It supports multiple tables, but you'll need to follow the tips in duckdb/duckdb-iceberg#44.
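
As a rough illustration of the extension route, the join from the earlier example could be written with one `iceberg_scan()` call per table. This is a sketch under assumptions: `iceberg_join_sql` is a hypothetical helper, and it assumes each table is addressed by the path to its current metadata file (in PyIceberg that path is available as `table.metadata_location`). Whether that matches the tips in the linked issue is not confirmed here.

```python
def iceberg_join_sql(left_metadata: str, right_metadata: str, key: str) -> str:
    """Build a DuckDB query joining two Iceberg tables via iceberg_scan().

    `left_metadata` and `right_metadata` are paths to each table's metadata
    file; `key` is a column name shared by both tables.
    """

    def quote(path: str) -> str:
        # Escape single quotes so the path is a valid SQL string literal.
        return "'" + path.replace("'", "''") + "'"

    return (
        f"SELECT * FROM iceberg_scan({quote(left_metadata)}) AS l "
        f"JOIN iceberg_scan({quote(right_metadata)}) AS r USING ({key})"
    )
```

The resulting string would then be run on a DuckDB connection after `INSTALL iceberg;` and `LOAD iceberg;`, for example `con.execute(iceberg_join_sql(t1.metadata_location, t2.metadata_location, "department_id")).fetchdf()`, where `t1` and `t2` are tables loaded through PyIceberg. Reading from object storage will likely need additional DuckDB configuration (credentials, `httpfs`), which is not shown.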
