SQL: tpc-DS q23 fails with schema mismatch error #3561

Open
universalmind303 opened this issue Dec 12, 2024 · 0 comments
Labels
bug Something isn't working sql

Comments

@universalmind303
Contributor

Describe the bug

---------------------------------------------------------------------------
DaftCoreException                         Traceback (most recent call last)
Cell In[6], line 2
      1 # %%
----> 2 run_query(23)
Cell In[1], line 42, in run_query(query)
     40 def run_query(query: int):
     41     q = open(f"/{TPCDS_QUERIES_PATH}/{str(query).zfill(2)}.sql").read()
---> 42     return daft.sql(q, catalog=catalog).to_arrow()
File ~/Development/Daft/daft/api_annotations.py:26, in DataframePublicAPI.<locals>._wrap(*args, **kwargs)
     24 type_check_function(func, *args, **kwargs)
     25 timed_method = time_df_method(func)
---> 26 return timed_method(*args, **kwargs)
File ~/Development/Daft/daft/analytics.py:199, in time_df_method.<locals>.tracked_method(*args, **kwargs)
    197 start = time.time()
    198 try:
--> 199     result = method(*args, **kwargs)
    200 except Exception as e:
    201     _ANALYTICS_CLIENT.track_df_method_call(
    202         method_name=method.__name__, duration_seconds=time.time() - start, error=str(type(e).__name__)
    203     )
File ~/Development/Daft/daft/dataframe/dataframe.py:2726, in DataFrame.to_arrow(self)
   2723 import pyarrow as pa
   2725 arrow_rb_iter = self.to_arrow_iter(results_buffer_size=None)
-> 2726 return pa.Table.from_batches(arrow_rb_iter, schema=self.schema().to_pyarrow_schema())
File ~/Development/Daft/.venv/lib/python3.11/site-packages/pyarrow/table.pxi:4755, in pyarrow.lib.Table.from_batches()
File ~/Development/Daft/daft/dataframe/dataframe.py:352, in DataFrame.to_arrow_iter(self, results_buffer_size)
    347 partitions_iter = context.get_or_create_runner().run_iter_tables(
    348     self._builder, results_buffer_size=results_buffer_size
    349 )
    351 # Iterate through partitions.
--> 352 for partition in partitions_iter:
    353     yield from partition.to_arrow().to_batches()
File ~/Development/Daft/daft/runners/native_runner.py:89, in NativeRunner.run_iter_tables(self, builder, results_buffer_size)
     86 def run_iter_tables(
     87     self, builder: LogicalPlanBuilder, results_buffer_size: int | None = None
     88 ) -> Iterator[MicroPartition]:
---> 89     for result in self.run_iter(builder, results_buffer_size=results_buffer_size):
     90         yield result.partition()
File ~/Development/Daft/daft/runners/native_runner.py:84, in NativeRunner.run_iter(self, builder, results_buffer_size)
     78 executor = NativeExecutor.from_logical_plan_builder(builder)
     79 results_gen = executor.run(
     80     {k: v.values() for k, v in self._part_set_cache.get_all_partition_sets().items()},
     81     daft_execution_config,
     82     results_buffer_size,
     83 )
---> 84 yield from results_gen
File ~/Development/Daft/daft/execution/native_executor.py:40, in <genexpr>(.0)
     35 from daft.runners.partitioning import LocalMaterializedResult
     37 psets_mp = {
     38     part_id: [part.micropartition()._micropartition for part in parts] for part_id, parts in psets.items()
     39 }
---> 40 return (
     41     LocalMaterializedResult(MicroPartition._from_pymicropartition(part))
     42     for part in self._executor.run(psets_mp, daft_execution_config, results_buffer_size)
     43 )
DaftCoreException: DaftError::External task 33741 panicked with message "Loaded MicroPartition's tables' schema must match its own schema exactly"

To Reproduce

  1. Generate the TPC-DS dataset using DuckDB.
  2. Run TPC-DS q23 using Daft SQL (`daft.sql(...).to_arrow()`, as in the traceback above).
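
The two steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual setup: the directory names, the scale factor, the Parquet export, and the `query_path`/`generate_tpcds` helpers are all assumptions, and `SQLCatalog` is assumed to be importable from `daft.sql` in the Daft version under test. Only `daft.sql(q, catalog=catalog).to_arrow()` and the zero-padded `NN.sql` file naming come from the traceback.

```python
from pathlib import Path

# Hypothetical locations; the issue does not say where the data or queries live.
DATA_DIR = Path("tpcds_data")
TPCDS_QUERIES_PATH = Path("tpcds_queries")


def generate_tpcds(data_dir: Path = DATA_DIR, scale_factor: int = 1) -> None:
    """Generate TPC-DS tables with DuckDB's tpcds extension and export them as Parquet.

    The scale factor is an assumption; the issue does not state one.
    """
    import duckdb  # imported lazily so the pure helpers work without it

    con = duckdb.connect()
    con.execute("INSTALL tpcds")
    con.execute("LOAD tpcds")
    con.execute(f"CALL dsdgen(sf = {scale_factor})")
    # Writes one <table>.parquet file per table into data_dir.
    con.execute(f"EXPORT DATABASE '{data_dir}' (FORMAT PARQUET)")
    con.close()


def query_path(query: int, queries_dir: Path = TPCDS_QUERIES_PATH) -> Path:
    """Zero-pad the query number the same way run_query does in the traceback."""
    return queries_dir / f"{str(query).zfill(2)}.sql"


def run_query(query: int, data_dir: Path = DATA_DIR):
    """Register every exported table in a catalog and run one TPC-DS query."""
    import daft
    from daft.sql import SQLCatalog  # assumption: available in this Daft version

    tables = {
        p.stem: daft.read_parquet(str(p))
        for p in sorted(data_dir.glob("*.parquet"))
    }
    catalog = SQLCatalog(tables)
    q = query_path(query).read_text()
    return daft.sql(q, catalog=catalog).to_arrow()
```

Calling `generate_tpcds()` once and then `run_query(23)` should exercise the same code path as the traceback, provided `tpcds_queries/23.sql` contains the q23 text. Whether the panic reproduces may depend on the scale factor and the runner in use.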

Expected behavior

No response

Component(s)

SQL

Additional context

No response

universalmind303 added the bug, needs triage, and sql labels and removed the needs triage label on Dec 12, 2024