Poor performance of Trino queries on Paimon-format data #85

Open
lmzhang1 opened this issue Sep 27, 2024 · 0 comments

I ran a TPC-DS performance test on Trino with the data stored in two formats, Parquet and Paimon. The query takes 3 seconds on the Parquet data and 60 seconds on the Paimon data. The query SQL:
```sql
select
     i_item_id
    ,i_item_desc
    ,s_store_id
    ,s_store_name
    ,sum(ss_net_profit) as store_sales_profit
    ,sum(sr_net_loss) as store_returns_loss
    ,sum(cs_net_profit) as catalog_sales_profit
from
     store_sales
    ,store_returns
    ,catalog_sales
    ,date_dim d1
    ,date_dim d2
    ,date_dim d3
    ,store
    ,item
where
        d1.d_moy = 4
    and d1.d_year = 1998
    and d1.d_date_sk = ss_sold_date_sk
    and i_item_sk = ss_item_sk
    and s_store_sk = ss_store_sk
    and ss_customer_sk = sr_customer_sk
    and ss_item_sk = sr_item_sk
    and ss_ticket_number = sr_ticket_number
    and sr_returned_date_sk = d2.d_date_sk
    and d2.d_moy between 4 and 10
    and d2.d_year = 1998
    and sr_customer_sk = cs_bill_customer_sk
    and sr_item_sk = cs_item_sk
    and cs_sold_date_sk = d3.d_date_sk
    and d3.d_moy between 4 and 10
    and d3.d_year = 1998
group by
     i_item_id
    ,i_item_desc
    ,s_store_id
    ,s_store_name
order by
     i_item_id
    ,i_item_desc
    ,s_store_id
    ,s_store_name
limit 100;
```

The store_sales table holds 100 GB of data. The Parquet query reads only the partitions that satisfy the partition conditions, while the Paimon query reads all of the data.
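Note that the query never filters store_sales directly: its partition condition comes from the join `d1.d_date_sk = ss_sold_date_sk` with the filtered date_dim, so it can only be applied at runtime (presumably through Trino's dynamic filtering). A minimal sketch for narrowing down where the Paimon scan loses the pruning follows. It assumes store_sales is partitioned by ss_sold_date_sk, and `paimon.tpcds` is a placeholder catalog.schema name. If step 3 reads far less data than step 2, static partition pruning works and the gap is in the join-derived filter. If both read everything, no pruning is applied at all. Running step 2 against the Parquet catalog gives the baseline.

```sql
-- Step 1: find the date keys selected by the d1 filter in the original query.
SELECT min(d_date_sk) AS first_sk, max(d_date_sk) AS last_sk
FROM paimon.tpcds.date_dim              -- placeholder catalog.schema
WHERE d_year = 1998 AND d_moy = 4;

-- Step 2: reduced join. The filter on store_sales exists only at runtime,
-- derived from date_dim through the join. Compare the input rows/bytes
-- reported for the store_sales scan with the same query on the Parquet catalog.
EXPLAIN ANALYZE
SELECT count(*)
FROM paimon.tpcds.store_sales ss
JOIN paimon.tpcds.date_dim d1 ON d1.d_date_sk = ss.ss_sold_date_sk
WHERE d1.d_year = 1998 AND d1.d_moy = 4;

-- Step 3: same filter written as a literal predicate on the (assumed)
-- partition column. 2450905..2450934 assumes dsdgen's Julian-day d_date_sk
-- values for April 1998; replace them with the output of step 1.
EXPLAIN ANALYZE
SELECT count(*)
FROM paimon.tpcds.store_sales
WHERE ss_sold_date_sk BETWEEN 2450905 AND 2450934;
```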
