polars-cli vs qsv sqlp #1620
-
I have been experimenting with polars-cli (https://github.com/pola-rs/polars-cli) as compared to sqlp. Its only drawback is that polars-cli does not accept stdin. What struck me was a query I ran on the NY yellow taxi file, which has more than 7 million rows: there was a difference in performance between the two.
Any suggestions as to why there is a difference?
-
Can you post the source link for the file?
-
If I am bored enough tomorrow (public holiday) I’ll play with it :-)-O
Thanks!
-
CSVQ takes about 2 minutes :-(-O
-
Some hyperfine reports:
-
Now on Silicon
and then the hyperfine tests:
Benchmark 1: polars -o csv "select VendorID,sum(total_amount) from read_csv(taxi.csv) group by VendorID order by VendorID "
Benchmark 1: qsv sqlp -Q smalldummy.csv "select VendorID,sum(total_amount) from read_csv(taxi.csv) group by VendorID order by VendorID"
Benchmark 1: qsv sqlp -Q taxi.csv "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
Benchmark 1: duckdb -c "select VendorID,sum(total_amount) from read_csv(taxi.csv) group by VendorID order by VendorID"
I tried an empty
-
It is good that the discrepancy has been sorted out with the dummy-file workaround. Thanks @jqnatividad for your help and enthusiasm for this project. qsv is amazing, and I am trying to tap its full potential. Very interesting benchmarks, @ondohotola.
-
Thank you.
@13minutes-yt, this is because qsv loads and parses the CSV separately before executing the SQL. It's also not exactly an apples-to-apples comparison: the queries are different, and with polars-cli you're bypassing that separate CSV parsing step.
However, if we also use `read_csv` directly in the qsv SQL query, as the polars-cli query does, and just pass a small dummy CSV as input, we get similar performance, because sqlp also leverages the magic of Polars LazyFrames to read only what it needs from the CSV to fulfill the query, i.e.
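For anyone less familiar with the query, here is a rough stdlib-only Python sketch of what the benchmarked `select VendorID,sum(total_amount) ... group by VendorID order by VendorID` computes. It is not how Polars works internally: Python's `csv.reader` still splits every field of every row, whereas a Polars lazy scan can push the column projection down into the CSV reader and skip parsing the columns the query never touches. The column names come from the benchmark commands above; the function name `sum_by_vendor` is hypothetical.

```python
import csv
from collections import defaultdict


def sum_by_vendor(lines, key="VendorID", value="total_amount"):
    """Rough equivalent of:
        select VendorID, sum(total_amount) from taxi
        group by VendorID order by VendorID

    `lines` is any iterable of CSV text lines, e.g. an open file.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Locate the only two columns the query references.
    key_idx, value_idx = header.index(key), header.index(value)

    totals = defaultdict(float)
    for row in reader:
        # csv.reader has already split every field of this row; we simply
        # ignore the unused ones. A Polars lazy scan can avoid parsing
        # those unused columns in the first place.
        amount = row[value_idx]
        # Empty cells are skipped, like NULLs in SQL's sum()
        # (an all-empty group yields 0.0 here rather than NULL).
        totals[row[key_idx]] += float(amount) if amount else 0.0

    # Keys stay strings, so this orders groups by their text value.
    return sorted(totals.items())
```

Usage would be something like `with open("taxi.csv", newline="") as f: print(sum_by_vendor(f))`. On a 7-million-row file this plain-Python loop will be far slower than either Polars-based tool; the point is only to show which two columns the query actually needs.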