-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scripts and AWS results for perf section of super command doc #5506
Conversation
measurements among SuperDB, | ||
[DuckDB](https://duckdb.org/), | ||
[ClickHouse](https://clickhouse.com/), and | ||
[DataFusion](https://datafusion.apache.org/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to stick to the backtick-enclosed fixed width font when showing things as typed by a user. We reference all these by their command names further down, but I figured where we're hyperlinking to them up top we can use their real names here.
As of this writing in November 2024, we're using the latest version 1.1.3 of `duckdb`. | ||
version 24.11.1.1393 of `clickhouse`, and v43.0.0 of `datafusion-cli`. | ||
The detailed steps shown [below](#appendix-2-running-the-tests) can be reproduced via | ||
[automated scripts](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the scripts currently only exist on this branch, this hyperlink will fail until this PR is merged.
FROM 'gha.parquet' | ||
) | ||
``` | ||
and for ClickHouse, we had to use `arrayJoin` instead of `unnest`. | ||
|
||
SuperSQL's data model does not require these sorts of gymnastics as | ||
SuperSQL's data model does not require these kinds of gymnastics as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "sort" appears so often for technical reasons, I figured I'd opt for a synonym.
start the run: | ||
|
||
``` | ||
curl -s https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf/benchmark.sh | bash -xv 2>&1 | tee runlog.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL referenced in this curl
command line won't work until this PR merges.
git clone https://github.com/brimdata/super.git | ||
cd scripts/super-cmd-perf | ||
./benchmark.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These commands are forward-looking because these scripts won't exist on main
until this PR merges.
@@ -774,25 +787,32 @@ WHERE id LIKE '%in case you have any feedback 😊%' | |||
OR payload.member.type LIKE '%in case you have any feedback 😊%' | |||
``` | |||
There are 486 such fields. You can review the entire query in | |||
[docs/commands/search.sql](search.sql). | |||
[`search+.sql`](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf/search%2B.sql). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the scripts currently only exist on this branch, this hyperlink will fail until this PR is merged.
This appendix provides the raw tests and output that we run on a MacBook Pro to generate | ||
the table of results above. | ||
This appendix provides the raw tests and output from the [most recent archived run](https://super-cmd-perf.s3.us-east-2.amazonaws.com/2024-11-26_03-17-25.tgz) | ||
of the tests via [automated scripts](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the scripts currently only exist on this branch, this hyperlink will fail until this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adjust that sentence about clickhouse as we discussed...
What's Changing
A set of scripts are introduced to automate the running of the same perf queries shown in #5481. I've updated the super command doc to reflect results from these scripts run on an AWS
m6idn.2xlarge
instance.For ease of review, I've published a copy of the docs from this branch in a personal staging site here.
Why
Details
@mccanne established a set of five queries in #5481 that illustrate perf differences between SuperDB, DuckDB, ClickHouse, and DataFusion. The results shown as of that PR were from manual runs on @mccanne's Macbook. The scripts introduced here can be run on an AWS reference instance as well as Macbooks.
Because the scripts only exist on this PR's branch but we want to hyperlink to them from the docs, there are expected link checker failures at the moment that will go away once this PR merges. I've placed inline comments in the PR to the expected failures.
Regarding the style in which the shell scripts are written, I opt for a style that adheres to most shellcheck conventions. This may be somewhat more verbose than other styles, but it's consistent and it provides a handy lint-like way to catch mistakes as they're being modified.