Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scripts and AWS results for perf section of super command doc #5506

Merged
merged 6 commits into from
Nov 27, 2024

Conversation

philrz
Copy link
Contributor

@philrz philrz commented Nov 26, 2024

What's Changing

A set of scripts are introduced to automate the running of the same perf queries shown in #5481. I've updated the super command doc to reflect results from these scripts run on an AWS m6idn.2xlarge instance.

For ease of review, I've published a copy of the docs from this branch in a personal staging site here.

Why

  1. Automated scripts will make it easier to re-run/publish results over time as devices-under-test evolve.
  2. AWS represents "neutral turf" on which anyone can reproduce the results.

Details

@mccanne established a set of five queries in #5481 that illustrate perf differences between SuperDB, DuckDB, ClickHouse, and DataFusion. The results shown as of that PR were from manual runs on @mccanne's Macbook. The scripts introduced here can be run on an AWS reference instance as well as Macbooks.

Because the scripts only exist on this PR's branch but we want to hyperlink to them from the docs, there are expected link checker failures at the moment that will go away once this PR merges. I've placed inline comments in the PR to the expected failures.

Regarding the style in which the shell scripts are written, I opt for a style that adheres to most shellcheck conventions. This may be somewhat more verbose than other styles, but it's consistent and it provides a handy lint-like way to catch mistakes as they're being modified.

@philrz philrz self-assigned this Nov 26, 2024
Comment on lines +662 to +665
measurements among SuperDB,
[DuckDB](https://duckdb.org/),
[ClickHouse](https://clickhouse.com/), and
[DataFusion](https://datafusion.apache.org/).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to stick to the backtick-enclosed fixed width font when showing things as typed by a user. We reference all these by their command names further down, but I figured where we're hyperlinking to them up top we can use their real names here.

As of this writing in November 2024, we're using the latest version 1.1.3 of `duckdb`.
version 24.11.1.1393 of `clickhouse`, and v43.0.0 of `datafusion-cli`.
The detailed steps shown [below](#appendix-2-running-the-tests) can be reproduced via
[automated scripts](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the scripts currently only exist on this branch, this hyperlink will fail until this PR is merged.

FROM 'gha.parquet'
)
```
and for ClickHouse, we had to use `arrayJoin` instead of `unnest`.

SuperSQL's data model does not require these sorts of gymnastics as
SuperSQL's data model does not require these kinds of gymnastics as
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "sort" appears so often for technical reasons, I figured I'd opt for a synonym.

start the run:

```
curl -s https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf/benchmark.sh | bash -xv 2>&1 | tee runlog.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL referenced in this curl command line won't work until this PR merges.

Comment on lines +63 to +65
git clone https://github.com/brimdata/super.git
cd scripts/super-cmd-perf
./benchmark.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commands are forward-looking because these scripts won't exist on main until this PR merges.

@@ -774,25 +787,32 @@ WHERE id LIKE '%in case you have any feedback 😊%'
OR payload.member.type LIKE '%in case you have any feedback 😊%'
```
There are 486 such fields. You can review the entire query in
[docs/commands/search.sql](search.sql).
[`search+.sql`](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf/search%2B.sql).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the scripts currently only exist on this branch, this hyperlink will fail until this PR is merged.

This appendix provides the raw tests and output that we run on a MacBook Pro to generate
the table of results above.
This appendix provides the raw tests and output from the [most recent archived run](https://super-cmd-perf.s3.us-east-2.amazonaws.com/2024-11-26_03-17-25.tgz)
of the tests via [automated scripts](https://github.com/brimdata/super/blob/main/scripts/super-cmd-perf)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the scripts currently only exist on this branch, this hyperlink will fail until this PR is merged.

@philrz philrz requested a review from a team November 26, 2024 20:24
@philrz philrz marked this pull request as ready for review November 26, 2024 20:24
Copy link
Collaborator

@mccanne mccanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adjust that sentence about clickhouse as we discussed...

@philrz philrz merged commit 8901aa2 into main Nov 27, 2024
3 of 4 checks passed
@philrz philrz deleted the super-cmd-perf branch November 27, 2024 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants