-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Early version of profiler harness #959
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #959 +/- ##
=======================================
Coverage 76.94% 76.94%
=======================================
Files 72 72
Lines 5691 5691
=======================================
Hits 4379 4379
Misses 1312 1312
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
.github/workflows/profiler.yml
Outdated
@@ -0,0 +1,30 @@ | |||
name: Performance check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would align this name with the workflow file name
.github/workflows/profiler.yml
Outdated
steps: | ||
- uses: actions/checkout@v4 | ||
with: | ||
fetch-depth: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1
if you just want the HEAD commit.
tools/perf_checker/benchmark1.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is is testing "export anndata", so why not name the file as such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe put all the benchmarks into a subdir. The perf_checker.sh
script could just iterate through all the files, allowing the explicit definition (and future update) of the benchmarks
array in the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
# Downloading TileDB-SOMA (branch or main once merged) | ||
git clone https://github.com/single-cell-data/TileDB-SOMA.git | ||
cd TileDB-SOMA | ||
git checkout census_profiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be removed before merging to main, right? (I think that's what the comment above is saying)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not removed and is now present in main
, and has broken the CI
tools/perf_checker/perf_checker.sh
Outdated
|
||
# Running all benchmark and checking performance changes | ||
arraylength=${#benchmarks[@]} | ||
for (( i=0; i<${arraylength}; i++ )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably just iterate the array, like for benchmark in ${benchmarks}
and avoid the indexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, per another comment, if all benchmark scripts are placed in a subdir, you can just glob and iterate them: for benchmark in benchmarks/*py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right
tools/perf_checker/perf_checker.sh
Outdated
arraylength=${#benchmarks[@]} | ||
for (( i=0; i<${arraylength}; i++ )) | ||
do | ||
python ./TileDB-SOMA/profiler "python ${benchmarks[$i]}" $dbpath -t time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python ./TileDB-SOMA/profiler "python ${benchmarks[$i]}" $dbpath -t time | |
python ./TileDB-SOMA/profiler "python ${benchmarks[$i]}" $dbpath -t time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is -t time
going to run gnu-time
? the profiler requires gnu-time
-formatted output. would install gnu-time
if needed, in profiler.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll fix it need to figure how to enable it on linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually noticed that gtime only works on macos and was not able to install on ubunutu. LMK if there's a way to get it to work on ubuntu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's usually available under /usr/bin/time
. If not try, try sudo apt install time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tools/perf_checker/perf_checker.sh
Outdated
mount-s3 census-profiler-tests ./mount-s3 --cache ./s3_cache --metadata-ttl 300 | ||
|
||
dbpath=`pwd`/mount-s3 | ||
echo "Mount-S path = ${dbpath}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mount-s3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo not needed if using set -x
above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mount-s3 is an s3 fused drive
tools/perf_checker/perf_checker.sh
Outdated
sudo apt install -y ./mount-s3.deb | ||
|
||
# Setting up mount-s3 | ||
mkdir ./mount-s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would rename to indicate its contents instead? could be same as bucket name: census-profiler-tests
. Would also add comment explaining that this is necessary to persist the profiling run data that are performed below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great ideas both
tools/perf_checker/benchmark1.py
Outdated
) as query: | ||
query.to_anndata(X_name="raw") | ||
t2 = perf_counter() | ||
print(f"End to end time {t2 - t1}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is timing necessary here, since this is being called by the profiler, which is already timing this script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
c3c5ae1
to
6f765ef
Compare
ed3e36d
to
1d9701f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Left a few nitpicks.
.github/workflows/profiler.yml
Outdated
uses: aws-actions/configure-aws-credentials@v1 | ||
with: | ||
aws-region: us-west-2 | ||
role-to-assume: arn:aws:iam::401986845158:role/MyNewPlayground #arn:aws:iam::401986845158:role/PlaygroundS3 # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidate/remove comments
tools/perf_checker/perf_checker.sh
Outdated
python3.11 -m venv ~/venv | ||
. ~/venv/bin/activate | ||
|
||
pip install psutil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can go on a single line
import tiledbsoma as soma | ||
|
||
print("Starting bm 1", file=stderr) | ||
census_S3_latest = dict(census_version="2024-01-01") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pin a non LTS version it will go away after one month. That said, this is a test file so it's probably fine, but leave a comment for posterity.
a85038a
to
b47fd35
Compare
Include a basic benchmark as the starting point and needed scripts
Include a basic benchmark as the starting point and needed scripts