Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early version of profiler harness #959

Merged
merged 1 commit into from
Mar 9, 2024
Merged

Early version of profiler harness #959

merged 1 commit into from
Mar 9, 2024

Conversation

beroy
Copy link
Collaborator

@beroy beroy commented Jan 29, 2024

Include a basic benchmark as the starting point and needed scripts

@beroy beroy requested a review from ebezzi January 29, 2024 03:05
@beroy beroy requested a review from atolopko-czi January 29, 2024 03:14
Copy link

codecov bot commented Jan 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.94%. Comparing base (4f40690) to head (a85038a).

❗ Current head a85038a differs from pull request most recent head c4b9a26. Consider uploading reports for the commit c4b9a26 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #959   +/-   ##
=======================================
  Coverage   76.94%   76.94%           
=======================================
  Files          72       72           
  Lines        5691     5691           
=======================================
  Hits         4379     4379           
  Misses       1312     1312           
Flag Coverage Δ
unittests 76.94% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -0,0 +1,30 @@
name: Performance check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would align this name with the workflow file name

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 if you just want the HEAD commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is is testing "export anndata", so why not name the file as such?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe put all the benchmarks into a subdir. The perf_checker.sh script could just iterate through all the files, allowing the explicit definition (and future update) of the benchmarks array in the script.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Downloading TileDB-SOMA (branch or main once merged)
git clone https://github.com/single-cell-data/TileDB-SOMA.git
cd TileDB-SOMA
git checkout census_profiler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed before merging to main, right? (I think that's what the comment above is saying)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not removed and is now present in main, and has broken the CI


# Running all benchmark and checking performance changes
arraylength=${#benchmarks[@]}
for (( i=0; i<${arraylength}; i++ ))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably just iterate the array, like for benchmark in ${benchmarks} and avoid the indexing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, per another comment, if all benchmark scripts are placed in a subdir, you can just glob and iterate them: for benchmark in benchmarks/*py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right

arraylength=${#benchmarks[@]}
for (( i=0; i<${arraylength}; i++ ))
do
python ./TileDB-SOMA/profiler "python ${benchmarks[$i]}" $dbpath -t time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
python ./TileDB-SOMA/profiler "python ${benchmarks[$i]}" $dbpath -t time
python ./TileDB-SOMA/profiler "python ${benchmarks[$i]}" $dbpath -t time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is -t time going to run gnu-time? the profiler requires gnu-time-formatted output. would install gnu-time if needed, in profiler.yml

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll fix it need to figure how to enable it on linux

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually noticed that gtime only works on macos and was not able to install on ubunutu. LMK if there's a way to get it to work on ubuntu

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually available under /usr/bin/time. If not try, try sudo apt install time

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

mount-s3 census-profiler-tests ./mount-s3 --cache ./s3_cache --metadata-ttl 300

dbpath=`pwd`/mount-s3
echo "Mount-S path = ${dbpath}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mount-s3?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

echo not needed if using set -x above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mount-s3 is an s3 fused drive

sudo apt install -y ./mount-s3.deb

# Setting up mount-s3
mkdir ./mount-s3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would rename to indicate its contents instead? could be same as bucket name: census-profiler-tests. Would also add comment explaining that this is necessary to persist the profiling run data that are performed below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great ideas both

) as query:
query.to_anndata(X_name="raw")
t2 = perf_counter()
print(f"End to end time {t2 - t1}")
Copy link
Collaborator

@atolopko-czi atolopko-czi Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is timing necessary here, since this is being called by the profiler, which is already timing this script?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

@beroy beroy force-pushed the perf_checker branch 17 times, most recently from c3c5ae1 to 6f765ef Compare February 1, 2024 23:39
@beroy beroy force-pushed the perf_checker branch 23 times, most recently from ed3e36d to 1d9701f Compare March 5, 2024 23:38
Copy link
Member

@ebezzi ebezzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Left a few nitpicks.

uses: aws-actions/configure-aws-credentials@v1
with:
aws-region: us-west-2
role-to-assume: arn:aws:iam::401986845158:role/MyNewPlayground #arn:aws:iam::401986845158:role/PlaygroundS3 #
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidate/remove comments

python3.11 -m venv ~/venv
. ~/venv/bin/activate

pip install psutil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can go on a single line

import tiledbsoma as soma

print("Starting bm 1", file=stderr)
census_S3_latest = dict(census_version="2024-01-01")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pin a non LTS version it will go away after one month. That said, this is a test file so it's probably fine, but leave a comment for posterity.

@beroy beroy force-pushed the perf_checker branch 2 times, most recently from a85038a to b47fd35 Compare March 8, 2024 23:12
Include a basic benchmark as the starting point and needed scripts
@beroy beroy merged commit 311a352 into main Mar 9, 2024
15 checks passed
@beroy beroy deleted the perf_checker branch March 9, 2024 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants