This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

Add SkyPilot example for running benchmarks #96

Status: Open. Wants to merge 12 commits into `main`.
18 changes: 18 additions & 0 deletions README.md
@@ -66,9 +66,26 @@
python3 -m flexgen.apps.helm_run --description mmlu:model=text,subject=abstract_
```
Note that only a subset of HELM scenarios is tested. See more tested scenarios [here](flexgen/apps/helm_passed_30b.sh).

### Run FlexGen on Any Cloud with SkyPilot
The FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a framework for running ML jobs on any cloud.
First, install SkyPilot and verify that you have cloud credentials set up ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)):
```bash
pip install "skypilot[aws,gcp,azure,lambda]" # pick your clouds
sky check
```
You can now launch the benchmark on any cloud with a single command. SkyPilot automatically finds the cheapest available region that offers the requested GPUs:
```bash
sky launch -c flexgen --detach-setup flexgen/apps/skypilot.yaml
```
You can then log into the cluster with `ssh flexgen` to monitor the job. Once the job has finished, terminate the cluster with `sky down flexgen`, or pass the `--down` flag to the launch command above to have the cluster terminate itself automatically.

To run any other FlexGen command, you can edit [`flexgen/apps/skypilot.yaml`](./flexgen/apps/skypilot.yaml) and replace the `run` section.
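For reference, here is a sketch of an alternative `run` section that benchmarks the smaller `facebook/opt-1.3b` model (the same command that appears commented out in the shipped YAML); the model choice is only an example:

```yaml
run: |
  conda activate flexgen
  # A lighter benchmark on a 1.3B model, which should fit on a single T4.
  python3 -m flexgen.flex_opt --model facebook/opt-1.3b
```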

### Data Wrangling
You can run the examples in this paper, ['Can Foundation Models Wrangle Your Data?'](https://arxiv.org/abs/2205.09911), by following the instructions [here](flexgen/apps/data_wrangle).



## Performance Benchmark
### Generation Throughput (token/s)
The corresponding effective batch sizes are in parentheses. Please see [here](benchmark/batch_size_table.md) for more details.
@@ -86,6 +103,7 @@

How to [reproduce](benchmark/flexgen).
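The effective batch sizes in the table can be cross-checked by hand: to my reading of the benchmark commands, the effective batch size is the per-GPU batch size multiplied by the number of GPU batches. A minimal sketch using the flags from the HELM command in this PR (`--gpu-batch-size 48 --num-gpu-batches 3`):

```python
# Effective batch size = per-GPU micro-batch size x number of micro-batches.
# Flag values are taken from the helm_run command in this PR; the formula is
# an assumption based on how the benchmark tables pair these two flags.
gpu_batch_size = 48   # --gpu-batch-size
num_gpu_batches = 3   # --num-gpu-batches
effective_batch_size = gpu_batch_size * num_gpu_batches
print(effective_batch_size)  # 144
```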


## Roadmap
We plan to work on the following features.

15 changes: 15 additions & 0 deletions flexgen/apps/README.md
@@ -25,3 +25,18 @@
Run the Massive Multitask Language Understanding (MMLU) scenario.
```
python3 helm_run.py --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100
```
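The `--percent 20 80 0 100 0 100` flag in the command above controls offloading. A hedged sketch of how the six numbers are read, based on my understanding of FlexGen's CLI help (GPU/CPU percentages for weights, attention cache, and activations, with the remainder going to disk); `split_placement` is a hypothetical helper for illustration, not part of FlexGen:

```python
# Hypothetical helper: interpret FlexGen's --percent flag as three
# (gpu, cpu) percentage pairs -- weights, attention cache, activations --
# with anything left over offloaded to disk.
def split_placement(percent):
    assert len(percent) == 6, "expected: w_gpu w_cpu cache_gpu cache_cpu act_gpu act_cpu"
    plan = {}
    for name, gpu, cpu in zip(["weights", "cache", "activations"],
                              percent[::2], percent[1::2]):
        assert 0 <= gpu + cpu <= 100, f"{name}: percentages exceed 100"
        plan[name] = {"gpu": gpu, "cpu": cpu, "disk": 100 - gpu - cpu}
    return plan

# The split used by the command above: 20% of weights on GPU, 80% on CPU;
# the attention cache and activations entirely on CPU.
plan = split_placement([20, 80, 0, 100, 0, 100])
print(plan["weights"])  # {'gpu': 20, 'cpu': 80, 'disk': 0}
```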

### Run on any cloud with SkyPilot
The FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a framework for running ML jobs on any cloud.
First, install SkyPilot and verify that you have cloud credentials set up ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)):
```bash
pip install "skypilot[aws,gcp,azure,lambda]" # pick your clouds
sky check
```
You can now launch the benchmark on any cloud with a single command. SkyPilot automatically finds the cheapest available region that offers the requested GPUs:
```bash
sky launch -c flexgen --detach-setup skypilot.yaml
```
You can then log into the cluster with `ssh flexgen` to monitor the job. Once the job has finished, terminate the cluster with `sky down flexgen`, or pass the `--down` flag to the launch command above to have the cluster terminate itself automatically.

To run any other FlexGen command, you can edit [`skypilot.yaml`](skypilot.yaml) and replace the `run` section.
35 changes: 35 additions & 0 deletions flexgen/apps/skypilot.yaml
@@ -0,0 +1,35 @@
# A SkyPilot job definition for benchmarking FlexGen.
# References:
# https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html
# https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html

# Specify the resources required for this job.
resources:
  accelerators: T4:1  # Can replace with another GPU type and count; see `sky show-gpus`.
  memory: 200+        # Requires more than 200 GB of memory.

setup: |
  # Install CUDA 11.6
  wget -q https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
  echo "Installing CUDA 11.6.0"
  sudo sh cuda_11.6.0_510.39.01_linux.run --silent --toolkit

  # Create the conda environment
  conda create -y -n flexgen python=3.9
  conda activate flexgen
  pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
  pip install crfm-helm==0.2.1

  # Install FlexGen
  git clone https://github.com/FMInference/FlexGen.git || true
  cd FlexGen
  pip install -e .

run: |
  # Run any FlexGen command here.
  conda activate flexgen
  # python3 -m flexgen.flex_opt --model facebook/opt-1.3b
  python3 -m flexgen.apps.helm_run \
    --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical \
    --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 \
    --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100