diff --git a/README.md b/README.md
index 46a1e262..f4a5d384 100644
--- a/README.md
+++ b/README.md
@@ -66,9 +66,26 @@ python3 -m flexgen.apps.helm_run --description mmlu:model=text,subject=abstract_
 ```
 Note that only a subset of HELM scenarios is tested. See more tested scenarios [here](flexgen/apps/helm_passed_30b.sh).
 
+### Run FlexGen on Any Cloud with SkyPilot
+The FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a framework for running ML jobs on any cloud.
+First, install SkyPilot and check that your cloud credentials are set up ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)):
+```bash
+pip install "skypilot[aws,gcp,azure,lambda]"  # pick the clouds you use
+sky check
+```
+You can now launch the benchmark on any cloud with a single command; SkyPilot automatically finds a region with availability for the requested GPUs, trying regions in cheapest-price order:
+```bash
+sky launch -c flexgen --detach-setup flexgen/apps/skypilot.yaml
+```
+You can then log into the cluster with `ssh flexgen` to monitor the job. Once the job has finished, terminate the cluster with `sky down flexgen`, or pass the `--down` flag to the launch command above to have the cluster terminate itself automatically.
+
+To run any other FlexGen command, edit [`flexgen/apps/skypilot.yaml`](./flexgen/apps/skypilot.yaml) and replace the `run` section.
+
 ### Data Wrangling
 You can run the examples in this paper, ['Can Foundation Models Wrangle Your Data?'](https://arxiv.org/abs/2205.09911), by following the instructions [here](flexgen/apps/data_wrangle).
 
+
+
 ## Performance Benchmark
 ### Generation Throughput (token/s)
 The corresponding effective batch sizes are in parentheses. Please see [here](benchmark/batch_size_table.md) for more details.
@@ -86,6 +103,7 @@ The corresponding effective batch sizes are in parentheses. Please see [here](be
 
 How to [reproduce](benchmark/flexgen).
 
+
 ## Roadmap
 We plan to work on the following features.
 
diff --git a/flexgen/apps/README.md b/flexgen/apps/README.md
index 017d9d11..c2ca1879 100644
--- a/flexgen/apps/README.md
+++ b/flexgen/apps/README.md
@@ -25,3 +25,25 @@ Run Massive Multitask Language Understanding (MMLU) scenario.
 ```
 python3 helm_run.py --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100
 ```
+
+### Run on any cloud with SkyPilot
+The FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a framework for running ML jobs on any cloud.
+First, install SkyPilot and check that your cloud credentials are set up ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)):
+```bash
+pip install "skypilot[aws,gcp,azure,lambda]"  # pick the clouds you use
+sky check
+```
+You can now launch the benchmark on any cloud with a single command; SkyPilot automatically finds a region with availability for the requested GPUs, trying regions in cheapest-price order:
+```bash
+sky launch -c flexgen --detach-setup skypilot.yaml
+```
+You can then log into the cluster with `ssh flexgen` to monitor the job. Once the job has finished, terminate the cluster with `sky down flexgen`, or pass the `--down` flag to the launch command above to have the cluster terminate itself automatically.
+
+To run any other FlexGen command, edit [`skypilot.yaml`](skypilot.yaml) and replace the `run` section, as sketched below.
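+
+As a sketch, a minimal `run` section for a smaller model (reusing the command that is commented out in `skypilot.yaml`) could look like:
+```yaml
+run: |
+  conda activate flexgen
+  python3 -m flexgen.flex_opt --model facebook/opt-1.3b
+```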
diff --git a/flexgen/apps/skypilot.yaml b/flexgen/apps/skypilot.yaml
new file mode 100644
index 00000000..24630bbc
--- /dev/null
+++ b/flexgen/apps/skypilot.yaml
@@ -0,0 +1,42 @@
+# A SkyPilot job definition for benchmarking FlexGen.
+# References:
+# https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html
+# https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html
+
+# Specify the resources required for this job.
+resources:
+  accelerators: T4:1  # Replace with another GPU type and count; see `sky show-gpus`.
+  memory: 200+  # Request at least 200 GB of host memory.
+
+setup: |
+  # Install the CUDA 11.6 toolkit.
+  wget -q https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
+  echo Installing CUDA 11.6.0
+  sudo sh cuda_11.6.0_510.39.01_linux.run --silent --toolkit
+
+  # Create a conda environment with PyTorch (CUDA 11.6 build) and HELM.
+  conda create -y -n flexgen python=3.9
+  conda activate flexgen
+  pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
+  pip install crfm-helm==0.2.1
+
+  # Install FlexGen from source; `|| true` keeps re-runs of setup idempotent.
+  git clone https://github.com/FMInference/FlexGen.git || true
+  cd FlexGen
+  pip install -e .
+
+run: |
+  # Run any FlexGen command; the commented line below is a smaller smoke test.
+  conda activate flexgen
+  # python3 -m flexgen.flex_opt --model facebook/opt-1.3b
+  python3 -m flexgen.apps.helm_run \
+    --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical \
+    --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 \
+    --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100
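+
+# Illustrative only (not part of the benchmark): to target a different GPU,
+# the `resources` section above could instead request, e.g., a single A100:
+#
+# resources:
+#   accelerators: A100:1
+#   memory: 200+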