This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

Add SkyPilot example for running benchmarks #96

Status: Open. Wants to merge 12 commits into `main`.
18 changes: 18 additions & 0 deletions README.md
@@ -66,9 +66,26 @@
python3 -m flexgen.apps.helm_run --description mmlu:model=text,subject=abstract_
```
Note that only a subset of HELM scenarios is tested. See more tested scenarios [here](flexgen/apps/helm_passed_30b.sh).

### Run FlexGen on Any Cloud with SkyPilot
The FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a framework for running ML jobs on any cloud.
First, install SkyPilot and verify that you have cloud credentials set up ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)):
```bash
pip install "skypilot[aws,gcp,azure,lambda]" # pick your clouds
sky check
```
You can now launch the benchmark on any cloud with a single command. SkyPilot automatically finds the cheapest available region that offers the requested GPUs:
```bash
sky launch -c flexgen --detach-setup flexgen/apps/skypilot.yaml
```
You can then log into the cluster with `ssh flexgen` to monitor the job. Once the job has finished, terminate the cluster with `sky down flexgen`, or pass the `--down` flag to the launch command above to have the cluster terminate itself automatically.

To run any other FlexGen command, you can edit [`flexgen/apps/skypilot.yaml`](./flexgen/apps/skypilot.yaml) and replace the `run` section.
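For reference, here is a sketch of an alternative `run` section that benchmarks the smaller `facebook/opt-1.3b` model (the same command that appears commented out in the shipped YAML); the model choice is only an example:

```yaml
run: |
  conda activate flexgen
  # A lighter benchmark on a 1.3B model, which should fit on a single T4.
  python3 -m flexgen.flex_opt --model facebook/opt-1.3b
```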

### Data Wrangling
You can run the examples in this paper, ['Can Foundation Models Wrangle Your Data?'](https://arxiv.org/abs/2205.09911), by following the instructions [here](flexgen/apps/data_wrangle).



## Performance Benchmark
### Generation Throughput (token/s)
The corresponding effective batch sizes are in parentheses. Please see [here](benchmark/batch_size_table.md) for more details.
@@ -86,6 +103,7 @@

How to [reproduce](benchmark/flexgen).
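The effective batch sizes in the table can be cross-checked by hand: to my reading of the benchmark commands, the effective batch size is the per-GPU batch size multiplied by the number of GPU batches. A minimal sketch using the flags from the HELM command in this PR (`--gpu-batch-size 48 --num-gpu-batches 3`):

```python
# Effective batch size = per-GPU micro-batch size x number of micro-batches.
# Flag values are taken from the helm_run command in this PR; the formula is
# an assumption based on how the benchmark tables pair these two flags.
gpu_batch_size = 48   # --gpu-batch-size
num_gpu_batches = 3   # --num-gpu-batches
effective_batch_size = gpu_batch_size * num_gpu_batches
print(effective_batch_size)  # 144
```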


## Roadmap
We plan to work on the following features.

15 changes: 15 additions & 0 deletions flexgen/apps/README.md
@@ -25,3 +25,18 @@
Run the Massive Multitask Language Understanding (MMLU) scenario.
```
python3 helm_run.py --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100
```
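The `--percent 20 80 0 100 0 100` flag in the command above controls offloading. A hedged sketch of how the six numbers are read, based on my understanding of FlexGen's CLI help (GPU/CPU percentages for weights, attention cache, and activations, with the remainder going to disk); `split_placement` is a hypothetical helper for illustration, not part of FlexGen:

```python
# Hypothetical helper: interpret FlexGen's --percent flag as three
# (gpu, cpu) percentage pairs -- weights, attention cache, activations --
# with anything left over offloaded to disk.
def split_placement(percent):
    assert len(percent) == 6, "expected: w_gpu w_cpu cache_gpu cache_cpu act_gpu act_cpu"
    plan = {}
    for name, gpu, cpu in zip(["weights", "cache", "activations"],
                              percent[::2], percent[1::2]):
        assert 0 <= gpu + cpu <= 100, f"{name}: percentages exceed 100"
        plan[name] = {"gpu": gpu, "cpu": cpu, "disk": 100 - gpu - cpu}
    return plan

# The split used by the command above: 20% of weights on GPU, 80% on CPU;
# the attention cache and activations entirely on CPU.
plan = split_placement([20, 80, 0, 100, 0, 100])
print(plan["weights"])  # {'gpu': 20, 'cpu': 80, 'disk': 0}
```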

### Run on any cloud with SkyPilot
The FlexGen benchmark can be launched with [SkyPilot](https://github.com/skypilot-org/skypilot), a framework for running ML jobs on any cloud.
First, install SkyPilot and verify that you have cloud credentials set up ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)):
```bash
pip install "skypilot[aws,gcp,azure,lambda]" # pick your clouds
sky check
```
You can now launch the benchmark on any cloud with a single command. SkyPilot automatically finds the cheapest available region that offers the requested GPUs:
```bash
sky launch -c flexgen --detach-setup skypilot.yaml
```
You can then log into the cluster with `ssh flexgen` to monitor the job. Once the job has finished, terminate the cluster with `sky down flexgen`, or pass the `--down` flag to the launch command above to have the cluster terminate itself automatically.

To run any other FlexGen command, you can edit [`skypilot.yaml`](skypilot.yaml) and replace the `run` section.
35 changes: 35 additions & 0 deletions flexgen/apps/skypilot.yaml
@@ -0,0 +1,35 @@
# A SkyPilot job definition for benchmarking FlexGen.
# References:
# https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html
# https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html

# Specify the resources required for this job.
resources:
  accelerators: T4:1  # Can replace with another GPU type and count; see `sky show-gpus`.
  memory: 200+        # Requires more than 200 GB of memory.

setup: |
  # Install CUDA 11.6
  wget -q https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
  echo "Installing CUDA 11.6.0"
  sudo sh cuda_11.6.0_510.39.01_linux.run --silent --toolkit

  # Create the conda environment
  conda create -y -n flexgen python=3.9
  conda activate flexgen
  pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
  pip install crfm-helm==0.2.1

  # Install FlexGen
  git clone https://github.com/FMInference/FlexGen.git || true
  cd FlexGen
  pip install -e .

run: |
  # Run any FlexGen command here.
  conda activate flexgen
  # python3 -m flexgen.flex_opt --model facebook/opt-1.3b
  python3 -m flexgen.apps.helm_run \
    --description mmlu:model=text,subject=abstract_algebra,data_augmentation=canonical \
    --pad-to-seq-len 512 --model facebook/opt-30b --percent 20 80 0 100 0 100 \
    --gpu-batch-size 48 --num-gpu-batches 3 --max-eval-instance 100