Refactoring Maxdiffusion readme (#106)
Showing 4 changed files with 148 additions and 11 deletions.
@@ -0,0 +1,17 @@
# MaxDiffusion Documentation

This folder contains documentation for getting started with and using MaxDiffusion.

## Getting Started

* **[First Run](getting_started/first_run.md)** - Provides instructions for setting up and running MaxDiffusion for the first time.
* **[Running MaxDiffusion via XPK](getting_started/run_maxdiffusion_via_xpk.md)** - Explains how to run MaxDiffusion on GKE using the XPK command-line tool.

## Contributing & Community

* **[Code of Conduct](code-of-conduct.md)** - Outlines the expected behavior for contributors to the project.
* **[Contributing](contributing.md)** - Provides guidelines for contributing to the MaxDiffusion project.

## Training

* **[Common Training Guide](train_README.md)** - Provides a comprehensive guide to training MaxDiffusion models, including script usage, configuration options, and sharding strategies.
@@ -0,0 +1,23 @@
# Getting Started

We recommend starting on a single host and then moving to multihost.

## Getting Started: Local Development for a Single Host

### Running on Cloud TPUs
Local development is a convenient way to run MaxDiffusion on a single host. It doesn't scale to multiple hosts.

1. [Create and SSH to a single-host TPU (v4-8).](https://cloud.google.com/tpu/docs/users-guide-tpu-vm#creating_a_cloud_tpu_vm_with_gcloud)
1. Clone MaxDiffusion in your TPU VM.
1. Within the root directory of the MaxDiffusion `git` repo, install dependencies by running:
```bash
pip3 install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
pip3 install -r requirements.txt
pip3 install .
```
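
Before moving on, you can confirm that JAX actually sees the TPU. The snippet below is a minimal sketch, not part of MaxDiffusion: `check_jax_tpu` is a hypothetical helper name, and it relies only on the public `jax.default_backend()` and `jax.device_count()` functions.

```shell
# Hypothetical helper (not part of MaxDiffusion): verify that JAX imports
# and that its default backend is the TPU.
check_jax_tpu() {
  local backend count
  # Ask JAX which backend it selected: "tpu", "gpu" or "cpu".
  if ! backend=$(python3 -c 'import jax; print(jax.default_backend())' 2>/dev/null); then
    echo "JAX is not importable; re-run the pip3 install steps above." >&2
    return 1
  fi
  if [ "$backend" != "tpu" ]; then
    echo "JAX default backend is '$backend', expected 'tpu'." >&2
    return 1
  fi
  # Number of TPU devices visible to this host.
  count=$(python3 -c 'import jax; print(jax.device_count())')
  echo "JAX sees ${count} TPU device(s)."
}
```

Run `check_jax_tpu` in the TPU VM shell: it prints the device count on success and returns non-zero with a hint otherwise. Checking the backend, not just the import, matters because a `jax` install without TPU support still imports cleanly but falls back to the CPU.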

## Getting Started: Multihost development

[GKE, recommended] [Running MaxDiffusion with xpk](run_maxdiffusion_via_xpk.md) - Quick experimentation and production support
@@ -0,0 +1,106 @@
# How to run MaxDiffusion with XPK?

This document focuses on the steps required to set up XPK on a TPU VM and assumes you have gone through the [README](https://github.com/google/xpk/blob/main/README.md) to understand XPK basics.

## Steps to set up XPK on a TPU VM

* Verify that your account or service account has the following IAM roles:

Storage Admin \
Kubernetes Engine Admin

* On TPU VMs, gcloud is installed from the snap package, so install kubectl with snap as well:
```shell
sudo apt-get update
sudo apt install snapd
sudo snap install kubectl --classic
```
* Install `gke-gcloud-auth-plugin`:
```shell
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

sudo apt update && sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin
```

* Authenticate your gcloud installation by running this command and following the prompts:
```shell
gcloud auth login
```

* Configure Docker to use `docker-credential-gcloud` for GCR registries:
```shell
gcloud auth configure-docker
```

* Test the Docker installation by running:
```shell
docker run hello-world
```

* If you get a permission error, add your user to the `docker` group:
```shell
sudo usermod -aG docker "$USER"
```
Then log out and log back in to the machine so the group change takes effect.
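
Once the steps above are done, a quick check that every tool the rest of this guide relies on is on `PATH` can save a failed `xpk` run later. This is a minimal sketch: `check_xpk_prereqs` is a hypothetical helper name, and its default tool list simply mirrors the setup steps above.

```shell
# Hypothetical helper: report any required command that is not on PATH.
# Pass command names as arguments; with no arguments it checks the tools
# installed in the setup steps above.
check_xpk_prereqs() {
  local missing=0 tool
  [ "$#" -gt 0 ] || set -- gcloud kubectl gke-gcloud-auth-plugin docker
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero if at least one command was missing.
  return "$missing"
}
```

For example, `check_xpk_prereqs` checks the default list, while `check_xpk_prereqs kubectl docker` checks only those two; each missing command is printed to stderr.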

## Build Docker Image for MaxDiffusion

1. Clone MaxDiffusion locally:

```shell
git clone https://github.com/google/MaxDiffusion.git
cd MaxDiffusion
```
2. Build the local MaxDiffusion Docker image:

This only needs to be rerun when you want to change your dependencies. The image may also expire, in which case you need to rerun the command below.

```shell
# By default, stable versions of the dependencies are used
bash docker_build_dependency_image.sh
```
3. After you build the dependency image `maxdiffusion_base_image`, xpk picks up changes to your working directory when you run `xpk workload create` with `--base-docker-image`.

See details on Docker images in xpk here: https://github.com/google/xpk/blob/main/README.md#how-to-add-docker-images-to-a-xpk-workload

**Note:** When using the xpk command, make sure `--command` includes `pip install .` to install the package from the current directory. The container is created from a copy of your local directory, so `pip install .` ensures that any local changes you've made are applied inside the container.
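
Because `--base-docker-image` refers to the local image built in step 2, it is worth confirming that the image exists before submitting a workload. A minimal sketch using `docker image inspect`, which exits non-zero when the image is absent; `require_local_image` is a hypothetical helper name.

```shell
# Hypothetical helper: fail early if a local Docker image is missing.
# Defaults to the dependency image built by docker_build_dependency_image.sh.
require_local_image() {
  local image="${1:-maxdiffusion_base_image}"
  if ! docker image inspect "$image" >/dev/null 2>&1; then
    echo "Image '$image' not found; run: bash docker_build_dependency_image.sh" >&2
    return 1
  fi
  echo "Found local image '$image'."
}
```

Run `require_local_image` from the MaxDiffusion root before `xpk workload create`, passing a different name if you tagged the image differently.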

__Using xpk to upload the image to your GCP project and run MaxDiffusion__
```shell
gcloud config set project $PROJECT_ID
gcloud config set compute/zone $ZONE
# See the instructions in README.md to create the buckets below.
BASE_OUTPUT_DIR=gs://output_bucket/
DATASET_PATH=gs://dataset_bucket/
# Install xpk
pip install xpk
# Make sure you are still in the MaxDiffusion GitHub root directory when running this command
xpk workload create \
  --cluster ${CLUSTER_NAME} \
  --base-docker-image maxdiffusion_base_image \
  --workload ${USER}-first-job \
  --tpu-type=v4-8 \
  --num-slices=1 \
  --command "pip install . && python src/maxdiffusion/train.py src/maxdiffusion/configs/base_2_base.yml run_name=my_run output_dir=${BASE_OUTPUT_DIR}"
```

__Using the [xpk GitHub repo](https://github.com/google/xpk.git)__
```shell
git clone https://github.com/google/xpk.git
# Make sure you are still in the MaxDiffusion GitHub root directory when running this command
python3 xpk/xpk.py workload create \
  --cluster ${CLUSTER_NAME} \
  --base-docker-image maxdiffusion_base_image \
  --workload ${USER}-first-job \
  --tpu-type=v4-8 \
  --num-slices=1 \
  --command "pip install . && python src/maxdiffusion/train.py src/maxdiffusion/configs/base_2_base.yml run_name=my_run output_dir=gs://your-bucket/"
```
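
After submitting, the workload can be inspected and cleaned up with xpk's `workload list` and `workload delete` subcommands (see the xpk README linked above for the full set of flags). The wrappers below are a hypothetical convenience, not part of MaxDiffusion or xpk, and assume `xpk` is on `PATH` from `pip install xpk`; substitute `python3 xpk/xpk.py` if you cloned the repo instead.

```shell
# Hypothetical wrappers around xpk's workload subcommands.
# Assumes CLUSTER_NAME is set and `xpk` is on PATH (from `pip install xpk`).
list_workloads() {
  xpk workload list --cluster "${CLUSTER_NAME:?CLUSTER_NAME is not set}"
}

# Deletes a workload; defaults to the name used in the examples above.
delete_workload() {
  local workload="${1:-${USER}-first-job}"
  xpk workload delete --cluster "${CLUSTER_NAME:?CLUSTER_NAME is not set}" --workload "$workload"
}
```

For example, `delete_workload` removes `${USER}-first-job` from the examples above, and `delete_workload my-other-job` removes a different one.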