Refactoring Maxdiffusion readme (#106)
parambole authored Sep 24, 2024
1 parent 261f132 commit a73fbcf
Showing 4 changed files with 148 additions and 11 deletions.
13 changes: 2 additions & 11 deletions README.md
@@ -40,7 +40,6 @@ MaxDiffusion supports
# Table of Contents

* [Getting Started](#getting-started)
* [Local Development for single host](#getting-started-local-development-for-single-host)
* [Training](#training)
* [Dreambooth](#dreambooth)
* [Inference](#inference)
@@ -55,17 +54,9 @@ We recommend starting with a single TPU host and then moving to multihost.

Minimum requirements: Ubuntu Version 22.04, Python 3.10 and TensorFlow >= 2.12.0.
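
A quick way to compare a host against these minimums is sketched below. The `version_ge` helper is a hypothetical name, not part of MaxDiffusion; it relies on GNU `sort -V`, and it treats Python 3.10 as a lower bound, which is an assumption about how the requirement is meant.

```bash
# Hypothetical helper (not part of MaxDiffusion):
# version_ge A B succeeds when version A >= version B. Requires GNU `sort -V`.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report the Python version if python3 is available.
if command -v python3 >/dev/null 2>&1; then
  py_version="$(python3 -c 'import platform; print(platform.python_version())')"
  if version_ge "$py_version" "3.10"; then
    echo "Python $py_version meets the 3.10 minimum"
  else
    echo "Python $py_version is older than 3.10"
  fi
else
  echo "python3 not found on PATH"
fi

# Report the OS release if the standard os-release file is readable.
if [ -r /etc/os-release ]; then
  os_version="$(. /etc/os-release && printf '%s' "${VERSION_ID:-unknown}")"
  echo "OS release: $os_version (the README lists Ubuntu 22.04)"
fi
```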

## Getting Started: Local Development for single host
Local development is a convenient way to run MaxDiffusion on a single host.
## Getting Started:

1. [Create and SSH to a single-host TPU (v4-8). ](https://cloud.google.com/tpu/docs/users-guide-tpu-vm#creating_a_cloud_tpu_vm_with_gcloud)
1. Clone MaxDiffusion in your TPU VM.
1. Within the root directory of the MaxDiffusion `git` repo, install dependencies by running:
```bash
pip3 install jax[tpu] -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
pip3 install -r requirements.txt
pip3 install .
```
If you are running MaxDiffusion for the first time, follow the [first-run instructions](docs/getting_started/first_run.md).

## Training

17 changes: 17 additions & 0 deletions docs/README.md
@@ -0,0 +1,17 @@
# MaxDiffusion Documentation

This folder contains documentation for getting started with and using MaxDiffusion.

## Getting Started

* **[First Run](getting_started/first_run.md)** - Provides instructions for setting up and running MaxDiffusion for the first time.
* **[Running MaxDiffusion via XPK](getting_started/run_maxdiffusion_via_xpk.md)** - Explains how to run MaxDiffusion using XPK.

## Contributing & Community

* **[Code of Conduct](code-of-conduct.md)** - Outlines the expected behavior for contributors to the project.
* **[Contributing](contributing.md)** - Provides guidelines for contributing to the MaxDiffusion project.

## Training

* **[Common Training Guide](train_README.md)** - Provides a comprehensive guide to training MaxDiffusion models, including script usage, configuration options, and sharding strategies.
23 changes: 23 additions & 0 deletions docs/getting_started/first_run.md
@@ -0,0 +1,23 @@
# Getting Started

We recommend starting with a single host and then moving to multihost.

## Getting Started: Local Development for single host

#### Running on Cloud TPUs
Local development is a convenient way to run MaxDiffusion on a single host. It doesn't scale to
multiple hosts.

1. [Create and SSH to a single-host TPU (v4-8).](https://cloud.google.com/tpu/docs/users-guide-tpu-vm#creating_a_cloud_tpu_vm_with_gcloud)
1. Clone MaxDiffusion in your TPU VM.
1. Within the root directory of the MaxDiffusion `git` repo, install dependencies by running:
```bash
pip3 install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
pip3 install -r requirements.txt
pip3 install .
```
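
After installing, it can help to confirm that JAX imports and to see which devices it reports. The snippet below is a minimal sketch, not part of the MaxDiffusion repo; `check_jax_devices` is a hypothetical helper name, and on a TPU VM the listed devices would be expected to be TPU devices rather than CPU.

```bash
# Hypothetical helper (not part of MaxDiffusion):
# print the devices JAX can see, or explain that JAX is not importable.
check_jax_devices() {
  if python3 -c 'import jax' >/dev/null 2>&1; then
    python3 -c 'import jax; print(jax.device_count(), "device(s):", jax.devices())'
  else
    echo "jax could not be imported; re-run the install commands above" >&2
    return 1
  fi
}

check_jax_devices || echo "JAX check failed"
```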

## Getting Started: Multihost development

[GKE, recommended] [Running MaxDiffusion with xpk](run_maxdiffusion_via_xpk.md) - Quick Experimentation and Production support

106 changes: 106 additions & 0 deletions docs/getting_started/run_maxdiffusion_via_xpk.md
@@ -0,0 +1,106 @@
# How to run MaxDiffusion with XPK?

This document focuses on the steps required to set up XPK on a TPU VM and assumes you have read the [README](https://github.com/google/xpk/blob/main/README.md) to understand XPK basics.

## Steps to set up XPK on TPU VM

* Verify you have these permissions for your account or service account

Storage Admin \
Kubernetes Engine Admin

* On TPU VMs, gcloud is installed through the snap package manager, so install kubectl with snap as well:
```shell
sudo apt-get update
sudo apt install snapd
sudo snap install kubectl --classic
```
* Install `gke-gcloud-auth-plugin`
```shell
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

sudo apt update && sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin
```

* Authenticate gcloud installation by running this command and following the prompt
```
gcloud auth login
```

* Run this command to configure docker to use docker-credential-gcloud for GCR registries:
```
gcloud auth configure-docker
```

* Test the installation by running
```
docker run hello-world
```

* If you get a permission error, add your user to the `docker` group:
```
sudo usermod -aG docker $USER
```
Then log out and log back in to the machine for the group change to take effect.
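
The setup steps above install several command-line tools. A small preflight check such as the following can confirm they are all on `PATH` before you continue. It is a sketch only: `check_tools` is a hypothetical helper, not something shipped with xpk or MaxDiffusion.

```shell
# Hypothetical helper (not part of xpk or MaxDiffusion):
# report which of the given commands are missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools gcloud kubectl docker gke-gcloud-auth-plugin \
  || echo "Install the missing tools before continuing."
```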

## Build Docker Image for MaxDiffusion

1. Git clone MaxDiffusion locally

```shell
git clone https://github.com/google/MaxDiffusion.git
cd MaxDiffusion
```
2. Build local MaxDiffusion docker image

You only need to rerun this when you change your dependencies. The image may also expire, in which case you must rerun the command below.

```shell
# Default will pick stable versions of dependencies
bash docker_build_dependency_image.sh
```
3. After building the dependency image `maxdiffusion_base_image`, xpk can handle updates to the working directory when running `xpk workload create` and using `--base-docker-image`.

See details on docker images in xpk here: https://github.com/google/xpk/blob/main/README.md#how-to-add-docker-images-to-a-xpk-workload

**Note:** When using the XPK command, ensure you include `pip install .` to install the package from the current directory. This is necessary because the container is created from a copy of your local directory, and `pip install .` ensures any local changes you've made are applied within the container.

__Using xpk to upload the image to your GCP project and run MaxDiffusion__
```shell
gcloud config set project $PROJECT_ID
gcloud config set compute/zone $ZONE
# See instructions in README.md to create the buckets below.
BASE_OUTPUT_DIR=gs://output_bucket/
DATASET_PATH=gs://dataset_bucket/
# Install xpk
pip install xpk
# Make sure you are still in the MaxDiffusion GitHub root directory when running this command
xpk workload create \
--cluster ${CLUSTER_NAME} \
--base-docker-image maxdiffusion_base_image \
--workload ${USER}-first-job \
--tpu-type=v4-8 \
--num-slices=1 \
--command "pip install . && python src/maxdiffusion/train.py src/maxdiffusion/configs/base_2_base.yml run_name=my_run output_dir=gs://your-bucket/"
```
__Using [xpk github repo](https://github.com/google/xpk.git)__
```shell
git clone https://github.com/google/xpk.git
# Make sure you are still in the MaxDiffusion GitHub root directory when running this command
python3 xpk/xpk.py workload create \
--cluster ${CLUSTER_NAME} \
--base-docker-image maxdiffusion_base_image \
--workload ${USER}-first-job \
--tpu-type=v4-8 \
--num-slices=1 \
--command "pip install . && python src/maxdiffusion/train.py src/maxdiffusion/configs/base_2_base.yml run_name=my_run output_dir=gs://your-bucket/"
```
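
Both invocations above pass the whole training command to `--command` as one quoted string, which makes nested quotes easy to get wrong. One option is to assemble that string in a variable first. This is a sketch only: `build_train_command` and `TRAIN_COMMAND` are hypothetical names, and feeding `BASE_OUTPUT_DIR` into `output_dir` is an assumption about how the bucket variables are meant to be used.

```shell
# Placeholder bucket; replace with your own.
BASE_OUTPUT_DIR="${BASE_OUTPUT_DIR:-gs://output_bucket/}"

# Hypothetical helper: compose the string passed to --command.
# Arguments: run name, output directory. Values must not contain spaces.
build_train_command() {
  printf 'pip install . && python src/maxdiffusion/train.py src/maxdiffusion/configs/base_2_base.yml run_name=%s output_dir=%s' "$1" "$2"
}

TRAIN_COMMAND="$(build_train_command my_run "${BASE_OUTPUT_DIR}")"
echo "${TRAIN_COMMAND}"
```

The result can then be passed as `--command "${TRAIN_COMMAND}"` in either `xpk workload create` call.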
