Controlnet training (huggingface#650)
Co-authored-by: regisss <[email protected]>
vidyasiv and regisss authored Mar 7, 2024
1 parent 7df8006 commit d755787
Showing 7 changed files with 1,334 additions and 20 deletions.
1 change: 1 addition & 0 deletions Makefile
@@ -60,6 +60,7 @@ slow_tests_diffusers: test_installs
python -m pytest tests/test_diffusers.py -v -s -k "test_textual_inversion"
python -m pip install peft==0.7.0
python -m pytest tests/test_diffusers.py -v -s -k "test_train_text_to_image_"
python -m pytest tests/test_diffusers.py -v -s -k "test_train_controlnet"

# Run text-generation non-regression tests
slow_tests_text_generation_example: test_installs
37 changes: 34 additions & 3 deletions examples/stable-diffusion/training/README.md
@@ -66,11 +66,40 @@ python textual_inversion.py \
> As described in [the official paper](https://arxiv.org/abs/2208.01618), only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`. However, one can also add multiple embedding vectors for the placeholder token to increase the number of fine-tuneable parameters. This can help the model to learn more complex details. To use multiple embedding vectors, you can define `--num_vectors` to a number larger than one, *e.g.*: `--num_vectors 5`. The saved textual inversion vectors will then be larger in size compared to the default case.

## ControlNet Training

ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala. It is a type of model that controls Stable Diffusion by conditioning it on an additional input image.
This example is adapted from [the ControlNet example in the diffusers repository](https://github.com/huggingface/diffusers/tree/main/examples/controlnet#training).

First, download the conditioning images as shown below:

```bash
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
```
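The `fusing/fill50k` dataset used below pairs circle images with captions describing the filled result, and the two downloaded files are conditioning inputs of that form. As a rough illustration of what such a conditioning image looks like, here is a hypothetical sketch that draws one with Pillow (colors and sizes are made up; the actual dataset rendering may differ):

```python
from PIL import Image, ImageDraw

# Draw a circle outline on a plain background, similar in spirit to the
# fill50k conditioning images (exact dataset rendering may differ).
img = Image.new("RGB", (512, 512), "white")
draw = ImageDraw.Draw(img)
draw.ellipse((128, 128, 384, 384), outline="black", width=4)
img.save("synthetic_conditioning_image.png")
print(img.size)  # (512, 512)
```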

Then proceed to training with the following command:

```bash
python train_controlnet.py \
--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
--output_dir=/tmp/stable_diffusion1_5 \
--dataset_name=fusing/fill50k \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=4 \
--throughput_warmup_steps=3 \
--use_hpu_graphs \
--bf16
```
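`--throughput_warmup_steps=3` discards the first few training steps (dominated by graph compilation and caching) when computing throughput. The bookkeeping amounts to something like this hypothetical helper (`average_throughput` is illustrative, not the script's actual function):

```python
def average_throughput(step_times, batch_size, warmup_steps=3):
    """Samples per second, ignoring the first `warmup_steps` steps."""
    measured = step_times[warmup_steps:]
    if not measured:
        raise ValueError("need more steps than warmup_steps")
    return batch_size * len(measured) / sum(measured)

# The first steps are slow (graph compilation); later steps are steady-state.
times = [10.0, 8.0, 6.0, 2.0, 2.0, 2.0, 2.0]
print(average_throughput(times, batch_size=4))  # 2.0 samples/s
```

Without the warmup cutoff, the slow initial steps would drag the reported average well below the steady-state rate.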

### Multi-card Run

-You can run this fine-tuning script in a distributed fashion as follows:
+You can run these fine-tuning scripts in a distributed fashion as follows:
```bash
-python ../gaudi_spawn.py --use_mpi --world_size 8 textual_inversion.py \
+python ../../gaudi_spawn.py --use_mpi --world_size 8 textual_inversion.py \
--pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 \
--train_data_dir ./cat \
--learnable_property object \
@@ -86,7 +115,9 @@ python ../gaudi_spawn.py --use_mpi --world_size 8 textual_inversion.py \
--output_dir /tmp/textual_inversion_cat \
--save_as_full_pipeline \
--gaudi_config_name Habana/stable-diffusion \
---throughput_warmup_steps 3
+--throughput_warmup_steps 3 \
+--use_hpu_graphs \
+--bf16
```
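With `--use_mpi --world_size 8`, each of the eight processes consumes its own per-device batch, so the effective global batch size scales with the device count. A tiny hypothetical helper to make that arithmetic explicit (the numbers are illustrative, not taken from the commands above):

```python
def effective_batch_size(per_device_batch, world_size, grad_accum_steps=1):
    """Global batch size across all devices and accumulation steps."""
    return per_device_batch * world_size * grad_accum_steps

# e.g. a per-device batch of 4 across 8 HPUs:
print(effective_batch_size(4, 8))  # 32
```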


