add AudioDiffusionPipeline and LatentAudioDiffusionPipeline huggingface#1334 (huggingface#1426)

* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline

* add docs to toc

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* Update pr_tests.yml

Fix tests

* parent 499ff34
author teticio <[email protected]> 1668765652 +0000
committer teticio <[email protected]> 1669041721 +0000

parent 499ff34
author teticio <[email protected]> 1668765652 +0000
committer teticio <[email protected]> 1669041704 +0000

add colab notebook

[Flax] Fix loading scheduler from subfolder (huggingface#1319)

[FLAX] Fix loading scheduler from subfolder

Fix/Enable all schedulers for in-painting (huggingface#1331)

* inpaint fix k lms

* onnox as well

* up

Correct path to schedlure (huggingface#1322)

* [Examples] Correct path

* uP

Avoid nested fix-copies (huggingface#1332)

* Avoid nested `# Copied from` statements during `make fix-copies`

* style

Fix img2img speed with LMS-Discrete Scheduler (huggingface#896)

Casting `self.sigmas` into a different dtype (the one of original_samples) is not advisable. In my img2img pipeline this leads to a long running time in the  `integrate.quad` call later on- by long I mean more than 10x slower.

Co-authored-by: Anton Lozhkov <[email protected]>

Fix the order of casts for onnx inpainting (huggingface#1338)

Legacy Inpainting Pipeline for Onnx Models (huggingface#1237)

* Add legacy inpainting pipeline compatibility for onnx

* remove commented out line

* Add onnx legacy inpainting test

* Fix slow decorators

* pep8 styling

* isort styling

* dummy object

* ordering consistency

* style

* docstring styles

* Refactor common prompt encoding pattern

* Update tests to permanent repository home

* support all available schedulers until ONNX IO binding is available

Co-authored-by: Anton Lozhkov <[email protected]>

* updated styling from PR suggested feedback

Co-authored-by: Anton Lozhkov <[email protected]>

Jax infer support negative prompt (huggingface#1337)

* support negative prompts in sd jax pipeline

* pass batched neg_prompt

* only encode when negative prompt is None

Co-authored-by: Juan Acevedo <[email protected]>

Update README.md: Minor change to Imagic code snippet, missing dir error (huggingface#1347)

Minor change to Imagic Readme

Missing dir causes an error when running the example code.

make style

change the sample model (huggingface#1352)

* Update alt_diffusion.mdx

* Update alt_diffusion.mdx

Add bit diffusion [WIP] (huggingface#971)

* Create bit_diffusion.py

Bit diffusion based on the paper, arXiv:2208.04202, Chen2022AnalogBG

* adding bit diffusion to new branch

ran tests

* tests

* tests

* tests

* tests

* removed test folders + added to README

* Update README.md

Co-authored-by: Patrick von Platen <[email protected]>

* move Mel to module in pipeline construction, make librosa optional

* fix imports

* fix copy & paste error in comment

* fix style

* add missing register_to_config

* fix class docstrings

* fix class docstrings

* tweak docstrings

* tweak docstrings

* update slow test

* put trailing commas back

* respect alphabetical order

* remove LatentAudioDiffusion, make vqvae optional

* move Mel from models back to pipelines :-)

* allow loading of pretrained audiodiffusion models

* fix tests

* fix dummies

* remove reference to latent_audio_diffusion in docs

* unused import

* inherit from SchedulerMixin to make loadable

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>
teticio and patrickvonplaten authored Dec 5, 2022
1 parent 459b8ca commit 48d0123
Showing 25 changed files with 781 additions and 5 deletions.
1 change: 1 addition & 0 deletions .github/workflows/pr_tests.yml
@@ -57,6 +57,7 @@ jobs:

    - name: Install dependencies
      run: |
        apt-get update && apt-get install libsndfile1-dev -y
        python -m pip install -e .[quality,test]
        python -m pip install git+https://github.com/huggingface/accelerate
        python -m pip install -U git+https://github.com/huggingface/transformers
2 changes: 1 addition & 1 deletion .gitignore
@@ -165,4 +165,4 @@ tags
# DS_Store (MacOS)
.DS_Store
# RL pipelines may produce mp4 outputs
*.mp4
*.mp4
2 changes: 2 additions & 0 deletions docker/diffusers-flax-cpu/Dockerfile
@@ -11,6 +11,7 @@ RUN apt update && \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3-pip \
python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
datasets \
hf-doc-builder \
huggingface-hub \
librosa \
modelcards \
numpy \
scipy \
2 changes: 2 additions & 0 deletions docker/diffusers-flax-tpu/Dockerfile
@@ -11,6 +11,7 @@ RUN apt update && \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3-pip \
python3.8-venv && \
@@ -35,6 +36,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
datasets \
hf-doc-builder \
huggingface-hub \
librosa \
modelcards \
numpy \
scipy \
2 changes: 2 additions & 0 deletions docker/diffusers-onnxruntime-cpu/Dockerfile
@@ -11,6 +11,7 @@ RUN apt update && \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3-pip \
python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
datasets \
hf-doc-builder \
huggingface-hub \
librosa \
modelcards \
numpy \
scipy \
2 changes: 2 additions & 0 deletions docker/diffusers-onnxruntime-cuda/Dockerfile
@@ -11,6 +11,7 @@ RUN apt update && \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3-pip \
python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
datasets \
hf-doc-builder \
huggingface-hub \
librosa \
modelcards \
numpy \
scipy \
2 changes: 2 additions & 0 deletions docker/diffusers-pytorch-cpu/Dockerfile
@@ -11,6 +11,7 @@ RUN apt update && \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3-pip \
python3.8-venv && \
@@ -32,6 +33,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
datasets \
hf-doc-builder \
huggingface-hub \
librosa \
modelcards \
numpy \
scipy \
2 changes: 2 additions & 0 deletions docker/diffusers-pytorch-cuda/Dockerfile
@@ -11,6 +11,7 @@ RUN apt update && \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3-pip \
python3.8-venv && \
@@ -32,6 +33,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
datasets \
hf-doc-builder \
huggingface-hub \
librosa \
modelcards \
numpy \
scipy \
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -122,6 +122,8 @@
      title: "VQ Diffusion"
    - local: api/pipelines/repaint
      title: "RePaint"
    - local: api/pipelines/audio_diffusion
      title: "Audio Diffusion"
    title: "Pipelines"
  - sections:
    - local: api/experimental/rl
102 changes: 102 additions & 0 deletions docs/source/api/pipelines/audio_diffusion.mdx
@@ -0,0 +1,102 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Audio Diffusion

## Overview

[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
and from mel spectrogram images.
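
The "to and from images" step can be illustrated with a small sketch. It assumes the spectrogram is stored as power values, converted to decibels relative to the loudest bin, and quantised to 8-bit greyscale over an assumed 80 dB range. The helper names `power_to_pixels` and `pixels_to_power` and the `TOP_DB` constant are hypothetical and are not part of the pipeline's API.

```python
import math

TOP_DB = 80  # assumed dynamic range in decibels (illustrative value)


def power_to_pixels(power_row, top_db=TOP_DB):
    """Map spectrogram power values to 8-bit pixel intensities."""
    ref = max(power_row)
    pixels = []
    for p in power_row:
        # Decibels relative to the loudest bin, floored at -top_db.
        db = max(10.0 * math.log10(max(p, 1e-10) / ref), -top_db)
        # Shift to [0, top_db], scale to [0, 255], round to nearest integer.
        pixels.append(int((db + top_db) * 255 / top_db + 0.5))
    return pixels


def pixels_to_power(pixels, top_db=TOP_DB):
    """Invert the mapping, up to quantisation error and the lost reference level."""
    return [10.0 ** ((px * top_db / 255 - top_db) / 10.0) for px in pixels]


row = [1.0, 0.1, 0.001, 1e-12]
pixels = power_to_pixels(row)      # [255, 223, 159, 0]
restored = pixels_to_power(pixels)
```

This covers only the pixel encoding. Computing the mel spectrogram from a waveform, and estimating a waveform back from the magnitudes, are left to an audio library.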

The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
training scripts and example notebooks.

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |


## Examples:

### Audio Diffusion

```python
import torch
from IPython.display import Audio
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Latent Audio Diffusion

```python
import torch
from IPython.display import Audio
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Audio Diffusion with DDIM (faster)

```python
import torch
from IPython.display import Audio
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Variations, in-painting, out-painting etc.

```python
output = pipe(
    raw_audio=output.audios[0, 0],
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,
    mask_end_secs=1,
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
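
To get a feel for what `mask_start_secs` and `mask_end_secs` mean in terms of the spectrogram image, the sketch below converts seconds to image columns. It assumes a sample rate of 22050 Hz, a hop length of 512 samples, a 256-column image, and that the masked seconds at either end are kept from the input while only the middle is regenerated. The function names are hypothetical; check the values against the `Mel` configuration of the checkpoint you load.

```python
def seconds_to_columns(seconds, sample_rate=22050, hop_length=512):
    """Spectrogram columns (time frames) covered by `seconds` of audio."""
    return int(seconds * sample_rate / hop_length)


def regenerated_columns(width, mask_start_secs, mask_end_secs):
    """Column range left unmasked, i.e. the part the model is free to change."""
    start = min(seconds_to_columns(mask_start_secs), width)
    end = max(width - seconds_to_columns(mask_end_secs), start)
    return range(start, end)


cols = regenerated_columns(width=256, mask_start_secs=1, mask_end_secs=1)
print(cols.start, cols.stop)  # prints: 43 213
```

Under these assumptions, one second corresponds to 43 columns, so the example above would keep the first and last 43 columns and regenerate the 170 columns in between.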

## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
- __call__
- encode
- slerp
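
For readers unfamiliar with the term, `slerp` stands for spherical linear interpolation, which blends two vectors along the arc between them rather than along a straight line. The sketch below is a plain-Python illustration of the formula on lists of floats. It is not the pipeline's implementation, and the fallback for nearly parallel vectors is an addition made here for numerical safety.

```python
import math


def slerp(x0, x1, alpha):
    """Spherical linear interpolation between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x0, x1))
    norm0 = math.sqrt(sum(a * a for a in x0))
    norm1 = math.sqrt(sum(b * b for b in x1))
    # Angle between the vectors; clamp to guard against rounding error.
    theta = math.acos(max(-1.0, min(1.0, dot / (norm0 * norm1))))
    if theta < 1e-8:
        # Nearly parallel: fall back to ordinary linear interpolation.
        return [(1 - alpha) * a + alpha * b for a, b in zip(x0, x1)]
    w0 = math.sin((1 - alpha) * theta) / math.sin(theta)
    w1 = math.sin(alpha * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Halfway between two orthogonal unit vectors the result still has unit length, whereas a straight average would shrink it. Preserving the norm in this way is the usual reason for preferring slerp when interpolating between Gaussian noise samples.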


## Mel
[[autodoc]] Mel
- audio_slice_to_image
- image_to_audio
1 change: 1 addition & 0 deletions docs/source/api/pipelines/overview.mdx
@@ -45,6 +45,7 @@ available a colab notebook to directly try them out.
| Pipeline | Paper | Tasks | Colab
|---|---|:---:|:---:|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -
| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation |
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
1 change: 1 addition & 0 deletions docs/source/index.mdx
@@ -35,6 +35,7 @@ available a colab notebook to directly try them out.
| Pipeline | Paper | Tasks | Colab
|---|---|:---:|:---:|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb)
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
4 changes: 2 additions & 2 deletions docs/source/using-diffusers/audio.mdx
@@ -12,5 +12,5 @@ specific language governing permissions and limitations under the License.

# Using Diffusers for audio

The [`DanceDiffusionPipeline`] can be used to generate audio rapidly!
More coming soon!
[`DanceDiffusionPipeline`] and [`AudioDiffusionPipeline`] can be used to generate
audio rapidly! More coming soon!
2 changes: 2 additions & 0 deletions setup.py
@@ -91,6 +91,7 @@
"isort>=5.5.4",
"jax>=0.2.8,!=0.3.2",
"jaxlib>=0.1.65",
"librosa",
"modelcards>=0.1.4",
"numpy",
"parameterized",
@@ -181,6 +182,7 @@ def run(self):
extras["training"] = deps_list("accelerate", "datasets", "tensorboard", "modelcards")
extras["test"] = deps_list(
"datasets",
"librosa",
"parameterized",
"pytest",
"pytest-timeout",
2 changes: 2 additions & 0 deletions src/diffusers/__init__.py
@@ -30,12 +30,14 @@
)
from .pipeline_utils import DiffusionPipeline
from .pipelines import (
AudioDiffusionPipeline,
DanceDiffusionPipeline,
DDIMPipeline,
DDPMPipeline,
KarrasVePipeline,
LDMPipeline,
LDMSuperResolutionPipeline,
Mel,
PNDMPipeline,
RePaintPipeline,
ScoreSdeVePipeline,
1 change: 1 addition & 0 deletions src/diffusers/dependency_versions_table.py
@@ -15,6 +15,7 @@
"isort": "isort>=5.5.4",
"jax": "jax>=0.2.8,!=0.3.2",
"jaxlib": "jaxlib>=0.1.65",
"librosa": "librosa",
"modelcards": "modelcards>=0.1.4",
"numpy": "numpy",
"parameterized": "parameterized",
13 changes: 12 additions & 1 deletion src/diffusers/pipelines/__init__.py
@@ -1,4 +1,10 @@
from ..utils import is_flax_available, is_onnx_available, is_torch_available, is_transformers_available
from ..utils import (
    is_flax_available,
    is_librosa_available,
    is_onnx_available,
    is_torch_available,
    is_transformers_available,
)


if is_torch_available():
@@ -14,6 +20,11 @@
else:
    from ..utils.dummy_pt_objects import *  # noqa F403

if is_torch_available() and is_librosa_available():
    from .audio_diffusion import AudioDiffusionPipeline, Mel
else:
    from ..utils.dummy_torch_and_librosa_objects import AudioDiffusionPipeline, Mel  # noqa F403

if is_torch_available() and is_transformers_available():
    from .alt_diffusion import AltDiffusionImg2ImgPipeline, AltDiffusionPipeline
    from .latent_diffusion import LDMTextToImagePipeline
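
The guarded import of `AudioDiffusionPipeline` and `Mel` in this file follows a common optional-dependency pattern: check whether a package is importable, and otherwise expose a placeholder that fails only when it is used. The following is a minimal standard-library sketch of that pattern. The helpers `is_package_available` and `placeholder`, and the package name `no_such_package_xyz`, are hypothetical and do not reproduce the library's own utilities.

```python
import importlib.util


def is_package_available(name):
    """Return True if `name` can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def placeholder(class_name, package):
    """Build a stand-in class that raises a helpful error when instantiated."""

    def __init__(self, *args, **kwargs):
        raise ImportError(f"{class_name} requires `{package}`, which is not installed.")

    return type(class_name, (), {"__init__": __init__})


# `no_such_package_xyz` is a made-up name, so the fallback branch is taken.
if is_package_available("no_such_package_xyz"):
    from no_such_package_xyz import Thing
else:
    Thing = placeholder("Thing", "no_such_package_xyz")
```

With this arrangement, importing the top-level package succeeds without the optional dependency, and the error surfaces only when a user tries to construct the class that needs it.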
3 changes: 3 additions & 0 deletions src/diffusers/pipelines/audio_diffusion/__init__.py
@@ -0,0 +1,3 @@
# flake8: noqa
from .mel import Mel
from .pipeline_audio_diffusion import AudioDiffusionPipeline