Designing an API for diffusion models #1
francois-rozet started this conversation in General
Sounds good to me! We have already discussed many points for which features are coming soon. I will be pleased to help with tutorials.
Hello all 👋 I've made a lot of progress on the interface and the repo in general (tests, docs, contributing guidelines, README, ...). I consider the current version (0.1.0) to be the first beta version of Azula. I'm now looking for feedback (see questions in the main post). cc @gerome-andry, @JuliaLinhart, @bkmi, @michaeldeistler, @janfb, @simonschnake, @blt2114
Hello everyone 👋
This discussion is the continuation of probabilists/zuko#52. I have created the Azula repository. Its goal is to unify the different formalisms and notations of the generative diffusion models literature into a single, convenient and hackable interface. I have written a first draft for the API.
Formalism
In Azula's formalism, a diffusion model is the composition of three elements: a noise schedule, a denoiser and a sampler.
A noise schedule is a mapping from a time $t \in [0, 1]$ to the signal scale $\alpha_t$ and the noise scale $\sigma_t$ in a perturbation kernel $p(X_t \mid X) = \mathcal{N}(X_t \mid \alpha_t X, \sigma_t^2 I)$ from a "clean" random variable $X \sim p(X)$ to a "noisy" random variable $X_t$.
Because $\alpha_t$ and $\sigma_t$ are not explicitly linked, any noise schedule can be implemented, such as the variance exploding ($\alpha_t = 1$) or variance preserving ($\sigma_t^2 = 1 - \alpha_t^2$) schedules.
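To make the two schedule families concrete, here is a minimal sketch in plain Python. It is not Azula's implementation: the names `vp_schedule`, `ve_schedule` and `perturb`, the cosine form of the variance preserving schedule, and the linear growth of the variance exploding noise scale are all assumptions chosen for illustration.

```python
import math
import random


def vp_schedule(t):
    """Variance preserving: sigma_t^2 = 1 - alpha_t^2 (cosine form is an assumption)."""
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    return alpha, sigma


def ve_schedule(t, sigma_max=100.0):
    """Variance exploding: alpha_t = 1 (linear growth of sigma_t is an assumption)."""
    return 1.0, sigma_max * t


def perturb(x, t, schedule, rng):
    """Draw X_t ~ N(alpha_t * x, sigma_t^2) for a scalar clean value x."""
    alpha, sigma = schedule(t)
    return alpha * x + sigma * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        print(t, perturb(2.0, t, vp_schedule, rng), perturb(2.0, t, ve_schedule, rng))
```

Because a schedule here is just a function returning the pair $(\alpha_t, \sigma_t)$, swapping one family for the other does not require touching the perturbation code.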
A denoiser is a neural network trained to predict $X$ given $X_t$. In practice, it is a Gaussian denoiser $q_\phi(X \mid X_t) = \mathcal{N}(X \mid \mu_\phi(X_t), \Sigma_\phi(X_t))$. Different implementations parameterize the mean $\mu_\phi(X_t)$ and the covariance $\Sigma_\phi(X_t)$ differently.
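As a worked example (a toy case, not taken from Azula), suppose the data itself is Gaussian, $p(X) = \mathcal{N}(X \mid m, s^2 I)$, where $m$ and $s$ are symbols introduced here for illustration. With $X_t = \alpha_t X + \sigma_t \epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$, the posterior is exactly Gaussian, so the ideal denoiser is available in closed form:

$$p(X \mid X_t) = \mathcal{N}\left(X \,\middle|\, m + \frac{\alpha_t s^2}{\alpha_t^2 s^2 + \sigma_t^2} (X_t - \alpha_t m), \; \frac{s^2 \sigma_t^2}{\alpha_t^2 s^2 + \sigma_t^2} I\right)$$

When $\sigma_t \to 0$ the mean tends to $X_t / \alpha_t$ and the covariance vanishes. When $\alpha_t \to 0$ the posterior falls back to the prior $\mathcal{N}(m, s^2 I)$. For real data the posterior is not Gaussian, and $q_\phi$ is only a Gaussian approximation of it.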
A sampler defines a series of transition kernels $q_\phi(X_s \mid X_t)$ from $t$ to $s < t$ based on a noise schedule and a denoiser $q_\phi(X \mid X_t)$. Simulating these transitions from $t = 1$ to $0$ samples approximately from $p(X)$.
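The composition of the three elements can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Azula's code: the data is a scalar $X \sim \mathcal{N}(M, S^2)$ so that the exact posterior mean can stand in for a trained network, the schedule is a cosine variance preserving one, and the transition is a deterministic DDIM-style step. All names (`schedule`, `denoise`, `ddim_step`, `sample`, `M`, `S`) are hypothetical.

```python
import math
import random

M, S = 2.0, 0.5  # assumed toy data distribution: X ~ N(M, S^2)


def schedule(t):
    """Cosine variance preserving schedule (an illustrative choice)."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def denoise(x_t, t):
    """Exact posterior mean E[X | X_t] for Gaussian data; stands in for a trained network."""
    alpha, sigma = schedule(t)
    var_t = alpha**2 * S**2 + sigma**2  # marginal variance of X_t
    return M + alpha * S**2 / var_t * (x_t - alpha * M)


def ddim_step(x_t, t, s):
    """Deterministic DDIM-style transition from time t to s < t."""
    alpha_t, sigma_t = schedule(t)
    alpha_s, sigma_s = schedule(s)
    x_hat = denoise(x_t, t)
    eps_hat = (x_t - alpha_t * x_hat) / sigma_t  # implied noise estimate
    return alpha_s * x_hat + sigma_s * eps_hat


def sample(x_1, steps=256):
    """Simulate the transitions from t = 1 down to t = 0, starting at x_1."""
    x = x_1
    for i in range(steps):
        t = 1.0 - i / steps
        s = 1.0 - (i + 1) / steps
        x = ddim_step(x, t, s)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [sample(rng.gauss(0.0, 1.0)) for _ in range(1000)]
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    print(mean, std)  # should land close to M and S
```

In this toy setting the whole sampler is an affine map of the initial noise, so the sample mean and spread can be compared directly with `M` and `S`. With a finite number of steps the spread comes out slightly below `S`, which is one way to see that the samples are only approximately distributed as $p(X)$.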
API
The API of the `azula` package closely follows this formalism and defines three core components: `azula.noise.Schedule`, `azula.denoise.Denoiser`, and `azula.sample.Sampler`.
In addition, the `azula.guidance` submodule implements guidance and posterior sampling algorithms. Finally, the `azula.plugins` submodule hosts contributed code and compatibility wrappers for other diffusion model libraries. For example, `azula.plugins.adm` allows loading pre-trained diffusion models from the openai/guided-diffusion repository and using them with the same convenient interface.
Questions
What do you think of the current API? Is it convenient for you? What would you add/change?
What do you think of the README/docs/logo? What tutorials should we add?
Would you like to contribute? If yes, what part (architectures, sampling algorithms, guidance algorithms, compatibility wrappers, ...)?
Feel free to ask more questions!