Scaling of PiGDM guidance #3

man-sean · 2023-09-21T11:25:17Z

In the PiGDM paper (Sec A.1, Algorithm 1) it says that we need to scale the guidance term by $\sqrt{\alpha_t}$.
In the code we scale by $\sqrt{\alpha_t} \cdot \sqrt{\alpha_{t-1}}$:

```python
coeff = alpha_s.sqrt()
if not self.awd:
    coeff = coeff - c2 * alpha_t.sqrt() / (1 - alpha_t).sqrt()
coeff = coeff * alpha_t.sqrt() * self.grad_term_weight
```

If we scale by $\sqrt{\alpha_t}$ only, we get NaN during inference because the guidance term becomes too large.
Where does this additional scaling by $\sqrt{\alpha_{t-1}}$ come from?
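
To make the comparison concrete, here is a minimal scalar sketch of the two scalings. It is not the repository's implementation: the function names `paper_coeff` and `code_coeff` are hypothetical, plain floats stand in for tensors, `alpha_s` is assumed to play the role of $\alpha_{t-1}$, and `c2` is passed in as an argument because the snippet above does not show how it is computed.

```python
import math


def paper_coeff(alpha_t, grad_term_weight=1.0):
    """Scaling as read from the paper (Sec A.1, Algorithm 1): sqrt(alpha_t)."""
    return math.sqrt(alpha_t) * grad_term_weight


def code_coeff(alpha_t, alpha_s, c2=0.0, awd=True, grad_term_weight=1.0):
    """Scalar mirror of the snippet above.

    Assumptions: alpha_s stands for alpha_{t-1}; c2 is supplied by the
    caller since its definition is not part of the quoted code.
    """
    coeff = math.sqrt(alpha_s)
    if not awd:
        coeff = coeff - c2 * math.sqrt(alpha_t) / math.sqrt(1 - alpha_t)
    return coeff * math.sqrt(alpha_t) * grad_term_weight


if __name__ == "__main__":
    # Illustrative values only, not taken from any real noise schedule.
    alpha_t, alpha_s = 0.25, 0.64
    print("paper:", paper_coeff(alpha_t))
    print("code (awd=True):", code_coeff(alpha_t, alpha_s))
    print("code (awd=False, c2=0.1):", code_coeff(alpha_t, alpha_s, c2=0.1, awd=False))
```

With `awd=True` the two coefficients differ exactly by the factor `math.sqrt(alpha_s)`, which is at most 1 when the alphas lie in (0, 1], so the code's guidance term is never larger than the one in the paper.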
