
# AMSGrad

*(Figure: AMSGrad Example)*

The motivation for AMSGrad lies in the observation that Adam fails to converge to an optimal solution on some datasets and is outperformed by SGD with momentum.

Reddi et al. (2018) [1] show that one cause of the issue described above is the use of the exponential moving average of the past squared gradients.

To fix the above-described behavior, the authors propose a new algorithm called AMSGrad that keeps a running maximum of the squared gradients instead of an exponential moving average.

For simplicity, the authors also remove the debiasing step, which leads to the following update rule:
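
$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{v}_t &= \max(\hat{v}_{t-1},\, v_t) \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, m_t
\end{aligned}
$$

where $g_t$ is the gradient at step $t$, $\eta$ is the learning rate, and $\beta_1$, $\beta_2$, $\epsilon$ are the usual Adam hyperparameters.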

For more information, check out the paper 'On the Convergence of Adam and Beyond' [1] and the AMSGrad section of the article 'An overview of gradient descent optimization algorithms'.

[1] Reddi, Sashank J., Kale, Satyen, & Kumar, Sanjiv. (2018). [On the Convergence of Adam and Beyond](https://arxiv.org/abs/1904.09237v1).

## Code
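
Below is a minimal NumPy sketch of one AMSGrad step following the update rule above. It is illustrative only; the function name `amsgrad_update`, its argument names, and the toy example are assumptions made for this sketch, not the actual implementation in this repository.

```python
import numpy as np

def amsgrad_update(theta, grad, m, v, v_hat,
                   lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad step (no debiasing), following the update rule above."""
    m = beta1 * m + (1 - beta1) * grad        # exponential moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # exponential moving average of squared gradients
    v_hat = np.maximum(v_hat, v)              # running maximum of the second-moment estimate
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Toy usage: minimize f(theta) = theta^2 starting from theta = 1
theta = np.array(1.0)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
v_hat = np.zeros_like(theta)
for _ in range(100):
    grad = 2 * theta                          # gradient of theta^2
    theta, m, v, v_hat = amsgrad_update(theta, grad, m, v, v_hat, lr=0.05)
print(theta)                                  # moves toward the minimum at 0
```

The key difference from Adam is the `np.maximum` line: the step size is scaled by the largest second-moment estimate seen so far, so the effective learning rate can only shrink over time.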

## Resources