This is the official implementation for our paper
Revisiting Exploding Gradient: A Ghost That Never Leaves
Kai Hu
The exploding gradient problem is one of the main barriers to training deep neural networks.
It is widely believed that this problem has been largely solved by techniques such as careful weight initialization and normalization layers.
However, we find that exploding gradients still exist in deep neural networks, and normalization layers are only able to conceal this problem.
Our theory shows that these exploding gradients do not originate from the linear layer weights but from the non-linear activations.
Specifically, the gradient magnitude of a plain network grows exponentially with the number of non-linear layers.
Based on this theory, we mitigate the gradient problem and train deep plain networks without any skip connections or shortcuts.
Our 50-layer plain network, SeqNet50, achieves 77.1% top-1 validation accuracy on ImageNet, matching the performance of ResNet50.
We hope our work provides new insights into deep neural networks.
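The depth-dependent gradient growth described above can be probed directly. Below is a minimal sketch (not part of the paper's code): it builds plain Linear + BatchNorm + ReLU stacks of increasing depth and prints the gradient norm at the first layer after one backward pass. The layer widths, batch size, and loss are illustrative assumptions.

    import torch
    import torch.nn as nn

    def plain_net(depth, width=256):
        # Plain network: no skip connections or shortcuts.
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(width, width), nn.BatchNorm1d(width), nn.ReLU()]
        return nn.Sequential(*layers)

    torch.manual_seed(0)
    x = torch.randn(128, 256)
    for depth in (10, 20, 40, 80):
        net = plain_net(depth)
        loss = net(x).pow(2).mean()
        loss.backward()
        # Inspect how the first-layer gradient norm scales with depth.
        g = net[0].weight.grad.norm().item()
        print(f"depth={depth:3d}  first-layer grad norm={g:.3e}")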
Requires torch>=1.7.0. Place the ImageNet data in a "data" folder with "train" and "val" subfolders.
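If the data folder follows the standard torchvision ImageFolder layout (one subfolder per class), it can be loaded as in the sketch below. This assumes that convention; the repository's own data pipeline and transforms may differ.

    # Assumed layout (standard ImageFolder convention):
    #   data/train/<class>/<image>.JPEG
    #   data/val/<class>/<image>.JPEG
    import torchvision.datasets as datasets
    import torchvision.transforms as transforms

    train_set = datasets.ImageFolder(
        "data/train",
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor(),
        ]),
    )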
Run the experiment with:
bash run.sh ( ... train script args...)