Updated deployments with Training Neural Networks
Signed-off-by: Ayush Joshi <[email protected]>
1 parent b53c517, commit 59d96bd
Showing 3 changed files with 41 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# Training Neural Networks | ||
|
||
**Backpropagation** is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks. | ||
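As a rough illustration of what backpropagation computes, the sketch below applies the chain rule by hand to a tiny two-layer scalar network and compares the result with a finite-difference estimate. The network shape, the `tanh` activation, the squared-error loss, and all function names are assumptions chosen for the example; none of them come from this document.

```python
import math

def loss(w1, w2, x, target):
    """Squared-error loss of the toy network y = w2 * tanh(w1 * x)."""
    y = w2 * math.tanh(w1 * x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Return (dL/dw1, dL/dw2) using the chain rule, output layer first."""
    # Forward pass: keep the intermediate values the backward pass needs.
    z = w1 * x
    h = math.tanh(z)
    y = w2 * h
    # Backward pass: propagate dL/dy back toward the input.
    dy = y - target              # dL/dy
    dw2 = dy * h                 # dL/dw2
    dh = dy * w2                 # dL/dh
    dz = dh * (1.0 - h * h)      # tanh'(z) = 1 - tanh(z)^2
    dw1 = dz * x                 # dL/dw1
    return dw1, dw2

def numeric_grad(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2

analytic = backprop(0.5, -1.5, 2.0, 1.0)
numeric = numeric_grad(0.5, -1.5, 2.0, 1.0)
print(analytic, numeric)
```

The backward pass reuses the values stored during the forward pass, so each gradient costs only a few extra multiplications. The finite-difference version needs two extra loss evaluations per weight, which is one reason backpropagation is the practical choice for networks with many weights.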
|
||
## Best Practices | ||
|
||
This section explains backpropagation's failure cases and the most common way to regularize a neural network. | ||
|
||
### Failure Cases | ||
|
||
There are a number of common ways for backpropagation to go wrong. | ||
|
||
#### Vanishing Gradients | ||
|
||
The gradients for the lower layers (closer to the input) can become very small. In deep networks, computing these gradients can involve taking the product of many small terms. | ||
|
||
When the gradients vanish toward 0 for the lower layers, these layers train very slowly, or not at all. | ||
|
||
The ReLU activation function can help prevent vanishing gradients. | ||
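To make the "product of many small terms" concrete, here is a minimal sketch that multiplies the same local derivative across a stack of layers. It assumes every weight is 1 and every layer sees the same pre-activation, which is a deliberate simplification; the sigmoid comparison and the helper names are illustrative, not taken from this document.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(local_grad, z, depth):
    """Multiply `depth` copies of the same local derivative (weights fixed at 1)."""
    g = 1.0
    for _ in range(depth):
        g *= local_grad(z)
    return g

# Best case for the sigmoid (z = 0, derivative 0.25) across 20 layers.
sigmoid_20 = chained_gradient(sigmoid_grad, 0.0, 20)
# An active ReLU (z > 0) passes the gradient through unchanged.
relu_20 = chained_gradient(relu_grad, 1.0, 20)
print(sigmoid_20, relu_20)
```

Even in the sigmoid's best case the gradient reaching the first layer shrinks by a factor of 4 per layer, while an active ReLU leaves it untouched. Real networks also multiply by the weights at each layer, so this only isolates the activation function's contribution.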
|
||
#### Exploding Gradients | ||
|
||
If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms. In this case you can have exploding gradients: gradients that get so large that training fails to converge. | ||
|
||
Batch normalization can help prevent exploding gradients, as can lowering the learning rate. | ||
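The two toy functions below sketch both halves of this point under simplifying assumptions: `backprop_scale` treats the network as a chain of scalar linear layers that all share one weight, and `gradient_descent` minimizes a one-dimensional quadratic standing in for a real loss. The names, the quadratic, and its curvature value are hypothetical and chosen only for illustration.

```python
def backprop_scale(weight, depth):
    """Factor by which a gradient is scaled after passing back through
    `depth` scalar linear layers that all use the same weight."""
    g = 1.0
    for _ in range(depth):
        g *= weight
    return g

def gradient_descent(lr, curvature=10.0, w0=1.0, steps=20):
    """Run plain gradient descent on 0.5 * curvature * w**2 and
    return the final distance of w from the minimum at 0."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # a large curvature means large gradients
        w -= lr * grad
    return abs(w)

grown = backprop_scale(3.0, 20)      # weights of 3 across 20 layers
diverged = gradient_descent(lr=0.5)  # each step overshoots and grows
settled = gradient_descent(lr=0.05)  # smaller steps shrink toward 0
print(grown, diverged, settled)
```

With the larger learning rate each update overshoots the minimum by more than the previous error, so the iterate moves away instead of settling; lowering the learning rate restores convergence on the same loss. Batch normalization is not shown here because it needs a multi-unit layer to be meaningful.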
|
||
#### Dead ReLU Units | ||
|
||
Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0 activation, contributing nothing to the network's output, and gradients can no longer flow through it during backpropagation. With its source of gradients cut off, the input to the ReLU may never change enough to bring the weighted sum back above 0. | ||
|
||
Lowering the learning rate can help keep ReLU units from dying. | ||
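A minimal sketch of how one oversized step can kill a unit, assuming a single ReLU unit with one weight, no bias, positive inputs, and a squared-error loss. The data set, the starting weight, and the function names are made up for the example.

```python
# Hypothetical training set: positive inputs x with targets 0.5 * x.
DATA = [(x, 0.5 * x) for x in (0.5, 1.0, 1.5, 2.0)]

def grad_w(w, data=DATA):
    """Mean gradient of 0.5 * (relu(w * x) - target)**2 with respect to w."""
    total = 0.0
    for x, target in data:
        z = w * x
        if z > 0:  # ReLU passes gradient only when the weighted sum is positive
            total += (z - target) * x
    return total / len(data)

def train(lr, w=2.0, steps=50):
    """Plain gradient descent on the single weight w."""
    for _ in range(steps):
        w -= lr * grad_w(w)
    return w

w_small_lr = train(lr=0.1)  # approaches the target weight 0.5
w_large_lr = train(lr=1.0)  # first step jumps below 0 and never recovers
print(w_small_lr, w_large_lr, grad_w(w_large_lr))
```

With the larger learning rate the first update pushes the weight negative, every weighted sum falls below 0, and the gradient is exactly 0 from then on, so the remaining steps change nothing. The smaller learning rate keeps the weight positive throughout and the unit keeps learning.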
|
||
### Dropout Regularization | ||
|
||
Yet another form of regularization, called **Dropout**, is useful for neural networks. It works by randomly "dropping out" unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization: | ||
|
||
* 0.0 = No dropout regularization. | ||
* 1.0 = Drop out everything. The model learns nothing. | ||
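* Values between 0.0 and 1.0 = More useful. Each unit is dropped with that probability, so only part of the network is silenced on any given step. |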
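The dropout rates listed above can be sketched in a few lines of plain Python. This version assumes the common "inverted dropout" convention, in which surviving activations are scaled by `1 / (1 - rate)` so their expected value is unchanged; that rescaling, the `dropout` function name, and the explicit `rng` argument are assumptions for the example and are not described in this document.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the rest.

    Intended for a single training step; a fresh random mask is drawn
    on every call.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if rate == 1.0:
        # Everything is dropped; nothing survives to be rescaled.
        return [0.0 for _ in activations]
    keep = 1.0 - rate
    # rng.random() is in [0, 1), so a unit is dropped when the draw is < rate.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)  # seeded only to make the example repeatable
acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, 0.0, rng))  # no dropout: activations pass through
print(dropout(acts, 0.5, rng))  # roughly half zeroed, the rest doubled
print(dropout(acts, 1.0, rng))  # everything dropped
```

Libraries typically apply dropout only during training and disable it at inference time; with the rescaling assumed here, no extra correction is needed when it is switched off.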