# Detailed explanation of MXNet SGD weight_decay, tfa SGDW, and L2 regularizer #19
leondgarse started this conversation in Show and tell
## MXNet SGD and tfa SGDW

- The meaning of `weight_decay` in `mx.optimizer.SGD` and `tfa.optimizers.SGDW` is different.
- `mx.optimizer.SGD` multiplies `wd` with `lr`. With `learning_rate=0.1, weight_decay=5e-4`, the weight is actually modified by `5e-5`.
- `tfa.optimizers.SGDW` does NOT multiply `wd` with `lr`. With `learning_rate=0.1, weight_decay=5e-4`, the weight is actually modified by `5e-4`.
- So `learning_rate=0.1, weight_decay=5e-4` in `mx.optimizer.SGD` is equal to `learning_rate=0.1, weight_decay=5e-5` in `tfa.optimizers.SGDW`.
- If `wd_mult=10` is set in a MXNet layer, `wd` will be multiplied by `10` in this layer. This means it will be `weight_decay == 5e-4` in the corresponding Keras layer. The sketch below compares the two update rules.
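A minimal numpy sketch of the two update rules as documented (plain SGD, no momentum; gradient rescaling and clipping omitted). A zero gradient isolates the decay term:

```py
import numpy as np

lr, wd = 0.1, 5e-4
grad = np.array([0.0])  # zero gradient, so only the decay moves the weight
w_mx = np.array([1.0])
w_tf = np.array([1.0])

# mx.optimizer.SGD: w -= lr * (grad + wd * w)  ->  effective decay lr * wd == 5e-5
w_mx = w_mx - lr * (grad + wd * w_mx)

# tfa.optimizers.SGDW: w -= lr * grad + wd * w  ->  effective decay wd == 5e-4
w_tf = w_tf - lr * grad - wd * w_tf

print(w_mx, w_tf)  # [0.99995] [0.9995]
```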
## L2 Regularization and Weight Decay

- For `keras.optimizers.SGD` with `keras.regularizers.L2(λ)`, it should be `λ == wd / 2`, because the L2 penalty is `l2 * sum(w ** 2)` and its gradient carries a factor of `2`. The L2 regularizer will then modify the weight value by `2 * l2 * lr == 5e-4 * 0.1 = 5e-5`, matching `mx.optimizer.SGD(learning_rate=0.1, wd=5e-4)`.
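A quick check of that factor of 2, assuming TF 2.x and plain `keras.optimizers.SGD`:

```py
import tensorflow as tf

lr, l2 = 0.1, 2.5e-4  # l2 == wd / 2 for wd = 5e-4
w = tf.Variable([1.0])
opt = tf.keras.optimizers.SGD(learning_rate=lr)

with tf.GradientTape() as tape:
    loss = tf.keras.regularizers.L2(l2)(w)  # the penalty alone: l2 * sum(w ** 2)
grads = tape.gradient(loss, [w])
opt.apply_gradients(zip(grads, [w]))

print(w.numpy())  # ~[0.99995], i.e. modified by 2 * l2 * lr == 5e-5
```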
- `mx.optimizer.SGD(learning_rate=0.1, wd=5e-4)` with `wd_mult=10` in one MXNet layer actually decays that layer's weights by `wd * wd_mult * learning_rate == 5e-4`, and all other layers by `wd * learning_rate == 5e-5`.
- To reproduce that in Keras, use `tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-5)` for the whole model, and add a `keras.regularizers.L2` with `l2 == weight_decay / learning_rate * (wd_mult - 1) / 2` to this layer, as in the sketch below.
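A sketch of the whole mapping; the layer names and sizes here are hypothetical:

```py
import tensorflow as tf
import tensorflow_addons as tfa

lr, wd_mx, wd_mult = 0.1, 5e-4, 10
weight_decay = lr * wd_mx                          # 5e-5, the global SGDW decay
extra_l2 = weight_decay / lr * (wd_mult - 1) / 2   # 2.25e-3 for the wd_mult layer

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=[4], name="plain"),  # decayed by 5e-5 per step
    tf.keras.layers.Dense(
        4,
        name="wd_mult_10",  # decayed by 5e-5 + 2 * extra_l2 * lr == 5e-4 per step
        kernel_regularizer=tf.keras.regularizers.L2(extra_l2),
    ),
])
model.compile(
    optimizer=tfa.optimizers.SGDW(learning_rate=lr, weight_decay=weight_decay),
    loss="mse",
)
```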
## SGD with momentum

- MXNet SGD folds the decay into the momentum state, as documented: `state = momentum * state + lr * (grad + wd * weight)`, then `weight = weight - state`.
- Keras / tfa SGDW keeps the decay out of the momentum: `state = momentum * state - lr * grad`, then `weight = weight + state - weight_decay * weight`.
- With `weight_decay == lr * wd` in SGDW, `weight` will be the same as with `MXNet SGD` after the first update, but `momentum_stat` will be different. So from the second update on, `weight` will also be different. The numpy sketch below reproduces this.
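A numpy sketch of that divergence, assuming the update rules above (tfa applies the decoupled decay within the same step, so the ordering below does not change the result):

```py
import numpy as np

lr, wd_mx, momentum = 0.1, 5e-4, 0.9
wd_tf = lr * wd_mx  # 5e-5, the matching SGDW value

w_mx, s_mx = 1.0, 0.0  # MXNet weight and momentum state
w_tf, s_tf = 1.0, 0.0  # SGDW weight and momentum state

for step, grad in enumerate([0.5, 0.5], start=1):
    # MXNet SGD: wd enters the momentum state
    s_mx = momentum * s_mx + lr * (grad + wd_mx * w_mx)
    w_mx = w_mx - s_mx
    # tfa SGDW: decay is decoupled, momentum sees only the gradient
    s_tf = momentum * s_tf + lr * grad
    w_tf = w_tf - s_tf - wd_tf * w_tf
    print(step, w_mx, w_tf)  # step 1: equal (0.94995); step 2: diverged
```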
## Keras SGD with L2 regularizer can behave the same as MXNet SGD

Because the L2 penalty flows through the gradient, it also enters the momentum state, exactly like MXNet's coupled `wd`. With `l2 == wd / 2`, the two stay identical at every step.
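A numpy sketch of the equivalence, assuming `l2 == wd / 2` as derived above:

```py
import numpy as np

lr, wd, momentum = 0.1, 5e-4, 0.9
l2 = wd / 2

w_mx, s_mx = 1.0, 0.0  # MXNet SGD
w_ks, s_ks = 1.0, 0.0  # Keras SGD + L2 regularizer

for grad in [0.5, -0.3, 0.2]:
    s_mx = momentum * s_mx + lr * (grad + wd * w_mx)
    w_mx = w_mx - s_mx

    g = grad + 2 * l2 * w_ks          # L2 adds 2 * l2 * w to the gradient
    s_ks = momentum * s_ks + lr * g   # so the decay enters the momentum too
    w_ks = w_ks - s_ks

    print(w_mx, w_ks)  # identical at every step
```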
## Keras model test
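The original test code did not survive extraction; a minimal sketch of such a test, using a single-weight model so the update is easy to read off:

```py
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([tf.keras.layers.Dense(1, use_bias=False, input_shape=[1])])
model.compile(optimizer=tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-5),
              loss="mse")

w0 = model.layers[0].get_weights()[0].copy()
model.fit(np.ones([1, 1]), np.zeros([1, 1]), epochs=1, verbose=0)
w1 = model.layers[0].get_weights()[0]
print(w0, "->", w1)  # compare with the MXNet run below
```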
## MXNet model test
For a weight created as a plain `mx.symbol.Variable`, its `wd_mult` has to be added via `opt.set_wd_mult`, as in the sketch below.
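The original snippet was also stripped; a sketch, with hypothetical variable names:

```py
import mxnet as mx

data = mx.symbol.Variable("data")
weight = mx.symbol.Variable("fc_weight")
fc = mx.symbol.FullyConnected(data=data, weight=weight, num_hidden=1,
                              no_bias=True, name="fc")

opt = mx.optimizer.SGD(learning_rate=0.1, wd=5e-4, momentum=0.9)
opt.set_wd_mult({"fc_weight": 10.0})  # this weight now decays with wd * 10
```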