Advanced Initialization Methods #66

Open
gautierdag wants to merge 4 commits into master
8 changes: 8 additions & 0 deletions integration_test.sh
@@ -110,6 +110,14 @@ echo "\n\nTest multiple layers with pre-rnn attention"
 python3 train_model.py --train $TRAIN_PATH --dev $DEV_PATH --output_dir $EXPT_DIR --print_every 50 --embedding_size $EMB_SIZE --hidden_size $H_SIZE --rnn_cell $CELL --epoch $EPOCH --save_every $CP_EVERY --n_layers 3 --attention 'post-rnn' --attention_method 'dot'
 ERR=$((ERR+$?)); EX=$((EX+1))
 
+echo "\n\nTest Xavier/Glorot Initialization"
+python3 train_model.py --train $TRAIN_PATH --dev $DEV_PATH --output_dir $EXPT_DIR --print_every 50 --embedding_size $EMB_SIZE --hidden_size $H_SIZE --rnn_cell $CELL --epoch 1 --save_every $CP_EVERY --n_layers 2 --glorot_init
+ERR=$((ERR+$?)); EX=$((EX+1))
+
+echo "\n\nTest uniform Initialization"
+python3 train_model.py --train $TRAIN_PATH --dev $DEV_PATH --output_dir $EXPT_DIR --print_every 50 --embedding_size $EMB_SIZE --hidden_size $H_SIZE --rnn_cell $CELL --epoch 1 --save_every $CP_EVERY --n_layers 2 --uniform_init 0.1
+ERR=$((ERR+$?)); EX=$((EX+1))
+
 echo "\n\n\n$EX tests executed, $ERR tests failed\n\n"
 
 rm -r $EXPT_DIR
21 changes: 19 additions & 2 deletions machine/models/seq2seq.py
@@ -1,4 +1,5 @@
 import torch.nn.functional as F
+import torch.nn as nn
 
 from .baseModel import BaseModel
 
@@ -8,9 +9,13 @@ class Seq2seq(BaseModel):
     and decoder.
     """
 
-    def __init__(self, encoder, decoder, decode_function=F.log_softmax):
+    def __init__(self, encoder, decoder, decode_function=F.log_softmax,
+                 uniform_init=0, glorot_init=False):
Member
Putting all this in the init like this perhaps doesn't generalise very well if you also want to add other kinds of initialisation. Perhaps instead we could pass a function for initialisation? Does that make sense?

Member Author
Ah I see what you mean, so we would move this initialization function outside of the model class and maybe into a util? And one would have the option of passing an initialization function to any class and have the weights of that class initialized accordingly?

Member Author (@gautierdag, Mar 19, 2019)
I didn't think about it that way initially because I was following how OpenNMT has it, but I think that makes more sense actually. So ignore this PR for now, haha.

Member
Ok, thanks!
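
A rough sketch of the alternative discussed in this thread: the initialization logic could live outside the model (for example in a small utility module) and the model would simply receive a function and apply it to itself. The helper name make_initializer, the init_fn keyword, and its location are hypothetical and not part of this PR.

import torch.nn as nn


def make_initializer(uniform_init=0.0, glorot_init=False):
    """Build a function that initializes a model's parameters in place."""
    def init_fn(model):
        # Uniform(-uniform_init, uniform_init) over all parameters, if requested
        if uniform_init > 0.0:
            for p in model.parameters():
                p.data.uniform_(-uniform_init, uniform_init)
        # Xavier/Glorot for weight matrices (dim > 1); biases are left untouched
        if glorot_init:
            for p in model.parameters():
                if p.dim() > 1:
                    nn.init.xavier_uniform_(p)
    return init_fn

A model could then accept something like init_fn=make_initializer(glorot_init=True) and call init_fn(self) at the end of its __init__, keeping the details of each initialization scheme out of the model classes.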

         super(Seq2seq, self).__init__(encoder_module=encoder,
-                                      decoder_module=decoder, decode_function=decode_function)
+                                      decoder_module=decoder,
+                                      decode_function=decode_function)
+        # Initialize Weights
+        self._init_weights(uniform_init, glorot_init)
 
     def flatten_parameters(self):
         """
@@ -32,3 +37,15 @@ def forward(self, inputs, input_lengths=None, targets={},
                              function=self.decode_function,
                              teacher_forcing_ratio=teacher_forcing_ratio)
         return result
+
+    def _init_weights(self, uniform_init=0.0, glorot_init=False):
+        # initialize weights using uniform distribution
+        if uniform_init > 0.0:
+            for p in self.parameters():
+                p.data.uniform_(-uniform_init, uniform_init)
+
+        # xavier/glorot initialization if glorot_init
+        if glorot_init:
Member
I don't know much about initialisations, so this is mostly a question: does the Glorot initialisation depend on the parameter of the uniform? If not, and both are specified, does glorot_init overwrite the uniform_init parameter? In the tests it seems that they are not mutually exclusive. Perhaps we could add a docstring explaining how they behave/interact?

Member Author
Yes, if both are passed then Glorot overrides the uniform initialization for all parameters except the biases. You usually use either uniform or Glorot/Xavier at a time, so I could add a docstring about this, but it's not very likely that someone activates both at the same time. Also, code-wise this is exactly how OpenNMT does it.

Member
Alright, makes sense. My confusion was mostly coming from my ignorance:).

+            for p in self.parameters():
+                if p.dim() > 1:
+                    nn.init.xavier_uniform_(p)
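
To make the behaviour discussed above concrete, here is a small illustrative snippet (reusing the constructor arguments from the tests below; it is not part of the PR): when both options are passed, every parameter with more than one dimension is re-initialized with Xavier, while the biases keep their uniform(-0.1, 0.1) values.

from machine.models.EncoderRNN import EncoderRNN
from machine.models.DecoderRNN import DecoderRNN
from machine.models.seq2seq import Seq2seq

encoder = EncoderRNN(100, 10, 50, 16, n_layers=2, dropout_p=0.5)
decoder = DecoderRNN(100, 50, 16, 0, 1, input_dropout_p=0)
model = Seq2seq(encoder, decoder, uniform_init=0.1, glorot_init=True)

for name, p in model.named_parameters():
    # weight matrices (dim > 1) were overwritten by Xavier; 1-D biases were not
    print(name, 'xavier' if p.dim() > 1 else 'uniform(-0.1, 0.1)')
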
26 changes: 25 additions & 1 deletion test/test_seq2seq.py
@@ -1,5 +1,29 @@
 import unittest
 
 
+from machine.models.EncoderRNN import EncoderRNN
+from machine.models.DecoderRNN import DecoderRNN
+from machine.models.seq2seq import Seq2seq
+
+
 class TestSeq2seq(unittest.TestCase):
-    pass
+    def setUp(self):
+        self.decoder = DecoderRNN(100, 50, 16, 0, 1, input_dropout_p=0)
+        self.encoder = EncoderRNN(100, 10, 50, 16, n_layers=2, dropout_p=0.5)
+
+    def test_standard_init(self):
+        Seq2seq(self.encoder, self.decoder)
+        Seq2seq(self.encoder, self.decoder, uniform_init=-1)
+
+    def test_uniform_init(self):
+        Seq2seq(self.encoder, self.decoder, uniform_init=1)
+
+    def test_xavier_init(self):
+        Seq2seq(self.encoder, self.decoder, glorot_init=True)
+
+    def test_uniform_xavier_init(self):
+        Seq2seq(self.encoder, self.decoder, uniform_init=1, glorot_init=True)
+
+
+if __name__ == '__main__':
+    unittest.main()
9 changes: 8 additions & 1 deletion train_model.py
@@ -85,6 +85,11 @@ def init_argparser():
                         choices=['adam', 'adadelta', 'adagrad', 'adamax', 'rmsprop', 'sgd'])
     parser.add_argument('--max_len', type=int,
                         help='Maximum sequence length', default=50)
+    parser.add_argument('--uniform_init', type=float,
+                        help='Initializes weights of model from uniform distribution in range (-uniform_init, uniform_init). \
+                        If <= 0, standard pytorch init is used. (default, 0.0)', default=0.0)
+    parser.add_argument('--glorot_init', action='store_true',
+                        help='Initializes weights of model using glorot/xavier distribution')
     parser.add_argument(
         '--rnn_cell', help="Chose type of rnn cell", default='lstm')
     parser.add_argument('--bidirectional', action='store_true',
@@ -252,7 +257,9 @@ def initialize_model(opt, src, tgt, train):
                          bidirectional=opt.bidirectional,
                          rnn_cell=opt.rnn_cell,
                          eos_id=tgt.eos_id, sos_id=tgt.sos_id)
-    seq2seq = Seq2seq(encoder, decoder)
+    seq2seq = Seq2seq(encoder, decoder,
+                      glorot_init=opt.glorot_init,
+                      uniform_init=opt.uniform_init)
     seq2seq.to(device)
 
     return seq2seq, input_vocab, output_vocab