logs19: Random negative reward to avoid 0 loss
Higepon Taro Minowa edited this page May 18, 2018
Log Type | Detail |
---|---|
1: What specific output am I working on right now? | In the previous setup, the rewards can become [-1, ..., -1], which end up as standardized rewards of [0.0, ..., 0.0]. This makes the training meaningless (see the sketch after this table). I want to improve it. |
2: Thinking out loud - hypotheses about the current problem - what to work on next - how can I verify | The -1 reward comes from the initial y, which always has len=max_len. What if we return 0.9, 0.8, or 0.7 randomly for that condition? |
3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer | |
4: Results of runs and conclusion | It didn't work because the rewards [0.9, 0.8, 0.7] end up as [1, 0, -1] after standardization, which doesn't make sense, considering they all come from the same zero-length situation. |
5: Next steps | - Why is the loss on the order of 10^-3? - Find a better way to avoid 0 loss |
6: mega.nz | rl_test_20180518090110 |
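
Both the 0-loss symptom and the failed fix come down to how the rewards are standardized. A minimal sketch, assuming the rewards are standardized per batch as (r - mean) / std; the `standardize` helper below is hypothetical, not this repo's actual function:

```python
import numpy as np

def standardize(rewards, eps=1e-8):
    """Per-batch reward standardization: (r - mean) / (std + eps)."""
    rewards = np.asarray(rewards, dtype=np.float32)
    return (rewards - rewards.mean()) / (rewards.std(ddof=1) + eps)

# Case from row 1: every candidate hits len == max_len, so every reward is -1.
# The standardized rewards are all 0.0, the policy-gradient loss is 0, and the
# RL update learns nothing.
print(standardize([-1.0, -1.0, -1.0]))  # [0. 0. 0.]

# Case from rows 2 and 4: random rewards 0.9 / 0.8 / 0.7 for the same condition.
# Standardization stretches them to roughly [1, 0, -1], so identical "bad"
# samples receive opposite-signed rewards, which is why this fix didn't work.
print(standardize([0.9, 0.8, 0.7]))     # approx [ 1.  0. -1.]
```

In other words, a constant reward gives zero standard deviation and hence zero loss, while injecting small random differences only lets standardization blow those arbitrary differences up to full-scale +/-1 rewards.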
RL | src | dst |
---|---|---|
config | {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 22, 'model_path': 'model/tweet_large'} | {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'} |