
logs11: See if RL works with medium model2



1: What specific output am I working on right now?

  • Run RL multiple times and see if it's stable.
  • CONCLUSION: I'm still not sure if it's working. Also not sure how I can validate it.
    • Probably I should go back to basics.

2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, and how I can verify it

  • We saw great results in the past; we may need some luck to get them again.
  • Let's run 10 trials and see if we see it again (a validation sketch follows this list).
  • Note that we increase num_train_steps for the base seq2seq so that we know it has converged.
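One possible way to validate this across the 10 trials, instead of eyeballing each TensorBoard graph, is to read the avg_reply_len scalar back out of each run's event files and check whether it goes up. A minimal sketch, assuming each run keeps its RL event files under runs/<name>/rl and logs the scalar under the tag avg_reply_len (both the layout and the tag name are my assumptions here):

```python
# Sketch: compare the avg_reply_len curve across RL runs.
# Assumed layout: runs/<name>/rl holds the TensorBoard event files,
# and the scalar tag is 'avg_reply_len'.
import glob

from tensorboard.backend.event_processing import event_accumulator

for run_dir in sorted(glob.glob('runs/*/rl')):
    acc = event_accumulator.EventAccumulator(run_dir)
    acc.Reload()
    if 'avg_reply_len' not in acc.Tags().get('scalars', []):
        print(f'{run_dir}: no avg_reply_len scalar logged')
        continue
    points = acc.Scalars('avg_reply_len')
    first, last = points[0].value, points[-1].value
    trend = 'up' if last > first else 'flat/down'
    print(f'{run_dir}: {first:.2f} -> {last:.2f} ({trend})')
```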

3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer

  • run1: Trial 1

    • Looks promising, but there is some future work: 1. seq2seq should be trained longer until it converges. 2. RL should also be trained longer, so that the actual avg_len goes up.
  • run2: Trial 2

    • Pretty similar to Trial 1; also looks promising.
  • run3 and 4: Trial 3 & 4

    • avg_reply_len(s) are flat at the end, which means they got caught in local optima.
    • I wonder if we trained seq2seq too much, so RL had no way to move out of the local optimum.
  • run 5 and 6: Trial 5 & 6

    • Objective: see what happens if we don't train seq2seq much.
    • Didn't converge or produce any good responses.
  • run 7 and 8: Trial 7 & 8

    • Mimic run2, but train RL longer to see if it goes up.
    • Similar to run 3 & 4, but slightly better.
    • Let's train seq2seq less and see how it goes.
  • run 9 and 10: Trial 9 & 10

    • Train seq2seq less.
    • Didn't converge, and the reward didn't go up.
    • I think RL training is slow and didn't have enough time.
    • Let's run the RL training 4 times longer.
  • What to save: the hparams for comparison, plus the ipynb and related files (a saving sketch follows this list).
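A minimal sketch of the hparams part, assuming they are the plain Python dicts printed in section 4 and get dumped as hparams.json into the same directory that goes to mega.nz (the file name is my own convention here, not an existing one):

```python
# Sketch: save the src (seq2seq) and dst (RL) hparams next to the notebook
# and the other artifacts that get uploaded, e.g. 20180430rl_test_medium7.
# 'hparams.json' is an assumed file name.
import json
import os

def save_hparams(run_dir, src_hparams, dst_hparams):
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, 'hparams.json'), 'w') as f:
        json.dump({'src': src_hparams, 'dst': dst_hparams},
                  f, indent=2, sort_keys=True)

# e.g. save_hparams('20180430rl_test_medium7', src_hparams, dst_hparams)
```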

4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)
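Each record below shows the src (base seq2seq) hparams followed by the dst (RL) hparams. To see at a glance which settings differ between the two (usually learning_rate, model_path, and sometimes num_train_steps), a minimal diff sketch, assuming both are plain Python dicts as printed:

```python
# Sketch: list the hparams that differ between a run's src (seq2seq) and
# dst (RL) configs. Assumes both are plain Python dicts as printed below.
def diff_hparams(src, dst):
    keys = sorted(set(src) | set(dst))
    return {k: (src.get(k), dst.get(k)) for k in keys if src.get(k) != dst.get(k)}

# For run1 the printed dicts give:
# {'learning_rate': (0.5, 0.1),
#  'model_path': ('model/tweet_large', 'model/tweet_large_rl')}
```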

run1
  • {'machine': 'master', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 3120, 'model_path': 'model/tweet_large'} dst {'machine': 'master', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 3120, 'model_path': 'model/tweet_large_rl'}

  • mega.nz directory: 20180430rl_test_medium7
(TensorBoard graphs: seq2seq, RL)
run2
  • src: {'machine': 'slave', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 3120, 'model_path': 'model/tweet_large'} dst {'machine': 'slave', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 3120, 'model_path': 'model/tweet_large_rl'}

  • mega.nz directory: 20180430rl_test_medium8

(TensorBoard graphs: seq2seq, RL)

run3

  • {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large'} dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large_rl'}

  • mega.nz directory: 20180501rl_test_medium9
(TensorBoard graphs: seq2seq, RL)

run4

  • {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large'} dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large_rl'}

  • mega.nz directory: 20180501rl_test_medium10
(TensorBoard graphs: seq2seq, RL)

run5

  • {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 30, 'model_path': 'model/tweet_large'} dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large_rl'}

  • It didn't converge at all
  • ばいとおわ! ("Part-time shift's over!") [0]💩💩💩
    [1]💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩

  • mega.nz directory: 20180501rl_test_medium11
(TensorBoard graph: RL)

run6

  • {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 30, 'model_path': 'model/tweet_large'} dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large_rl'}

  • It didn't converge at all

  • ばいとおわ! ("Part-time shift's over!") [0]💩💩💩
    [1]💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩

  • mega.nz directory: 20180501rl_test_medium12

(TensorBoard graph: RL)

run7

  • {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 3120, 'model_path': 'model/tweet_large'} dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 15600, 'model_path': 'model/tweet_large_rl'}

  • 今回もよろしくです。 ("Looking forward to working with you again.")

  • [0]wwwwww💩ちゃん、もう暇だな。!!

  • [1]wwwwww💩ちゃん、もう暇だな。マスター)これ、、、、、いいいいのな?

  • mega.nz directory: 20180501rl_test_medium13
(TensorBoard graphs: seq2seq, RL)

run 8

  • {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 3120, 'model_path': 'model/tweet_large'} dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 15600, 'model_path': 'model/tweet_large_rl'}

  • 今回もよろしくです。 ("Looking forward to working with you again.")

  • [0]ほら(・ω・)スッこの今日は💩なんだけど!!

  • [1]ほら(・ω・)スッこの今日は💩なんだけど(^_^;)

  • mega.nz directory: 20180501rl_test_medium14
(TensorBoard graphs: seq2seq, RL)

run 9

  • {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'} dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large_rl'}

  • 今回もよろしくです。 ("Looking forward to working with you again.") [0]💩!
    [1]💩!!

  • mega.nz directory: 20180501rl_test_medium15

(TensorBoard graphs: seq2seq, RL)

run 10

  • {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'} dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 6240, 'model_path': 'model/tweet_large_rl'}

  • おはようございます。寒いですね。 ("Good morning. It's cold, isn't it?") [0]そうなな(˙-˙) [1]そうなな(˘ω˘)

  • mega.nz directory: 20180501rl_test_medium16

(TensorBoard graphs: seq2seq, RL)

run 11

  • {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'} dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 31200, 'model_path': 'model/tweet_large_rl'}

  • 今回もよろしくです。 ("Looking forward to working with you again.") [0]間違えたwww [1]間違えた!!

  • mega.nz directory: 20180501rl_test_medium17

(TensorBoard graph: RL)

run 12

  • {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'} dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 30, 'decoder_length': 30, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 31200, 'model_path': 'model/tweet_large_rl'}

  • 今回もよろしくです。 ("Looking forward to working with you again.") [0]💩が💩www [1]💩が💩💩💩💩をしてます‼

  • mega.nz directory: 20180501rl_test_medium18
(TensorBoard graph: RL)