logs7: Test RL with small model
- Log 1: what specific output am I working on right now?
- I'm trying to verify that the RL framework is working.
- If avg_len goes up and the result is reproducible, it's working. (A minimal sketch of this check follows right after this entry.)
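A minimal sketch of the check above, assuming avg_len is recorded once per training step; the helper names and window sizes are illustrative, not taken from the actual training code:

```python
import numpy as np

def avg_len_increasing(avg_len_per_step, head=50, tail=50):
    """Rough trend check: compare the mean avg_len of the first
    `head` steps with the mean of the last `tail` steps."""
    xs = np.asarray(avg_len_per_step, dtype=np.float64)
    return xs[-tail:].mean() > xs[:head].mean()

def runs_reproducible(run_a, run_b, tol=1e-6):
    """Two runs with the same seed should produce (near-)identical curves."""
    a, b = np.asarray(run_a), np.asarray(run_b)
    return a.shape == b.shape and np.allclose(a, b, atol=tol)

# The RL framework counts as "working" if both hold for two seeded runs:
#   avg_len_increasing(run_a) and runs_reproducible(run_a, run_b)
```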
- Log 2: thinking out loud - e.g. hypotheses about the current problem, what to work on next
- Run and observe the graphs.
- The loss starts at 20; oops, I forgot to copy the learned model.
- Here are the results from small-data RL initialized from the normal seq2seq model.
- Observations
- valid_loss seems okay; it starts from almost zero and goes down.
- reward and reply_len don't change.
- For each input, the model returns the exact same reward.
- Things to do
- Check if we can reproduce -> Yes we could.
- Observations
- The reward doesn't change.
- What does it mean?
- Are the replies from seq2seq the same length? Let's confirm -> YES.
- The reward still doesn't change. Sample output:
```
replies [[ 6  7  5 20  1 27 28 29  4  1]
         [23 24 25  4 26 27 28 29  4  1]
         [30 31  9 32 33  5  4  1  1  1]
         [ 6  7  5 20  1 27 28 29  4  1]
         [23 24 25  4 26 27 28 29  4  1]
         [30 31  9 32 33  5  4  1  1  1]]
length = [5, 10, 8, 5, 10, 8]
reward = 0.19166666666666668
```
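For reference, the lengths printed above are consistent with treating token id 1 as the EOS/padding symbol. A minimal sketch of that length computation (the EOS id and helper function are assumptions for illustration, not taken from the training code):

```python
import numpy as np

# Batch of sampled reply token ids copied from the log output above.
replies = np.array([
    [ 6,  7,  5, 20,  1, 27, 28, 29,  4,  1],
    [23, 24, 25,  4, 26, 27, 28, 29,  4,  1],
    [30, 31,  9, 32, 33,  5,  4,  1,  1,  1],
    [ 6,  7,  5, 20,  1, 27, 28, 29,  4,  1],
    [23, 24, 25,  4, 26, 27, 28, 29,  4,  1],
    [30, 31,  9, 32, 33,  5,  4,  1,  1,  1],
])

EOS_ID = 1  # assumption: id 1 marks end-of-sequence / padding

def reply_length(tokens, eos_id=EOS_ID):
    """Length up to and including the first EOS token."""
    eos_positions = np.where(tokens == eos_id)[0]
    return int(eos_positions[0]) + 1 if eos_positions.size else len(tokens)

print([reply_length(r) for r in replies])   # [5, 10, 8, 5, 10, 8] -- matches the log
print(len({tuple(r) for r in replies}))     # 3 unique replies in a batch of 6
```

Note that the last three rows repeat the first three exactly, so any deterministic reward computed from these replies will come out identical batch after batch, which matches the flat reward curve.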
- My hypothesis: the seq2seq model has already converged, and it's too late to move it out of its local optimum.
- How to confirm: stop the seq2seq training at 120 steps, then compare with the previous result.
- Previous result: avg len 19.0, validation loss 0.0037353924, reward = 0.19166666666666668
- Result: avg len 18.5, validation loss 0.016834794, reward = 0.19166666666666668
- The results tell us almost nothing.
- We can't say for sure whether RL worked, because it eventually converges to loss = 0 (the model is too big).
- Conclusion: just testing with a small model and small data didn't work. We should explore the entropy method described in the blog (a rough sketch of a typical entropy bonus follows below).
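The blog's exact formulation isn't reproduced here; the following is only a rough sketch of a generic entropy bonus added to a policy-gradient loss (plain NumPy, with made-up variable names), to illustrate the general idea of discouraging the policy from collapsing onto a single deterministic reply:

```python
import numpy as np

def entropy_bonus_loss(log_probs, actions, advantages, beta=0.01):
    """Policy-gradient loss with an entropy bonus.

    log_probs:  (batch, steps, vocab) log-probabilities from the decoder
    actions:    (batch, steps) sampled token ids
    advantages: (batch,) reward minus baseline for each sampled reply
    beta:       weight of the entropy term (hyperparameter)
    """
    # Log-probability of the tokens that were actually sampled.
    picked = np.take_along_axis(log_probs, actions[..., None], axis=-1).squeeze(-1)
    pg_loss = -np.mean(picked.sum(axis=1) * advantages)

    # Mean per-step entropy of the output distribution; higher entropy
    # means the decoder keeps exploring instead of repeating one reply.
    probs = np.exp(log_probs)
    entropy = -np.sum(probs * log_probs, axis=-1).mean()

    # Subtracting the entropy term rewards keeping the policy stochastic.
    return pg_loss - beta * entropy
```

The design point is that the entropy term is subtracted from the loss, so the optimizer trades off higher reward against keeping the sampled replies diverse.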
- Log 3: record of currently ongoing runs, along with a short reminder of what question each run is supposed to answer
- Log 4: results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)