logs4: See if RL framework is working
- Log 1: what specific output am I working on right now?
  - Before we go on to complex RL, we should confirm that the RL framework is working at all.
- Log 2: thinking out loud - e.g. hypotheses about the current problem, what to work on next
  - One of the easiest ways is to give a higher reward when the reply is longer.
  - If we measure the average reply length at each step, we should observe the average length increasing (see the sketch after this list).
  - Steps:
    - enable the length reward
    - log the average length
    - start RL
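A minimal sketch of what the length reward and the avg_len metric could look like. The helper names, the EOS convention, the `max_len=30` cap, and the TF 1.x logging snippet are assumptions for illustration, not the repo's actual code:

```python
import numpy as np

EOS_ID = 2  # assumed id of the end-of-sequence token


def reply_length(token_ids):
    """Length of a reply in tokens, counted up to the first EOS."""
    for i, token in enumerate(token_ids):
        if token == EOS_ID:
            return i
    return len(token_ids)


def length_reward(token_ids, max_len=30):
    """Reward in [0, 1] that grows with reply length, capped at max_len."""
    return min(reply_length(token_ids), max_len) / float(max_len)


def average_length(batch_of_replies):
    """avg_len metric to log every step; it should trend upward if RL works."""
    return float(np.mean([reply_length(r) for r in batch_of_replies]))


# Logging avg_len to TensorBoard (TF 1.x style; assumes a tf.summary.FileWriter
# named `writer` and the current training step `step` already exist):
#   avg = average_length(batch_of_replies)
#   summary = tf.Summary(value=[tf.Summary.Value(tag="avg_len", simple_value=avg)])
#   writer.add_summary(summary, global_step=step)
```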
- Log 3: record of currently ongoing runs along with a short reminder of what question each run is supposed to answer
  - If avg_len eventually goes up, that is a good sign.
  - The reward should also go up.
  - The loss should converge.
  - If avg_len doesn't go up, something is wrong :(
  - Someone taught me that the reward should be averaged, because it has high variance (see the normalization sketch after this list).
  - Paused here, because we have to fix the logging issue before we can see the results.
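A minimal sketch of the reward-averaging idea: keep running statistics of the reward and subtract the running mean (and divide by the running std) before feeding it to the policy gradient. The class name and the decay value are assumptions, not the repo's actual code:

```python
import numpy as np


class RunningRewardNormalizer:
    """Tracks an exponential moving average of the reward mean and variance."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.mean = 0.0
        self.var = 1.0

    def normalize(self, rewards):
        rewards = np.asarray(rewards, dtype=np.float64)
        # Update the running statistics from the current batch.
        self.mean = self.decay * self.mean + (1 - self.decay) * rewards.mean()
        self.var = self.decay * self.var + (1 - self.decay) * rewards.var()
        # Centering and scaling the reward reduces the variance of the
        # policy-gradient estimate.
        return (rewards - self.mean) / (np.sqrt(self.var) + 1e-8)


# Usage (assuming the hypothetical length_reward helper from the sketch above):
#   normalizer = RunningRewardNormalizer()
#   rewards = [length_reward(r) for r in batch_of_replies]
#   advantages = normalizer.normalize(rewards)
```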
- Log 4: results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by environment the agent is being trained in)
  - TODO. This should be logged carefully.
  - TODO: model/README + ipynb
    - models20180414ReinforcementLearningLengthReward