Home
- Client ID on the chart is broken.
- Log policy entropy (see the sketch after this list).
- Limit iterations to see if the learning works.
- Save iterator state.
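For the policy-entropy item, here is a minimal sketch of the scalar that could be logged each training step, assuming a categorical policy represented by per-step logits; the `policy_entropy` helper, the shapes, and the NumPy usage are illustrative assumptions, not code from this repo:

```python
import numpy as np

def policy_entropy(logits):
    """Mean entropy H = -sum_a p(a) * log p(a) of the categorical
    policy defined by `logits` (shape: [batch, num_actions]).

    A steady collapse of this value toward 0 usually means the policy
    has become near-deterministic, which is worth watching during RL.
    """
    # Numerically stable softmax: subtract the per-row max first.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Clip before log to avoid log(0) on zero-probability actions.
    return float(-(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=-1).mean())

# Hypothetical usage: log this scalar once per training step.
if __name__ == "__main__":
    logits = np.random.randn(32, 1000)  # e.g. a batch of vocab logits
    print("policy_entropy:", policy_entropy(logits))
```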
- logs0: old
- logs1: Use easy tf log
- logs2: Revisit auth on PyDrive
- logs3: How to save model
- logs4: See if RL framework is working
- logs5: Consolidate logs
- logs6: Make RL model work with small model
- logs7: Test RL with small model
- logs8: Understand policy entropy TO REVISIT
- logs9: Fix msec data issue
- logs10: See if RL works with medium model
- logs11: See if RL works with medium model 2
- logs12: Train seq2seq with Large Data
- logs13: Understand Policy Gradient
- logs14: Normalize Reward
- logs15: See if new RL is working
- logs16: RL test shorter reply is better
- logs17: RL test len equals 2 is the best
- logs18: RL test train seq2seq first
- logs19: Random negative reward to avoid 0 loss
- logs20: Make a list of how we implement RL for seq2seq
- logs21: Steps to make small RL work
- logs22: Refactoring ideas
- logs23: Train large seq2seq
- logs24: Run RL on large seq2seq
- logs25: Fix sample helper
- logs26: Confirm dull response work for small tweets
- logs27: Mutual Information
- logs28: Mutual Information Observe
- logs29: Wrap up Mutual Information Beam
- logs30: Show sampled reply for large data
- logs31: Get logits and log_prob
- logs32: Values to watch for RL
- logs33: Long RL run