logs13: Understand Policy Gradient
Higepon Taro Minowa edited this page May 11, 2018
- What is a policy gradient?
- We multiply the loss value by the reward, but what does that actually mean?
- How should we initialize the model based on the policy?
- Write a blog post about Policy Gradient (blog link)
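One way to see what "multiply the loss by the reward" means: the REINFORCE loss for a single step is -log π(a|s) · R, so the reward only rescales (and possibly flips the sign of) the ordinary cross-entropy gradient for the action that was taken. A minimal sketch, with hypothetical logits standing in for a policy network's output:

```python
import numpy as np

def pg_loss(logits, action, reward):
    """REINFORCE-style loss for one step: -log pi(action|state) * reward.

    A positive reward scales up the gradient that makes `action` more
    likely; a negative reward pushes probability mass away from it.
    """
    # softmax over action logits (stabilized by subtracting the max)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[action]) * reward

logits = np.array([1.0, 2.0, 0.5])  # hypothetical policy outputs
# The same action, weighted by opposite rewards:
win = pg_loss(logits, action=1, reward=+1.0)   # positive loss -> reinforce
lose = pg_loss(logits, action=1, reward=-1.0)  # negative loss -> discourage
```

So the reward never changes *which* direction the log-probability gradient points for the chosen action; it changes how far, and in which sense, we move along it.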
- DONE Deep Reinforcement Learning: Pong from Pixels
- Teaching
  - DONE RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning - YouTube
  - DONE Lecture 2: Markov Decision Process
  - John Schulman 2: Deep Reinforcement Learning - YouTube
- Teaching
  - DONE Policy gradients for reinforcement learning in TensorFlow (OpenAI gym CartPole environment) SHOULD REVISIT
  - Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents
  - https://github.com/williamFalcon/DeepRLHacks
  - http://joschu.net/docs/nuts-and-bolts.pdf
  - https://www.alexirpan.com/2018/02/14/rl-hard.html
  - https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/
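The CartPole tutorials above all reduce to the same update. A minimal NumPy sketch of one REINFORCE step for a linear softmax policy (CartPole-like shapes: 4-dim state, 2 actions; the data here is random stand-in, not a real rollout):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Linear softmax policy: 4-dim state -> 2 actions.
W = np.zeros((4, 2))

def policy_gradient_step(states, actions, rewards, lr=0.01):
    """One REINFORCE update: for each time step, push up log pi(a|s)
    weighted by the return (sum of rewards) from that step onward."""
    global W
    # reward-to-go: cumulative sum of future rewards, per time step
    returns = np.cumsum(rewards[::-1])[::-1]
    for s, a, G in zip(states, actions, returns):
        probs = softmax(s @ W)
        # gradient of log pi(a|s) w.r.t. W for a linear softmax policy:
        # d/dW[:, j] = s * (1[j == a] - probs[j])
        dlog = -np.outer(s, probs)
        dlog[:, a] += s
        W += lr * G * dlog

# hypothetical 3-step episode
states = rng.normal(size=(3, 4))
actions = [0, 1, 0]
rewards = [1.0, 1.0, 1.0]
policy_gradient_step(states, actions, rewards)
```

The tutorials wrap the same math in TensorFlow by feeding the per-step returns into a `-log(prob_of_taken_action) * return` loss and letting autodiff produce the gradient above.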
- Should rewards be both positive and negative?
- Should rewards be normalized?
- We sum up the rewards and multiply the loss by them, but does that make sense?
- Should each action (= each reply) get its own reward?
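The Pong-from-Pixels post answers several of these questions at once: compute a discounted reward-to-go per step (so each action gets its own credit), then standardize the result. A minimal sketch of that recipe:

```python
import numpy as np

def discount_and_normalize(rewards, gamma=0.99):
    """Discounted reward-to-go, then standardized to zero mean / unit std.

    After normalization roughly half the actions get a positive weight
    (encouraged) and half a negative one (discouraged), which is one
    practical answer to "should rewards be positive and negative?".
    """
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    out -= out.mean()
    out /= out.std() + 1e-8  # epsilon guards against all-equal rewards
    return out

# hypothetical episode: only the final step is rewarded
advantages = discount_and_normalize([0.0, 0.0, 1.0])
```

Note that earlier steps receive smaller discounted credit than the step closest to the reward, so summing one episode reward and multiplying every step by it is the crudest special case (gamma = 1, no per-step credit).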
- Cross-entropy method - Wikipedia. Andrej said this is the first method we should try.
- Make matching charts
- Pong case: batch, reward, policy
- seq2seq case: batch, reward, policy
2: Thinking out loud - e.g., hypotheses about the current problem, what to work on next, and how to verify it
3: A record of currently ongoing runs, along with a short reminder of what question each run is supposed to answer
- run1: title