# Hierarchical Reinforcement Learning

## Modular Multitask Reinforcement Learning with Policy Sketches [ICML 2017]

- Proposes "policy sketches" as supervision for hierarchical agents: per-task sequences of reusable behavior symbols, with no specification of the behaviors themselves (illustrated below)
- Interesting idea, and probably a more efficient form of supervision than e.g. reward design
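
As an illustration, a sketch is just an ordered list of named sub-policy symbols attached to each task; the task and symbol names below are made up in the style of the paper's crafting domain, not taken from it.

```python
# Hypothetical policy sketches: each task is annotated only with an ordered
# list of sub-policy symbols; what each sub-policy actually does is never
# specified and is learned jointly across tasks.
SKETCHES = {
    "make planks": ["get wood", "use workbench"],
    "make rope":   ["get grass", "use toolshed"],
}

# Sub-policies are indexed by symbol, so "get wood" reuses the same
# parameters wherever it appears in a sketch.
shared_subpolicies = {symbol for sketch in SKETCHES.values() for symbol in sketch}
```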

## FeUdal Networks for Hierarchical Reinforcement Learning [ICML 2017]

- Proposes a manager-worker architecture: the manager sets goals for the worker, and both are trained with policy gradients (the worker's intrinsic reward is sketched below)
- Assumes that the manager's decisions induce a particular distribution over future states; this seems unlikely
- Scores ~2000 on Montezuma's Revenge
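
A minimal sketch of the worker's intrinsic reward: the worker is rewarded for moving through latent space in the directions the manager proposed, measured by cosine similarity over a horizon c. The array-based interface below is an assumption.

```python
import numpy as np

def worker_intrinsic_reward(states, goals, t, c=10):
    """Average cosine similarity between the change in the latent state and
    the manager's past goal directions over horizon c. `states` and `goals`
    are sequences of latent vectors (a hypothetical interface)."""
    def cos(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.mean([cos(states[t] - states[t - i], goals[t - i])
                    for i in range(1, c + 1)])
```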

## The Predictron: End-To-End Learning and Planning [arXiv 2016]

- Proposes an NN architecture that learns an internal MRP and outputs value estimates (core computation sketched below)
- Shows that the network queries different planning depths on different tasks
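
A sketch of the core computation: the network unrolls its learned internal MRP and forms k-step "preturns" that bootstrap from an internal value head. Here `core` and `value` are stand-ins for the learned modules (assumed interface).

```python
def predictron_preturns(core, value, s0, K):
    """Unroll the internal MRP K steps from abstract state s0 and return the
    k-step preturns g_k = r_1 + g_1*r_2 + ... + (g_1*...*g_k) * v(s_k).
    `core(s) -> (s', r, gamma)` and `value(s) -> v` are hypothetical learned
    modules; the paper mixes the g_k with learned lambda-weights to form the
    final value estimate."""
    s, acc, disc = s0, 0.0, 1.0
    preturns = []
    for _ in range(K):
        s, r, gamma = core(s)   # internal transition, reward, and discount
        acc += disc * r
        disc *= gamma
        preturns.append(acc + disc * value(s))
    return preturns
```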

## Stochastic Neural Networks for Hierarchical Reinforcement Learning [ICLR 2017]

- Uses a stochastic neural network to learn skills before the task is presented (pre-training)
- Trains skills by maximizing channel capacity, i.e. the mutual information between the latent skill code and the states it visits (rough sketch below)
- Solves continuous control tasks that were previously unsolved
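
A rough sketch of the information-theoretic bonus: reward the latent skill code z for producing distinguishable state visitations, here via empirical counts over discretized state cells. The count-table interface is an assumption.

```python
import math
from collections import defaultdict

def skill_mi_bonus(counts, cell, z, alpha=0.01, eps=1e-8):
    """Bonus ~ log of the empirical probability that skill code z was active
    given a visit to this state cell; `counts[cell][z]` is a hypothetical
    visitation table updated as trajectories are collected."""
    total = sum(counts[cell].values())
    p = counts[cell][z] / total if total else eps
    return alpha * math.log(max(p, eps))

# Example visitation table: counts[cell][z] is incremented whenever skill z
# visits the discretized state cell.
counts = defaultdict(lambda: defaultdict(int))
```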

## Surprise-based Intrinsic Motivation for Deep Reinforcement Learning [ICLR 2017]

- Introduces an additional reward term proportional to how unexpected the observed state transition is under the agent's learned dynamics model (minimal sketch below)
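
A minimal sketch with a toy Gaussian model standing in for the agent's learned dynamics model; the surprise term is the model's negative log-likelihood of the observed transition, scaled by a coefficient eta. Both the model and eta are assumptions for illustration.

```python
import numpy as np

class GaussianDynamicsModel:
    """Toy stand-in for a learned transition model: s' ~ N(s + W a, sigma^2 I)."""
    def __init__(self, state_dim, action_dim, sigma=0.5):
        self.W = np.zeros((state_dim, action_dim))
        self.sigma = sigma

    def nll(self, s, a, s_next):
        # Negative log-likelihood of the observed transition under the model.
        diff = s_next - (s + self.W @ a)
        return (0.5 * np.sum(diff ** 2) / self.sigma ** 2
                + 0.5 * len(s) * np.log(2 * np.pi * self.sigma ** 2))

def shaped_reward(env_reward, model, s, a, s_next, eta=0.05):
    # Total reward = extrinsic reward + eta * surprise (model NLL).
    return env_reward + eta * model.nll(s, a, s_next)
```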

## Strategic Attentive Writer for Learning Macro-Actions [NIPS 2016]

- Develops an algorithm that learns to plan sequences of actions together with a commitment plan deciding when to re-plan (control flow sketched below)
- Only works for finite action spaces and a predetermined planning horizon
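
The control flow can be sketched as follows: the network keeps an action plan and a commitment plan, executes from the plan while committed, and re-plans when the commitment bit says so. The `replan` callable is a hypothetical stand-in for the network's attentive-writing update.

```python
def straw_step(plan, commit, t, replan):
    """Follow the stored action plan while committed; otherwise re-plan.
    `plan` and `commit` are the action-plan and commitment-plan arrays the
    network maintains; `replan(t)` is a hypothetical stand-in for the
    module that produces fresh plans."""
    if not commit[t]:             # commitment bit says: stop and re-plan
        plan, commit = replan(t)
    return plan[t], plan, commit  # execute the planned action for step t
```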

## Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation [NIPS 2016]

- Introduces h-DQN, a deep-network take on the options framework (two-level loop sketched below)
- Uses hardwired goals (similar to the 'salient events' of Singh et al. 2004)
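
A tabular sketch of the two-level loop (the paper uses deep Q-networks; `env`, `GOALS`, and `reached` are hypothetical stand-ins, with goals playing the role of the hardwired salient events):

```python
import random
from collections import defaultdict

def h_dqn_episode(env, ACTIONS, GOALS, reached, q_meta, q_ctrl,
                  eps=0.1, gamma=0.99, alpha=0.1):
    """One episode of a tabular h-DQN-style agent. q_meta maps (state, goal)
    to value, q_ctrl maps (state, goal, action) to value; both can be
    defaultdict(float). `reached(s, g)` tests a hardwired salient event."""
    s, done = env.reset(), False
    while not done:
        # Meta-controller picks which salient event to pursue next.
        g = (random.choice(GOALS) if random.random() < eps
             else max(GOALS, key=lambda g_: q_meta[(s, g_)]))
        s0, extrinsic = s, 0.0
        while not done and not reached(s, g):
            # Controller is conditioned on the current goal.
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda a_: q_ctrl[(s, g, a_)]))
            s2, r, done = env.step(a)
            extrinsic += r
            intrinsic = 1.0 if reached(s2, g) else 0.0  # binary goal reward
            target = intrinsic + gamma * max(q_ctrl[(s2, g, a_)] for a_ in ACTIONS)
            q_ctrl[(s, g, a)] += alpha * (target - q_ctrl[(s, g, a)])
            s = s2
        # Meta-controller learns from the extrinsic reward the goal earned.
        target = extrinsic + gamma * max(q_meta[(s, g_)] for g_ in GOALS)
        q_meta[(s0, g)] += alpha * (target - q_meta[(s0, g)])
```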

## The Option-Critic Architecture [NIPS Workshop 2015]

- Derives policy gradient theorems for options (stated below)
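
For reference, the two gradients take roughly this form, with ω indexing options, θ parameterizing the intra-option policies, and ν the terminations; this is a paraphrase from memory, not a quote from the paper.

```latex
% Intra-option policy gradient:
\frac{\partial Q_\Omega(s,\omega)}{\partial \theta}
  = \mathbb{E}\!\left[\frac{\partial \log \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s,\omega,a)\right]

% Termination gradient:
\frac{\partial U(\omega, s')}{\partial \nu}
  = -\,\mathbb{E}\!\left[\frac{\partial \beta_{\omega,\nu}(s')}{\partial \nu}\, A_\Omega(s',\omega)\right]
```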

## Intrinsically Motivated Reinforcement Learning [NIPS 2004]

- Intrinsic motivation suggests that there is value in defining options independently of the task
- The agent creates options to achieve 'salient events': events predetermined to be inherently interesting to the agent

## Recent Advances in Hierarchical Reinforcement Learning [DEDS 2003]

- Summarizes three approaches to hierarchical RL: options, HAMs, and MAXQ

## Learning Options in Reinforcement Learning [SARA 2002]

- Learns options by randomly specifying tasks and making the most frequently visited states the termination conditions of options (frequency heuristic sketched below)
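
The frequency heuristic is simple enough to sketch directly; the trajectory format below is an assumption.

```python
from collections import Counter

def discover_termination_states(trajectories, k=3):
    """Pick the k most frequently visited states across trajectories collected
    on randomly specified tasks; these become candidate option termination
    states. `trajectories` is an iterable of state sequences (assumed)."""
    visits = Counter(s for traj in trajectories for s in traj)
    return [s for s, _ in visits.most_common(k)]
```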

## Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning [Artificial Intelligence Journal 1999]

- Defines an option as a policy together with an initiation condition and a termination condition
- Proves that acting in an MDP with options instead of only primitive actions yields an SMDP
- Proves that allowing options to be interrupted when a better one is available yields returns at least as high
- Derives Q-learning over options and proves its convergence (update rule below)
- Shows an environment where predefined options considerably shorten learning
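
The SMDP Q-learning update at the heart of the convergence result: when option o, started in s, terminates after k steps in s' with cumulative discounted reward r,

```latex
Q(s, o) \leftarrow Q(s, o)
  + \alpha \Big[ r + \gamma^{k} \max_{o' \in \mathcal{O}_{s'}} Q(s', o') - Q(s, o) \Big]
```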

## Learning Macro-Actions in Reinforcement Learning [NIPS 1998]

- Lets the previous action influence the choice of the next action
- Defines a modified Q-value as a linear combination of Q(s_t, a_t) and Q(a_{t-1}, a_t) (one reading below)
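
One plausible reading of that combination; the mixing weight λ is not confirmed by the note above, so treat its exact form as an assumption:

```latex
\tilde{Q}(s_t, a_{t-1}, a_t) = (1 - \lambda)\, Q(s_t, a_t) + \lambda\, Q(a_{t-1}, a_t)
```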