Exercise 12

In this last exercise we examine policy gradient methods, which let us build control algorithms for environments with continuous state and action spaces. The environment under consideration is LunarLander from OpenAI's gym. This toy example is based on the Atari arcade game Lunar Lander and has an 8-dimensional (continuous) state space and a 2-dimensional (continuous) action space. Plenty of challenges!

Tasks:

  1. Monte-Carlo policy gradient (REINFORCE) using a Gaussian policy
  2. Actor-Critic algorithm with TD(0) targets using a Gaussian policy
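
The building blocks shared by both tasks can be sketched as follows. This is only a minimal illustration, not the reference solution: it assumes a linear mean `theta @ state` with a state-independent diagonal standard deviation, and the names `gaussian_policy`, `discounted_returns`, and `td0_target` are hypothetical helpers introduced here. The exercise itself may well use a neural network for the policy and the critic.

```python
import numpy as np

STATE_DIM, ACTION_DIM = 8, 2  # dimensions stated in the exercise text


def gaussian_policy(theta, log_std, state, rng):
    """Sample an action from N(theta @ state, diag(exp(log_std))**2).

    Assumption: linear mean and state-independent log_std (illustrative only).
    Returns the action and its log-probability under the policy.
    """
    mean = theta @ state
    std = np.exp(log_std)
    action = mean + std * rng.standard_normal(ACTION_DIM)
    # Log-density of a diagonal Gaussian, summed over action dimensions.
    log_prob = -0.5 * np.sum(
        ((action - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)
    )
    return action, log_prob


def discounted_returns(rewards, gamma):
    """Monte-Carlo returns G_t = r_t + gamma * G_{t+1} (REINFORCE targets)."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td0_target(reward, next_value, done, gamma):
    """One-step TD(0) target r + gamma * V(s'); no bootstrapping at terminal states."""
    return reward + gamma * next_value * (1.0 - float(done))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros((ACTION_DIM, STATE_DIM))
    log_std = np.zeros(ACTION_DIM)
    action, log_prob = gaussian_policy(theta, log_std, np.ones(STATE_DIM), rng)
    print(action.shape, log_prob)
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
    print(td0_target(1.0, next_value=2.0, done=False, gamma=0.9))
```

In REINFORCE the log-probability of each sampled action is weighted by the full return `G_t` from `discounted_returns`, whereas the actor-critic variant replaces that weight with the TD error `td0_target(...) - V(s)` computed from a learned value estimate.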