- Trust Region Policy Optimization (https://arxiv.org/abs/1502.05477)
- Proximal Policy Optimization (https://arxiv.org/abs/1707.06347)
- Sample Efficient Actor-Critic with Experience Replay (https://arxiv.org/abs/1611.01224)
- Continuous control with deep reinforcement learning (https://arxiv.org/abs/1509.02971)
- Dueling Network Architectures for Deep Reinforcement Learning (https://arxiv.org/abs/1511.06581)
- Deep Reinforcement Learning that Matters (https://arxiv.org/abs/1709.06560)
- Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control (https://arxiv.org/abs/1708.04133)
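Since PPO is the focus of the comparisons below, it may help to recall its core idea from the paper above: the clipped surrogate objective, which bounds how far the new policy's probability ratio can move the update. A minimal NumPy sketch (function name and sample values are illustrative, not taken from this repository):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from the PPO paper.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantage: advantage estimate for each transition
    eps:       clipping range (0.2 is the paper's default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Element-wise minimum of the two terms, averaged over the batch.
    return np.minimum(unclipped, clipped).mean()

# Hypothetical batch of two transitions with positive advantages:
ratios = np.array([0.9, 1.5])
advantages = np.array([1.0, 1.0])
print(ppo_clip_objective(ratios, advantages))  # min(0.9, 0.9) and min(1.5, 1.2) -> mean 1.05
```

The `min` makes the bound pessimistic: a ratio outside `[1 - eps, 1 + eps]` gets no extra credit, which discourages destructively large policy updates.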
Tests were run on three different environments: OpenAI Gym (MountainCarContinuous), MuJoCo (Reacher), and Atari (Breakout).
- DQN (`src/mountaincar-continuous/dqn` and `results/gym-mountaincarcontinuous/dqn`)
- DDPG (`src/mountaincar-continuous/ddpg` and `results/gym-mountaincarcontinuous/ddpg`)
- PPO (`src/mountaincar-continuous/ppo` and `results/gym-mountaincarcontinuous/ppo`)
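One detail worth noting: DQN outputs one Q-value per discrete action, while MountainCarContinuous has a continuous action space, so the DQN variant presumably maps Q-value indices onto a discretized action range. A minimal sketch of that idea (bin count and function names here are illustrative, not the repository's API):

```python
import numpy as np

def make_action_bins(low=-1.0, high=1.0, n_bins=11):
    # Evenly spaced continuous actions; one Q-value is learned per bin.
    return np.linspace(low, high, n_bins)

def greedy_action(q_values, bins):
    # Pick the continuous action whose bin has the highest Q-value.
    return bins[int(np.argmax(q_values))]

bins = make_action_bins()
q = np.zeros(11)
q[8] = 1.0  # pretend the network scored bin 8 highest
print(greedy_action(q, bins))  # bins[8] = 0.6
```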
- DDPG (`src/baselines/baselines/ddpg` and `results/mujoco-reacher/ddpg`)
- TRPO (`src/baselines/baselines/trpo_mpi` and `results/mujoco-reacher/trpo`)
- PPO (`src/baselines/baselines/ppo2` and `results/mujoco-reacher/ppo`)
- ACER (`src/baselines/baselines/acer` and `results/atari-breakout/acer`)
- TRPO (`src/baselines/baselines/trpo_mpi` and `results/atari-breakout/trpo`)
- PPO (`src/baselines/baselines/ppo2` and `results/atari-breakout/ppo`)
To get the source code, execute the following commands:

```shell
git clone https://github.com/lajoiepy/Reinforcement_Learning_PPO.git
cd Reinforcement_Learning_PPO
git submodule init
git submodule update
```
- The source code in `src/baselines` is a fork of https://github.com/openai/baselines.
- The source code in `src/mountaincar-continuous/ddpg` is mostly from https://github.com/lirnli/OpenAI-gym-solutions/blob/master/Continuous_Deep_Deterministic_Policy_Gradient_Net/DDPG%20Class%20ver2.ipynb.
- The source code in `src/mountaincar-continuous/dqn` is inspired by http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html.
- The source code in `src/mountaincar-continuous/ppo` is a fork of https://github.com/tpbarron/pytorch-ppo.
- PyTorch for `src/mountaincar-continuous/dqn` and `src/mountaincar-continuous/ppo`.
- TensorFlow for `src/mountaincar-continuous/ddpg` and `src/baselines`.
- Gym for `src/mountaincar-continuous`.
- MuJoCo and Atari for `src/baselines`.
- Follow the README files for the code in `src/baselines`.
- For PPO on Gym environments, run `python3 src/mountaincar-continuous/pytorch-ppo/main.py --env-name MountainCarContinuous-v0`.
- For the DQN implementation, run `python3 mountaincar_dqn.py`.
- For the DDPG implementation, run `python3 mountaincar_ddpg.py`.
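All of the scripts above drive their environments through the standard Gym interaction loop (`reset`, then `step` until `done`). A self-contained sketch of that loop, using a toy stand-in environment so it runs without Gym or MuJoCo installed (`ToyEnv` and `run_episode` are illustrative names, not part of this repository):

```python
import random

class ToyEnv:
    """Tiny stand-in exposing Gym's reset/step interface."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)          # toy reward: penalize large actions
        done = self.t >= self.horizon  # episode ends after `horizon` steps
        return obs, reward, done, {}

def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
    return total

env = ToyEnv()
ret = run_episode(env, policy=lambda obs: random.uniform(-1.0, 1.0))
print("episode return:", ret)
```

With a real environment, `ToyEnv()` would be replaced by something like `gym.make("MountainCarContinuous-v0")` and `policy` by the trained agent's action selection.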