You can find more useful information at OpenAI Spinning Up: https://spinningup.openai.com/
- RND / Random Network Distillation (30 Oct 2018)
- Dopamine (28 Sep 2018)
- Ape-X (2 Mar 2018)
- TD3 / Twin Delayed DDPG (26 Feb 2018)
- IMPALA / Importance Weighted Actor-Learner Architecture (5 Feb 2018)
- N2D / NEC2DQN / Faster Deep Q-learning using Neural Episodic Control (6 Jan 2018)
- SAC / Soft Actor-Critic (4 Jan 2018)
- Rainbow (6 Oct 2017)
- A2C / Advantage Actor-Critic (18 Aug 2017)
  - A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A3C); see the sketch after this list.
- ACKTR / Actor Critic using Kronecker-factored Trust Region (17 Aug 2017)
- C51 / 51-atom agent (21 Jul 2017)
- PPO / Proximal Policy Optimization (20 Jul 2017)
- HER / Hindsight Experience Replay (5 Jul 2017)
- ICM / Intrinsic Curiosity Module (15 May 2017 / 13 Aug 2018)
- DQfD / Deep Q-learning from Demonstrations (12 Apr 2017)
- ACER / Actor-Critic with Experience Replay (3 Nov 2016)
- GAIL / Generative Adversarial Imitation Learning (10 Jun 2016)
- CMA-ES / Covariance Matrix Adaptation Evolution Strategy (4 Apr 2016)
- NAF / Normalized Advantage Functions (2 Mar 2016)
- A3C / Asynchronous Advantage Actor-Critic (4 Feb 2016)
- Dueling DQN (20 Nov 2015)
- PER / Prioritized Experience Replay (18 Nov 2015)
- DDQN / Double DQN (22 Sep 2015)
- DDPG / Deep Deterministic Policy Gradient (9 Sep 2015)
- DRQN / Deep Recurrent Q-Network (23 Jul 2015)
- GAE / Generalized Advantage Estimation (8 Jun 2015)
- TRPO / Trust Region Policy Optimization (19 Feb 2015)
- DPG / Deterministic Policy Gradient (Jun 2014)
- DQN / Deep Q-Network (19 Dec 2013 / 25 Feb 2015)
- AC / Actor-Critic Algorithms (2000)
- VPG / Vanilla Policy Gradient (29 Nov 1999)
- SARSA / State–action–reward–state–action (Sep 1994)
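
To make the A2C entry above concrete, here is a minimal sketch of the synchronous update in PyTorch: one gradient step on a batch gathered from all parallel workers at once, whereas A3C would apply each worker's gradients asynchronously. Everything in it (the `ActorCritic` module, `a2c_update`, the loss coefficients) is an illustrative assumption, not reference code from any of the listed papers.

```python
# Minimal A2C-style sketch (assumed, illustrative). Discrete actions;
# rollouts from parallel envs are presumed already flattened into
# tensors of observations, actions, and n-step returns.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActorCritic(nn.Module):
    """Shared body with separate policy (actor) and value (critic) heads."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state value V(s)

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


def a2c_update(model, optimizer, obs, actions, returns,
               value_coef=0.5, entropy_coef=0.01):
    """One synchronous gradient step over the combined worker batch."""
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()    # advantage = return - baseline
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)  # critic regression to returns
    entropy = dist.entropy().mean()           # bonus against policy collapse
    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Smoke test on random data standing in for a real rollout batch.
    model = ActorCritic(obs_dim=4, n_actions=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=7e-4)
    obs = torch.randn(32, 4)
    actions = torch.randint(0, 2, (32,))
    returns = torch.randn(32)
    print(a2c_update(model, optimizer, obs, actions, returns))
```

The entropy bonus and the value-loss coefficient follow the common actor-critic loss shape; the specific values (0.5, 0.01) are conventional defaults, not prescribed by the A2C post.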