A collection of off-policy RL algorithms, fully compatible with OpenAI Gym.
Real-time monitoring of training is done with visdom.
A deep-learning extension of deterministic policy gradients (DPG), an off-policy RL algorithm. This implementation uses action and parameter noise to improve exploration, both at the start of training and throughout the remaining steps.
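The two exploration schemes mentioned above differ in where the noise is injected. A minimal sketch (not the repo's actual code; the tiny linear "actor" and sigma values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for an actor network: a = tanh(W @ obs).
W = rng.normal(size=(1, 3))

def act(weights, obs):
    return np.tanh(weights @ obs)

def action_noise(obs, sigma=0.1):
    # Action-space noise: compute the deterministic action,
    # then perturb the action itself.
    a = act(W, obs)
    return np.clip(a + sigma * rng.normal(size=a.shape), -1.0, 1.0)

def param_noise(obs, sigma=0.05):
    # Parameter-space noise: perturb the actor's weights once,
    # then act deterministically with the noisy weights.
    W_noisy = W + sigma * rng.normal(size=W.shape)
    return np.clip(act(W_noisy, obs), -1.0, 1.0)

obs = rng.normal(size=3)
a1 = action_noise(obs)
a2 = param_noise(obs)
```

Parameter noise tends to give temporally consistent exploration within an episode (the same perturbed policy is used for many steps), while action noise is independent at each step.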
In progress - TD3: DDPG with tweaks that counter DDPG's tendency to overestimate the Q-function later in learning. Also uses action and parameter noise to improve exploration.
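The core anti-overestimation tweak is clipped double-Q learning: the Bellman target uses the minimum of two target critics, so a single critic's optimistic error cannot inflate the target. A minimal sketch with made-up numbers (function name and values are illustrative, not from this repo):

```python
import numpy as np

def clipped_double_q_target(r, gamma, q1_next, q2_next, done):
    # Take the minimum of the two target critics' estimates of
    # Q(s', a') to counteract Q-value overestimation.
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# If one critic says 5.0 and the other 4.0, the target uses 4.0.
y = clipped_double_q_target(r=1.0, gamma=0.99, q1_next=5.0, q2_next=4.0, done=0.0)
print(y)  # 1.0 + 0.99 * 4.0 = 4.96
```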
In progress - DDPG with the Ape-X distributed training framework (using Ray) and prioritized experience replay (PER).
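For the PER component, the standard proportional variant samples transitions with probability proportional to |TD error|^alpha and corrects the resulting bias with importance-sampling weights. A simplified list-based sketch (the class and its parameters are illustrative assumptions; a real implementation would use a sum-tree for O(log n) sampling):

```python
import numpy as np

class SimplePER:
    """Proportional prioritized replay, O(n) sampling for clarity."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priorities skew sampling
        self.eps = eps        # keeps zero-error transitions sampleable
        self.data, self.prios = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4, rng=None):
        rng = rng or np.random.default_rng()
        p = np.asarray(self.prios)
        p = p / p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        # Importance-sampling weights, normalized by the max for stability.
        weights = (len(self.data) * p[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights
```

After each learning step, the sampled transitions' priorities would be updated with their new TD errors.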
This is a PyTorch implementation of DDPG with action and parameter noise for exploration.
- Implement prioritized experience replay
- Implement Ape-X distributed training framework with Ray
- Fix visdom logging
- Clean up unneeded code.
- Integrate into https://github.com/yeshg/deep-rl
Code structure and visdom logging adapted from https://github.com/p-morais/deep-rl
Basic implementations of DDPG and TD3 from the official TD3 release repo: https://github.com/sfujim/TD3