# PyTorch implementation of PPO

**NOTE**: This repository is not maintained. I recommend using the implementation here instead; it is much more full-featured and better tested.

This is a PyTorch implementation of Proximal Policy Optimization.

The code is mostly ported from the OpenAI Baselines implementation, but it does not yet optimize over each batch for several epochs. I will add this soon.
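
For context, here is a minimal sketch of what that multi-epoch clipped-surrogate update looks like. The `Policy` class, `ppo_update` signature, and hyperparameters are illustrative assumptions for this sketch, not this repository's actual API:

```python
import torch
import torch.nn as nn

# Illustrative Gaussian policy; the networks in main.py differ.
class Policy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

def ppo_update(policy, optimizer, obs, actions, old_log_probs, advantages,
               clip_eps=0.2, epochs=4):
    # old_log_probs and advantages come from the rollout and are detached.
    for _ in range(epochs):  # several optimization passes over the same batch
        dist = policy(obs)
        log_probs = dist.log_prob(actions).sum(-1)
        # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s).
        ratio = torch.exp(log_probs - old_log_probs)
        # PPO clipped surrogate objective, minimized as its negation.
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        loss = -torch.min(unclipped, clipped).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```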

## Usage

```bash
python main.py --env-name Walker2d-v1
```
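
Walker2d-v1 is an OpenAI Gym MuJoCo environment, so a working Gym and MuJoCo installation is assumed.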

## Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

## Todo

- Add multiple epochs per batch
- Compare results against the Baselines code