System requirements: The code has been tested on these environments.
- Ubuntu 16.04 LTS
- Python 3.6.7 (the code does not work on Python 3.6.0 because of compatibility issues between PyTorch and Python 3.6.0)
- Torch 1.1.0 (the most recent version of PyTorch will also work)
- Mujoco_py==1.50
- Gym
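As a quick sanity check after installation, the following minimal sketch reports which of the packages listed above cannot be found in the active environment. It assumes the import names `torch`, `mujoco_py`, and `gym`; the helper `find_missing` is illustrative and not part of this repository.

```python
import importlib.util
import sys


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the packages listed in the requirements.
    required = ["torch", "mujoco_py", "gym"]
    print("Python", sys.version.split()[0])
    missing = find_missing(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates the packages and does not import them, so it will not catch a broken MuJoCo installation.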
QNTRPO solves the Policy Optimization problem that arises in Reinforcement Learning using a Quasi-Newton Trust Region algorithm.
The code depends on external libraries. Install the software by following the instructions below, which describe installation in a conda virtual environment.
conda create -n qntrpo python=3.11 anaconda
source activate qntrpo
conda install pytorch
Install Mujoco and mujoco-py following the instructions in https://github.com/openai/mujoco-py (License: MIT)
Install Gym following the instructions in https://github.com/openai/gym (License: MIT)
To change the trust region radius used for optimization, edit the parameter "tr_maxdelta" on line 67 of "trust_region_opt_torch.py". The current value is 1e-1, and running the code with this value is recommended. The performance of the algorithm with other values has not been fully tested yet.
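To give a sense of what a maximum radius such as "tr_maxdelta" typically controls, here is a minimal sketch of a textbook trust-region radius update with an upper cap. It is a generic rule and not necessarily the update implemented in "trust_region_opt_torch.py"; the names `update_radius`, `rho`, and `step_norm` and the thresholds 0.25 and 0.75 are assumptions for illustration.

```python
import math


def update_radius(delta, rho, step_norm, max_delta=1e-1):
    """Generic trust-region radius update with an upper cap (illustrative).

    delta:     current trust region radius
    rho:       ratio of actual to predicted improvement for the last step
    step_norm: norm of the step that was just taken
    max_delta: largest radius allowed (plays the role of tr_maxdelta)
    """
    if rho < 0.25:
        # The model predicted the improvement poorly: shrink the region.
        return 0.25 * delta
    if rho > 0.75 and math.isclose(step_norm, delta, rel_tol=1e-9):
        # Good agreement and the step reached the boundary:
        # grow the region, but never beyond the cap.
        return min(2.0 * delta, max_delta)
    # Otherwise keep the radius unchanged.
    return delta
```

Under this rule a larger cap permits more aggressive steps once the model is trusted, while a smaller cap keeps every step conservative.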
A different batch size can be used by passing an additional argument when calling the code, --batch-size N, where N is an integer (say 25000), i.e.,
python main.py --env-name "Walker2d-v2" --seed 1243 --batch-size 25000
The QNTRPO algorithm can be tested by running the following in a terminal (for example, for Walker2d with seed 1243).
python main.py --env-name "Walker2d-v2" --seed 1243
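To repeat the experiment over several seeds, a small driver script can assemble the same command line and launch it once per seed. The sketch below is hypothetical and not part of this repository: the helpers `build_command` and `run_seeds` are illustrative names, and it assumes it is run from the directory containing main.py. It uses only the flags shown in this document (--env-name, --seed, --batch-size).

```python
import subprocess
import sys


def build_command(env_name, seed, batch_size=None):
    """Assemble the main.py command line for one run."""
    cmd = [sys.executable, "main.py", "--env-name", env_name, "--seed", str(seed)]
    if batch_size is not None:
        # Optional flag, added only when a batch size is requested.
        cmd += ["--batch-size", str(batch_size)]
    return cmd


def run_seeds(env_name, seeds, batch_size=None):
    """Run main.py once per seed, stopping at the first failing run."""
    for seed in seeds:
        subprocess.run(build_command(env_name, seed, batch_size), check=True)


# Example (the seed values are arbitrary):
# run_seeds("Walker2d-v2", [1243, 1244, 1245])
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an error as soon as one run exits with a non-zero status.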
If you use the software, please cite the following (TR2019-120):
@inproceedings{Jha2019oct,
author = {Jha, Devesh K. and Raghunathan, Arvind and Romeres, Diego},
title = {Quasi-Newton Trust Region Policy Optimization},
booktitle = {Conference on Robot Learning (CoRL)},
year = 2019,
editor = {Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura},
pages = {945--954},
month = oct,
publisher = {Proceedings of Machine Learning Research},
url = {https://www.merl.com/publications/TR2019-120}
}
Please contact one of us: Devesh K Jha ([email protected]), Arvind U Raghunathan ([email protected]), or Diego Romeres ([email protected]).
See CONTRIBUTING.md for our policy on contributions.
Released under the AGPL-3.0-or-later license, as found in the LICENSE.md file.
All files:
Copyright (C) 2019, 2023 Mitsubishi Electric Research Laboratories (MERL).
SPDX-License-Identifier: AGPL-3.0-or-later