Name is to be determined.
A parallel reinforcement learning framework written in C++
Goal: build a scalable, reliable and well-maintained RL framework in C++ that can be used for engineering research.
See the documentation here.
For dependencies, building, and running, see here.
Version alpha 0.1 implements only the Proximal Policy Optimization (PPO) learning algorithm. Basic MPI functions are used for communication between the simulation nodes and the learning node, and LibTorch is used for neural network training and inference. Our code can be compiled and run on USC CARC.
Our future goals are to make our code callable from Python using pybind11, to use GPUs for training and inference, and to implement other algorithms.
Developed by Kishore Ganesh, Qiongao Liu, Yusheng Jiao, Chenchen Huang, and Haotian Hang as the USC CSCI596 course final project.
We used our code to train on a cartpole environment. The learning curve using only one environment node is shown below.
The learning curve using three environments running simultaneously is shown below.
In the learning curves above, the dots show the reward of each episode, and the line shows the average reward over 100 episodes.
- Download the dependency libtorch and add it to your environment path:
```
wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip
export Torch_DIR=</absolute/path/to/libtorch>
echo "export Torch_DIR=</absolute/path/to/libtorch>" >> ~/.bashrc
```
clang-format is not required, but if you want to use `make format`, you need to have it installed (on a Debian-based system):
```
sudo apt install clang-format
```
- Build our project. First way (using the Makefile):
```
make all
```
or
```
make build
```
or
```
make debug
```
Second way (using CMake):
```
mkdir build
cd build
cmake ..
cmake --build . --config Release
```
or
```
cmake --build . --config Debug
```
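If CMake fails to locate LibTorch, one common remedy in the standard `find_package(Torch)` workflow (whether it is needed here depends on how our CMakeLists.txt consumes `Torch_DIR`, so treat this as a fallback) is to pass the libtorch root as a prefix path:
```
cmake -DCMAKE_PREFIX_PATH="$Torch_DIR" ..
```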
2.1 Build on USC CARC
```
module load gcc/8.3.0
module load openmpi/4.0.2
module load cmake
export LD_PRELOAD=/spack/apps/gcc/8.3.0/lib64/libstdc++.so.6
```
Build (from the build directory, as above):
```
cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ ..
```
- Running:
```
mpirun -n 4 ./cpp-rl-training <path/to/config/file> > out
```
or
```
mpirun -n 4 ./cpp-rl-training > out
```
Here, 4 is the number of processes to use; it must be at least 2 (one learning node plus at least one simulation node). If no config file is specified, the default config file (../config) is used. The `> out` redirection sends everything that would be printed to the terminal to a file called out instead.
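Because the output is redirected, one way to monitor training progress while a run is in flight is:
```
tail -f out
```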
3.1 Running on CARC using a .sl file
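A minimal example job script is sketched below. The job name, walltime, and file name `job.sl` are placeholder assumptions; adjust them, and any partition or account settings, to your CARC allocation. It loads the same modules as the build instructions in 2.1 and launches the run command above:
```
#!/bin/bash
#SBATCH --job-name=cpp-rl-training   # placeholder job name
#SBATCH --ntasks=4                   # MPI processes: 1 learning node + 3 simulation nodes
#SBATCH --time=01:00:00              # placeholder walltime; adjust as needed

# Same environment as in section 2.1
module load gcc/8.3.0
module load openmpi/4.0.2
export LD_PRELOAD=/spack/apps/gcc/8.3.0/lib64/libstdc++.so.6

# Run from the build directory; terminal output goes to the file "out"
mpirun -n 4 ./cpp-rl-training > out
```
Submit the job with:
```
sbatch job.sl
```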