This project solves the Cart Pole problem in 3D using Reinforcement Learning (RL). The cartpole3d environment is implemented in ROS with the openai_ros package. The agents are trained with the DQN, A2C, ACKTR, PPO and TRPO implementations from the stable-baselines library.
- Install ROS Melodic
- Install the openai_ros package
cd ~/catkin_make/src/
git clone https://bitbucket.org/theconstructcore/openai_ros.git
cd ~/catkin_make
catkin_make
source devel/setup.bash
rosdep install openai_ros
- Install Gym
pip install gym
- Install stable-baselines
- Install cartpole3d
cd ~/catkin_make/src/
git clone https://github.com/PierreExeter/cartpole3d_ros.git
cd ~/catkin_make
catkin_make
- Make the scripts executable
chmod +x ~/catkin_make/src/cartpole3d/scripts/cartpole3d_random.py
chmod +x ~/catkin_make/src/cartpole3d/scripts/cartpole3d_dqn.py
chmod +x ~/catkin_make/src/cartpole3d/scripts/cartpole3d_trpo.py
chmod +x ~/catkin_make/src/cartpole3d/scripts/cartpole3d_train_all.py
chmod +x ~/catkin_make/src/cartpole3d/scripts/cartpole3d_enjoy_all.py
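The chmod commands above can also be applied in one pass from Python. The following is only a minimal sketch: the helper names `make_executable` and `make_scripts_executable` are made up for illustration, and the directory is assumed to be the one used in the commands above. Note that it marks every .py file in the directory, which is slightly broader than the five commands listed.

```python
import os
import stat

# Assumption: the scripts live where the chmod commands above point.
SCRIPTS_DIR = os.path.expanduser("~/catkin_make/src/cartpole3d/scripts")


def make_executable(path):
    """Add the execute bits for user, group and others (like `chmod a+x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def make_scripts_executable(directory):
    """Mark every .py file directly inside `directory` as executable."""
    changed = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".py"):
            make_executable(os.path.join(directory, name))
            changed.append(name)
    return changed


if __name__ == "__main__":
    if os.path.isdir(SCRIPTS_DIR):
        for name in make_scripts_executable(SCRIPTS_DIR):
            print("made executable: " + name)
    else:
        print("directory not found: " + SCRIPTS_DIR)
```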
- Create simulation workspace
mkdir -p ~/simulation_ws/src
cd ~/simulation_ws
catkin_make
source devel/setup.bash
rospack profile
- Test the installation with random actions
roslaunch cartpole3d start_training_cartpole3d_random.launch
You should see the cartpole executing random actions in Gazebo.
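For readers new to RL, a random-action run boils down to the loop below. This is a rough sketch and not the code of cartpole3d_random.py: `StandInEnv` is a hypothetical placeholder with a Gym-style reset/step interface, and the two discrete actions, the observation and the reward of 1.0 per step are all assumptions made for illustration.

```python
import random


class StandInEnv(object):
    """Hypothetical stand-in with a Gym-style reset/step interface.

    It is NOT the cartpole3d environment: the episode simply ends after a
    fixed number of steps and every step is rewarded with 1.0.
    """

    n_actions = 2  # assumption: two discrete actions (push left / push right)

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0, 0.0, 0.0, 0.0]  # placeholder observation

    def step(self, action):
        assert 0 <= action < self.n_actions
        self.steps += 1
        done = self.steps >= self.max_steps
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


def run_random_episode(env, rng):
    """Play one episode with uniformly random actions; return the total reward."""
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = rng.randrange(env.n_actions)
        _, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        reward = run_random_episode(StandInEnv(), rng)
        print("episode {}: total reward {}".format(episode, reward))
```

A trained agent replaces the `rng.randrange` call with a policy that chooses the action from the observation.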
- Train all the agents
roslaunch cartpole3d start_training_cartpole3d_train_all.launch
- Monitor the training in Tensorboard
cd ~/catkin_make/src/cartpole3d/results/
tensorboard --logdir=A2C:tensorboard_logs/A2C/,ACKTR:tensorboard_logs/ACKTR/,PPO2:tensorboard_logs/PPO2/,TRPO:tensorboard_logs/TRPO/
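The --logdir value above is a comma-separated list of name:path pairs, one per algorithm. If the set of algorithms changes, the string can be generated rather than typed by hand. A small sketch, where `build_logdir_arg` is a hypothetical helper and the tensorboard_logs/ layout is taken from the command above:

```python
# Algorithm names taken from the tensorboard command above.
ALGORITHMS = ["A2C", "ACKTR", "PPO2", "TRPO"]


def build_logdir_arg(algorithms, root="tensorboard_logs"):
    """Return the comma-separated name:path pairs for `tensorboard --logdir=`."""
    pairs = ["{0}:{1}/{0}/".format(name, root) for name in algorithms]
    return ",".join(pairs)


if __name__ == "__main__":
    print("tensorboard --logdir=" + build_logdir_arg(ALGORITHMS))
```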
- Watch the trained agents
roslaunch cartpole3d start_training_cartpole3d_enjoy_all.launch
- Plot the rewards
cd ~/catkin_make/src/cartpole3d/scripts/
python plot_reward.py
Tested with:
- Ubuntu 18.04
- Python 2.7
- ROS Melodic
- Gazebo 9.12
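To compare a machine against the tested setup above, something like the following may help. It is a sketch under assumptions: `check_setup` is a hypothetical helper, it only looks at the Python version and at the ROS_DISTRO environment variable (which ROS exports once its setup.bash has been sourced), and it does not check Ubuntu or Gazebo.

```python
import os
import platform

# Versions from the "Tested with" list above.
TESTED_PYTHON = "2.7"
TESTED_ROS = "melodic"


def check_setup(python_version, environ):
    """Return a list of human-readable differences from the tested setup."""
    problems = []
    if not python_version.startswith(TESTED_PYTHON + "."):
        problems.append(
            "Python {} differs from tested {}".format(python_version, TESTED_PYTHON)
        )
    ros_distro = environ.get("ROS_DISTRO")  # exported by the ROS setup.bash
    if ros_distro is None:
        problems.append("ROS_DISTRO is not set; has setup.bash been sourced?")
    elif ros_distro != TESTED_ROS:
        problems.append(
            "ROS {} differs from tested {}".format(ros_distro, TESTED_ROS)
        )
    return problems


if __name__ == "__main__":
    issues = check_setup(platform.python_version(), os.environ)
    for issue in issues:
        print("warning: " + issue)
    if not issues:
        print("Python and ROS match the tested setup")
```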