PPO balancer

The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the MPC balancer and PID balancer, it balances Upkie with straight legs. Training uses the UpkieGroundVelocity gym environment and the PPO implementation from Stable Baselines3.

An overview of the training pipeline is given in this video: Sim-to-real RL pipeline for Upkie wheeled bipeds.

Installation

conda env create -f environment.yaml
conda activate ppo_balancer

Running a policy

On your machine

To run the default policy:

make test_policy

This assumes the spine is already up and running, for instance after running ./start_simulation.sh on your machine or starting a pi3hat spine on the robot.

To run a policy saved to a custom path, use for instance:

python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
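For reference, a --policy option like the one above can be handled with argparse. This is a minimal sketch and not the actual contents of run.py: the helper name parse_policy_path and the default path policy/params.zip are hypothetical, chosen only for illustration.

```python
import argparse
from pathlib import Path

# Hypothetical default location, not taken from the repository
DEFAULT_POLICY = "policy/params.zip"


def parse_policy_path(argv=None) -> Path:
    """Parse command-line arguments and return the policy ZIP path."""
    parser = argparse.ArgumentParser(description="Run a saved policy")
    parser.add_argument(
        "--policy",
        default=DEFAULT_POLICY,
        help="path to the policy ZIP file",
    )
    args = parser.parse_args(argv)
    path = Path(args.policy)
    # Saved policies are ZIP files, so reject anything else early
    if path.suffix != ".zip":
        parser.error(f"expected a ZIP file, got {path}")
    return path


# Usage with an explicit argument list instead of sys.argv
print(parse_policy_path(["--policy", "ppo_balancer/training/2023-11-15/final.zip"]))
```

Passing an explicit argument list keeps the helper easy to check in isolation; calling it without arguments reads the process command line instead.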

On a real robot

Upload the agent repository to the robot:

make upload

Then SSH into the robot and run the agent:

$ ssh your-upkie
user@your-upkie:~$ python ppo_balancer/run.py

This will run the policy saved at the default path. To run a custom policy, save its ZIP file to the robot (save its operative config as well for future reference) and pass its path as an argument to run.py.

Training a new policy

First, check that training progresses one rollout at a time:

make train_and_show

Once this works you can train for real, with more environments and no GUI:

make train

Check the time/fps values reported in the command line, or the corresponding plots in TensorBoard, to adjust the number of parallel environments:

make tensorboard

Increase the number of environments from its default value (NB_TRAINING_ENVS in the Makefile) for as long as FPS keeps going up.
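The rule of thumb above can be sketched as a small selection routine. This is a minimal sketch, assuming you have noted the time/fps value reported for a few short runs with different numbers of environments. The function name pick_nb_envs and the sample measurements are hypothetical and not part of the repository.

```python
def pick_nb_envs(fps_by_nb_envs: dict[int, float], min_gain: float = 0.0) -> int:
    """Return the largest environment count reached while FPS keeps rising.

    Counts are visited in increasing order. The search stops at the first
    count whose FPS does not beat the current best by more than min_gain.
    """
    if not fps_by_nb_envs:
        raise ValueError("need at least one measurement")
    counts = sorted(fps_by_nb_envs)
    best = counts[0]
    for nb_envs in counts[1:]:
        if fps_by_nb_envs[nb_envs] - fps_by_nb_envs[best] <= min_gain:
            break  # FPS stopped going up, keep the previous count
        best = nb_envs
    return best


# Made-up measurements for illustration only: {nb_envs: time/fps}
measurements = {1: 900.0, 2: 1700.0, 4: 3100.0, 8: 3900.0, 16: 3800.0}
print(pick_nb_envs(measurements))  # prints 8
```

Setting min_gain above zero makes the routine stop earlier, when extra environments bring only a marginal FPS improvement.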

Export dependencies to your Upkie

The PPO balancer uses pixi-pack to export a pixi environment to your Upkie. If you don't have pixi yet, install it first by following its installation instructions.

First, create an environment.tar file with the following command:

pixi run pack-to-upkie

Then, upload it to your Upkie and unpack it with:

pixi-pack unpack environment.tar

If pixi-pack is not installed on your Upkie, you can get a pixi-pack-aarch64-unknown-linux-gnu binary from the pixi-pack release page. Finally, activate the environment and run the agent:

source ./activate.sh
python ppo_balancer/run.py
