Inverted-Pendulum-Robot (Furuta Pendulum)

Introduction

A Furuta pendulum is a classic control problem: a free-rotating pendulum arm (swinging in the vertical plane) is attached to the end of a driven arm that rotates in the horizontal plane. The goal of the robot is to swing the pendulum up and balance it in the upright position. This is a more advanced adaptation of the classic cart-pole problem, as it also includes swing-up (bringing the pendulum to the upright position in the first place).

Hardware Implementation

The following parts were procured for the robot assembly:

  1. NEMA 17 Stepper Motor 2.8V
  2. 600 PPR Photoelectric Incremental Rotary Encoder
  3. Motor Driver Board
  4. 12V 2A Power Supply Plug Charger
  5. DRV8825 Stepper Motor Driver
  6. Arduino Nano

The pendulum and encoder housing were designed in Fusion 360 and produced with FDM printing. The .step and .f3d files can be found in /robot/CAD.

The pendulum was attached to the encoder with a rigid shaft coupler. The encoder housing was attached to the stepper motor with a modified M6 bolt in a T-nut fitted in another rigid shaft coupler.

The Arduino communicates with the Python script over a bi-directional serial link (a sketch of the PC side follows this list), where it:

  1. Reads the motor acceleration command from the PC and applies it.
  2. Writes back the motor position and pendulum position used to build the model's observation.
  3. Implements reset functions for resetting the motor position and encoder value.
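
A minimal sketch of the PC side of this exchange, assuming pyserial and a hypothetical comma-separated message format (the actual protocol lives in /robot/arduino/main/main.ino and the Python sources):

```python
import serial

# Hypothetical port and baud rate; the real values are configured in conf/config.yaml.
ser = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1)

def send_acceleration(accel_steps_per_s2: float) -> None:
    """Send the commanded motor acceleration to the Arduino."""
    ser.write(f"{accel_steps_per_s2}\n".encode())

def read_state() -> tuple[int, int]:
    """Read the motor position (steps) and pendulum encoder count.

    Assumes the Arduino replies with a line formatted as "<motor_steps>,<encoder_count>".
    """
    line = ser.readline().decode().strip()
    motor_steps, encoder_count = (int(v) for v in line.split(","))
    return motor_steps, encoder_count
```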

Pendulum Env

The environment follows the standard Gymnasium class format. The observation space, action space, and reward function used for the experiment are defined below, built from the following state variables:

  • $\theta$: angular position of the pendulum. Zero at the top and normalised to $[-\pi,\pi]$.
  • $\dot{\theta}$: angular velocity of the pendulum. Experimentally bounded to $[-10,10]$, then normalised to $[-2, 2]$.
  • $\alpha$: motor position (measured in steps rather than angle). The physical range is limited to 90° left and right of centre, i.e. a $[-200, 200]$ step range; the observation space spans $[-300, 300]$ to account for the motor slightly exceeding this limit, and is normalised to $[-3,3]$.
  • $\dot{\alpha}$: motor velocity (steps per second). Experimentally bounded to $[-4, 4]$, then normalised to $[-1, 1]$.
  • $\ddot{\alpha}$: motor acceleration (steps per second squared); the control input to the system. Bounded to $[-20000,20000]$ and normalised to $[-2, 2]$.

Note that all values are continuous.
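
As an illustration only, the raw readings could be rescaled with simple linear maps like the sketch below; the limits match the ranges listed above, while the helper and example values are hypothetical:

```python
import numpy as np

def normalise(value: float, raw_limit: float, scaled_limit: float) -> float:
    """Clip to [-raw_limit, raw_limit], then linearly map onto [-scaled_limit, scaled_limit]."""
    return float(np.clip(value, -raw_limit, raw_limit)) * scaled_limit / raw_limit

# Example raw readings (hypothetical values):
theta_dot = normalise(5.0, 10.0, 2.0)         # pendulum velocity  -> [-2, 2]
alpha = normalise(150.0, 300.0, 3.0)          # motor position     -> [-3, 3]
alpha_dot = normalise(-2.0, 4.0, 1.0)         # motor velocity     -> [-1, 1]
alpha_ddot = normalise(8000.0, 20000.0, 2.0)  # motor acceleration -> [-2, 2]
```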

Observation Space

$\left[\cos{\theta}, \sin{\theta}, \dot{\theta}, \alpha, \dot{\alpha}\right]$

An ndarray of shape (5,) containing 5 continuous observation values. Using the $\cos$ and $\sin$ of $\theta$ experimentally gave better convergence rates than using $\theta$ directly.

Action Space

$\left[\ddot{\alpha}\right]$

An ndarray of shape (1,) containing the motor acceleration value. The action space is continuous.
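
A minimal sketch of how these two spaces could be declared with Gymnasium, using the normalised bounds listed above (the project's actual environment class defines its own):

```python
import numpy as np
from gymnasium import spaces

# Observation: [cos(theta), sin(theta), theta_dot, alpha, alpha_dot], all normalised.
observation_space = spaces.Box(
    low=np.array([-1.0, -1.0, -2.0, -3.0, -1.0], dtype=np.float32),
    high=np.array([1.0, 1.0, 2.0, 3.0, 1.0], dtype=np.float32),
    dtype=np.float32,
)

# Action: normalised motor acceleration.
action_space = spaces.Box(low=-2.0, high=2.0, shape=(1,), dtype=np.float32)
```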

Reward Function

$\gamma-\left(\theta^2+C_1\times\dot{\theta}^2+C_2\times\alpha^2+C_3\times\dot{\alpha}^2+C_4\times\ddot{\alpha}^2\right)$

$\gamma$: reward offset to ensure that the reward is always positive.

If the reward ranged from $-\infty$ to $0$, episodes that terminate early would accumulate less negative reward and therefore score higher, resulting in a faulty reward signal; the offset prevents this. The constants $C_1$ to $C_4$ weight the terms of the reward function and are defined in /conf/config.yaml.
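
A direct sketch of this reward in Python; the offset and weights below are placeholders for the values in /conf/config.yaml:

```python
def compute_reward(theta, theta_dot, alpha, alpha_dot, alpha_ddot,
                   gamma=10.0, c1=0.1, c2=0.01, c3=0.01, c4=1e-4):
    """Offset minus a weighted quadratic penalty on state and control."""
    penalty = (theta ** 2
               + c1 * theta_dot ** 2
               + c2 * alpha ** 2
               + c3 * alpha_dot ** 2
               + c4 * alpha_ddot ** 2)
    return gamma - penalty
```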

Termination

The environment terminates when the stepper motor exceeds the allowed range of motion (more than 90° left or right from the zero position).

Truncation

The environment is truncated after 500 timesteps, but this can be adjusted in /conf/mode/train.yaml.
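
A sketch of how the two flags could be produced inside the environment's step(), following the Gymnasium termination/truncation convention and the limits described above:

```python
STEP_LIMIT = 200         # +/- 90 degrees from the zero position, in motor steps
MAX_EPISODE_STEPS = 500  # episode length, configurable in /conf/mode/train.yaml

def check_episode_end(motor_steps: int, elapsed_steps: int) -> tuple[bool, bool]:
    """Return (terminated, truncated) for the current timestep."""
    terminated = abs(motor_steps) > STEP_LIMIT       # motor left its allowed range
    truncated = elapsed_steps >= MAX_EPISODE_STEPS   # time limit reached
    return terminated, truncated
```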

Usage

Configuration

The script provides a high level of flexibility for training and evaluating the model. Configurations are stored in the conf folder, organised in the following manner (a loading sketch follows the list):

  1. config.yaml
    • model selection (PPO or SAC)
    • mode selection (train or eval)
    • serial communication configuration between the PC and the Arduino
    • action and observation space configuration
    • reward function weights
    • toggle logging with TensorBoard
  2. /mode/train.yaml:
    • new model file configuration
    • device config (cuda or cpu)
    • training total timesteps and episode length
  3. /mode/eval.yaml:
    • device config (cuda or cpu)
    • episode length set to -1 for infinite episode length
  4. /model (PPO.yaml and SAC.yaml) config files:
    • model weights
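
As a sketch of how such a layout can be consumed (assuming OmegaConf/Hydra-style configs; the actual entry point is src/main.py):

```python
from omegaconf import OmegaConf

# Load the top-level config and one mode-specific config, then merge them.
base_cfg = OmegaConf.load("conf/config.yaml")
mode_cfg = OmegaConf.load("conf/mode/train.yaml")
cfg = OmegaConf.merge(base_cfg, mode_cfg)

print(OmegaConf.to_yaml(cfg))  # inspect the resolved configuration
```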

Install and Run

  1. To run the code, first install the dependencies from the root directory with: pip install -e .

  2. Upload the Arduino code in /robot/arduino/main/main.ino to the Arduino board.

  3. Connect the robot to the PC and run the script, remembering to set the mode to train: python src/main.py

  4. The trained model will then be stored in /model, where it can be trained further or evaluated.

Credits

We largely referred to the following resources for guidance:

  1. Armandpl's video and repository on building a similar project.
  2. Inspiration for this project was taken from the Quanser QUBE design; the reward function was adapted from Quanser's code.
  3. Stable Baselines' guide on custom environment creation.
  4. Farama's guide on handling termination vs. truncation scenarios when designing the environment.
