A Furuta pendulum is a control problem involving a pendulum arm, free-rotating in the vertical plane, attached to a driven arm that rotates in the horizontal plane. The goal of the robot is to balance the pendulum in the upright vertical position. This is a more advanced adaptation of the classic cartpole problem, as it also includes swing-up (swinging the pendulum up to the vertical position in the first place).
The following parts were procured for the robot assembly:
- Nema 17 Stepper Motor 2.8V
- 600 PPR Photoelectric Incremental Rotary Encoder
- Motor Driver Board
- 12V 2A Power Supply Plug Charger
- DRV8825 Stepper Motor Driver
- Arduino Nano
The pendulum and encoder housing were designed in Fusion 360 and produced with FDM printing. The `.step` and `.f3d` files can be found in `/robot/CAD`.
The pendulum was attached to the encoder with a rigid shaft coupler. The encoder housing was attached to the stepper motor with a modified M6 bolt in a T-nut fitted in another rigid shaft coupler.
The Arduino communicates with the Python script over bi-directional serial communication (a sketch of the PC side is given after this list), where it:
- Reads the acceleration input for the motor from the PC and implements it.
- Writes the motor position and pendulum position data for generating the observation space for the model.
- Implements reset functions for resetting the motor position and encoder value.
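For illustration, the PC side of this protocol could look like the following minimal `pyserial` sketch. The port name, baud rate, and message format here are assumptions for the example, not the repository's actual protocol.

```python
# Hypothetical sketch of the PC side of the serial link (port, baud rate and
# message format are assumptions, not this repo's actual protocol).
import serial

ser = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1)

def send_acceleration(accel_steps_per_s2: float) -> None:
    # Send the commanded motor acceleration as a newline-terminated ASCII message.
    ser.write(f"A {accel_steps_per_s2:.1f}\n".encode())

def read_state() -> tuple[int, int]:
    # Read back "<motor_steps> <encoder_counts>" written by the Arduino each loop.
    motor_steps, encoder_counts = (int(v) for v in ser.readline().split())
    return motor_steps, encoder_counts

def send_reset() -> None:
    # Ask the Arduino to re-zero the motor position and the encoder value.
    ser.write(b"R\n")
```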
The environment follows the standard Gymnasium class format. Defined below are the observation space, action space, and reward function used for the experiment:
- $\theta$: angular position of the pendulum. 0 at the top, normalised between $[-\pi, \pi]$.
- $\dot{\theta}$: angular velocity of the pendulum. Experimentally bounded between $[-10, 10]$, then normalised to $[-2, 2]$.
- $\alpha$: motor position (measured in steps instead of angle). The step range is physically limited to 90° left and right, i.e. a $[-200, 200]$ step range. However, the observation space spans further, between $[-300, 300]$, to account for the motor slightly exceeding the limit. The range is then normalised to $[-3, 3]$.
- $\dot{\alpha}$: motor velocity (steps per second). Experimentally bounded between $[-4, 4]$, then normalised to $[-1, 1]$.
- $\ddot{\alpha}$: motor acceleration (steps per second squared). Control input into the system. Bounded between $[-20000, 20000]$ and normalised to $[-2, 2]$.
Note that all values are continuous.
The observation space uses an `ndarray` of size (5,) containing the 5 continuous observation values. The action space uses an `ndarray` of size (1,) containing the motor acceleration value; the action space is also continuous.
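As a point of reference, the normalised bounds above could be expressed with Gymnasium `Box` spaces roughly as in the sketch below; the variable ordering is illustrative and the repository's actual definitions may differ.

```python
# Sketch of the normalised spaces described above, using Gymnasium Box spaces.
# Ordering [theta, theta_dot, alpha, alpha_dot, alpha_ddot] is illustrative.
import numpy as np
from gymnasium import spaces

observation_space = spaces.Box(
    low=np.array([-np.pi, -2.0, -3.0, -1.0, -2.0], dtype=np.float32),
    high=np.array([np.pi, 2.0, 3.0, 1.0, 2.0], dtype=np.float32),
    dtype=np.float32,
)

# One continuous action: the normalised motor acceleration.
action_space = spaces.Box(low=-2.0, high=2.0, shape=(1,), dtype=np.float32)
```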
The reward function weights can be adjusted in `/conf/config.yaml`.
The environment terminates when the stepper motor exceeds the allowed range of motion (more than 90° left or right from the zero position).
The environment is truncated after 500 timesteps, but this can be adjusted in `/conf/mode/train.yaml`.
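Concretely, the end-of-episode logic amounts to a check like the sketch below (constant names are illustrative, with 200 steps corresponding to the 90° limit):

```python
# Illustrative end-of-episode check for the environment's step() method.
MOTOR_LIMIT_STEPS = 200   # ~90 degrees left/right of the zero position
MAX_EPISODE_STEPS = 500   # default truncation limit (see /conf/mode/train.yaml)

def episode_done(motor_steps: int, timestep: int) -> tuple[bool, bool]:
    terminated = abs(motor_steps) > MOTOR_LIMIT_STEPS  # motor out of safe range
    truncated = timestep >= MAX_EPISODE_STEPS          # time limit reached
    return terminated, truncated
```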
The script provides a high level of flexibility for training and evaluating the model. Configurations are stored in the `conf` folder, organised in the following manner:
- `config.yaml`:
  - model selection (PPO or SAC)
  - mode selection (train or eval)
  - serial communication configuration between PC and Arduino
  - action and observation space configuration
  - reward function weights
  - toggle logging with TensorBoard
- `/mode/train.yaml`:
  - new model file configuration
  - device config (cuda or cpu)
  - training total timesteps and episode length
- `/mode/eval.yaml`:
  - device config (cuda or cpu)
  - episode length set to -1 for infinite episode length
- `/model` config files (`PPO.yaml` and `SAC.yaml`):
  - model weights
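This layout (a root `config.yaml` plus `mode` and `model` groups) matches the Hydra convention. Assuming Hydra is used, an entrypoint consuming this tree could look roughly like the sketch below; the field names are illustrative.

```python
# Hypothetical Hydra entrypoint for the conf/ tree described above.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="../conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    print(cfg.model)  # hyperparameters selected from /model/PPO.yaml or SAC.yaml
    print(cfg.mode)   # train/eval settings from /mode/train.yaml or eval.yaml

if __name__ == "__main__":
    main()
```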
- To run the code, first build dependencies in the root directory with `pip install -e .`.
- Upload the Arduino code in `/robot/arduino/main/main.ino` to the Arduino board.
- Connect the robot to the PC and run the script, remembering to set the mode to `train`: `python src/main.py`.
- The trained model will then be stored in `/model`, where it can be trained further or evaluated.
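For example, continuing training from a saved model with Stable-Baselines3 might look like the sketch below; the import path, environment class, and file name are assumptions for illustration.

```python
# Hypothetical sketch: reloading a saved model from /model to train it further.
from stable_baselines3 import PPO

from src.env import FurutaPendulumEnv  # illustrative import path

env = FurutaPendulumEnv()
model = PPO.load("model/ppo_furuta", env=env)  # illustrative file name
model.learn(total_timesteps=100_000)           # continue training
```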
We largely referred to the following resources for guidance:
- Armandpl's video and repository on building a similar project to ours.
- Inspiration for this project was taken from the Quanser Qube design. The reward function was adapted from Quanser's code.
- Stable Baselines' guide on custom environment creation.
- Farama's guide on handling termination vs truncation scenarios when designing our environment.