A ROS workspace to implement complex, intelligent human-robot interaction behaviors in dynamic environments using game-theoretical approaches. This was initially created as a final project for ASE 389: Game Theoretical Modeling of Multiagent Systems, Fall 2022, at The University of Texas at Austin, taught by Dr. David Fridovich-Keil.
This repository is intended to be a central hub for developing game-theoretical planning and control methods for my Ph.D. research. Specifically, I am interested in robots that operate in dynamic, human-centered environments. Game theoretical models are well-suited for these situations, since they inherently involve multiple agents (players) in collaborative and/or adversarial roles, and no two operating environments (game states) are the same.
As of December 2022, this project is organized as follows:
- This repository (game_theory_ws), which contains the project writeup, initial results and future plans, and installation instructions
- A Robot Operating System (ROS)-Gazebo simulator testbed (hri_game_testbed), which serves as a controlled, repeatable development environment prior to deployment on robot hardware, and
- A ROS-enabled game-theoretical, hierarchical planner-controller module (hierarchical_game_control_ros), which receives state variable observations from hri_game_testbed, computes an optimal course of action and the corresponding control inputs, and sends them to the simulated robot.
These elements are discussed in further detail below. For collaboration or any questions/comments/concerns about this project, feel free to contact me or raise an issue!
Robots and autonomous systems are expanding into a range of complex, dynamic, and populated operating environments. Novel service applications see robots operating in airports, hospitals, museums, city centers, and battlefields. However, robot action planning in such environments is inherently challenging: the correct course of action is situationally dependent, or situated, and depends on the robot's goals and capabilities in addition to the status of the environment and the agents within it (e.g., humans and vehicles).
The motivating example for this project is human-robot interaction in such environments. New service applications will require robots to accomplish service tasks while acting alongside humans in various roles (teammates, pedestrians, adversaries). Potential service tasks include social navigation, cargo/parcel delivery, information sharing, and leading/following behaviors. Since these tasks involve a dynamic combination of multiple discrete and continuous elements, robots in dynamic environments require a planning and control architecture that can act optimally across a range of potential operating states and conditions.
To this end, this project employs a two-layer hierarchical planning & control architecture. This architecture could enable robots & autonomous systems to effect intelligent behaviors amidst uncertain or changing states. The architecture is directly motivated by the implementation in Thakkar et al., "Hierarchical Control for Cooperative Teams in Competitive Autonomous Racing," in which the authors employ a high-level, low-frequency discrete planner alongside a low-level, high-frequency continuous controller (see Figure below).
Specifically, this project implements a Monte-Carlo Tree Search (MCTS) to plan high-level robot behaviors, and a Nonlinear Constrained Optimal Controller to send motion control commands. These elements are discussed in detail in the following section.
The main system elements interact as shown in the diagram below.
This repository contains the testbed and hierarchical controller as submodules; the hierarchical controller receives data from and sends commands to the testbed. Please see the following subsections for additional information on the testbed & hierarchical controller.
The testbed is implemented in ROS Gazebo and allows controlled, rapid development of the hierarchical planning & control architecture. It is intended to be a stand-in for real robot hardware until the hierarchical controller can safely be deployed on robot hardware. As such, it provides simulated sensor data and can accept robot motion commands via standard ROS message types.
At present, the only level is a simulated cargo pickup/dropoff task for a PR2 robot, taking place in an indoor hospital environment with a known map. There are two variants, `worlds/hospital.world` and `worlds/hospital_empty.world`, which respectively do and do not contain humans.
The level can be launched by following the instructions in the Usage section. Once running, the testbed publishes relevant information about the PR2 on the following ROS topics:

```
/base_pose_ground_truth   # Robot odometry as a nav_msgs/Odometry message
/base_scan                # Laser ranging data as a sensor_msgs/LaserScan message
```

The PR2 can be commanded to move by publishing a `geometry_msgs/Twist` message to the following topic:

```
/base_controller/command
```
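For orientation, a minimal sketch of a node that exercises this interface is shown below. The topic names and message types come from the testbed; the node name, callbacks, and the constant velocity command are illustrative assumptions only.

```cpp
#include <ros/ros.h>
#include <nav_msgs/Odometry.h>
#include <sensor_msgs/LaserScan.h>
#include <geometry_msgs/Twist.h>

// Print the robot's ground-truth position whenever odometry arrives.
void odomCallback(const nav_msgs::Odometry::ConstPtr& msg) {
  ROS_INFO_THROTTLE(1.0, "Robot at (%.2f, %.2f)",
                    msg->pose.pose.position.x, msg->pose.pose.position.y);
}

// Report how many laser range returns were received.
void scanCallback(const sensor_msgs::LaserScan::ConstPtr& msg) {
  ROS_INFO_THROTTLE(1.0, "Received %zu laser ranges", msg->ranges.size());
}

int main(int argc, char** argv) {
  ros::init(argc, argv, "testbed_interface_example");  // illustrative node name
  ros::NodeHandle nh;

  ros::Subscriber odom_sub = nh.subscribe("/base_pose_ground_truth", 1, odomCallback);
  ros::Subscriber scan_sub = nh.subscribe("/base_scan", 1, scanCallback);
  ros::Publisher cmd_pub =
      nh.advertise<geometry_msgs::Twist>("/base_controller/command", 1);

  // Command a slow, constant forward velocity at 10 Hz (for illustration only).
  ros::Rate rate(10);
  while (ros::ok()) {
    geometry_msgs::Twist cmd;
    cmd.linear.x = 0.2;
    cmd_pub.publish(cmd);
    ros::spinOnce();
    rate.sleep();
  }
  return 0;
}
```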
The game-theoretical, hierarchical planner-controller is implemented in the `hierarchical_game_control_ros` package. The high-level planner and low-level controller work in tandem to interpret sensor measurements, select an optimal plan of action, and send commands to the (simulated) robot. At present, the system is designed for a single task: cargo pickup and delivery. However, the software could be extended to additional applications (see Future Work).
The low level controller node has three inputs:

- The robot's current state $x$, comprised of its 2D position $[x_{robot}, y_{robot}]$, heading $\theta$, and velocity $v$. This information is received through a `nav_msgs/Odometry` ROS message on the `/base_pose_ground_truth` topic, and the necessary conversions are made by `robotOdomCallback` in `lowlevel_controller_node.h`.
- A 2D navigation goal $[x_{goal}, y_{goal}]$, in the map frame. $[x_{goal}, y_{goal}]$ is received as a `move_base_msgs/MoveBaseGoal` message on the `/nav_goal` topic.
- LiDAR range data from the laser scanner, received as a `sensor_msgs/LaserScan` ROS message on the `/base_scan` topic. The range measurements are converted from the robot's frame into the `map` frame and saved by the `scanCallback` function in `lowlevel_controller_node.h`.

The ROS subscribers are created in `lowlevel_controller_node.cpp`.

Given the inputs above, the controller computes the optimal control output and publishes it as a `geometry_msgs/Twist` ROS message on the `/base_controller/command` topic by a publisher in `lowlevel_controller_node.cpp`.
The controller is parametrized by cost weights, which are described below.
The low level controller models the control problem as a nonlinear program, comprised of a cost function to be minimized by adjusting variables subject to constraints. IFOPT, a C++/Eigen-based interface to nonlinear programming solvers such as IPOPT and SNOPT, then solves the constrained optimization problem.
The low level controller minimizes three cost terms, which penalize distance from the navigation goal, large control inputs, and proximity to obstacles, respectively. This cost structure is derived from Lasse Peters et al., "Inferring Objectives in Continuous Dynamic Games from Noise-Corrupted Partial State Observations," who use a similar cost structure for an autonomous driving scenario. Notably, the obstacle cost used in this controller is modified; the cost definitions are found in `solver_params.h`, which is included in the controller node.
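As a rough illustration only (the weights $w_{goal}, w_{input}, w_{obs}$, the proximity radius $d_{prox}$, and the obstacle set $\mathcal{O}$ below are assumed names; the exact expressions live in the controller source), the three cost terms might take a form such as:

$$
\begin{aligned}
J_{goal} &= w_{goal}\,\big\|[x_{robot}, y_{robot}] - [x_{goal}, y_{goal}]\big\|^2 \\
J_{input} &= w_{input}\,\|u\|^2 \\
J_{obs} &= w_{obs} \sum_{o \in \mathcal{O}} \max\big(0,\; d_{prox} - \big\|[x_{robot}, y_{robot}] - [x_o, y_o]\big\|\big)^2
\end{aligned}
$$

where $u$ is the control input vector and each $[x_o, y_o]$ is an obstacle point taken from the LiDAR scan.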
To optimize the cost functions, the solver adjusts the control input variables; the associated solver parameters are defined in `solver_params.h`.
The optimization is subject to dynamics constraints and variable limits. The dynamics parameters are defined in `solver_params.h`, while the state and input variable limits are part of their respective variable set entries. Additionally, `solver_params.h` includes analytical Jacobians for the cost and dynamics functions; these were derived manually and included to improve real-time solver performance, but are not strictly necessary.
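For reference, a minimal sketch of a discretized dynamics constraint, assuming a unicycle-style model consistent with the state $[x_{robot}, y_{robot}, \theta, v]$ defined above (the inputs $a$ and $\omega$ and the timestep $\Delta t$ are assumed names; the actual formulation is in the controller source):

$$
\begin{aligned}
x_{k+1} &= x_k + v_k \cos(\theta_k)\,\Delta t \\
y_{k+1} &= y_k + v_k \sin(\theta_k)\,\Delta t \\
\theta_{k+1} &= \theta_k + \omega_k\,\Delta t \\
v_{k+1} &= v_k + a_k\,\Delta t
\end{aligned}
$$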
Lastly, these costs and constraints are added to the solver in `lowlevel_controller_node.cpp`, which runs in a continuous, high-frequency loop. During each iteration, the current state variables are updated with the most recent ROS state data, and the solver object computes a solution to the nonlinear program, publishing it to the robot testbed.
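For orientation, a condensed sketch of how an IFOPT problem is typically assembled and solved is given below. `ControlVariables`, `DynamicsConstraints`, `ControlCosts`, and `solveControlNlp` are placeholder names standing in for the package's actual `ifopt::VariableSet`, `ifopt::ConstraintSet`, and `ifopt::CostTerm` subclasses:

```cpp
#include <memory>
#include <Eigen/Dense>
#include <ifopt/problem.h>
#include <ifopt/ipopt_solver.h>

// The component classes used here are placeholders; the real variable,
// constraint, and cost sets are defined in the hierarchical_game_control_ros source.
Eigen::VectorXd solveControlNlp() {
  // Assemble the nonlinear program from its component sets.
  ifopt::Problem nlp;
  nlp.AddVariableSet(std::make_shared<ControlVariables>());
  nlp.AddConstraintSet(std::make_shared<DynamicsConstraints>());
  nlp.AddCostSet(std::make_shared<ControlCosts>());

  // IPOPT solves the constrained optimization problem; the optimal inputs are
  // read back from the variable composite and converted to a Twist command.
  ifopt::IpoptSolver solver;
  solver.Solve(nlp);
  return nlp.GetOptVariables()->GetValues();
}
```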
The planner node is a ROS node which, given the current state of the environment (game) and robot agent (player), computes the player's optimal move. The planner node then sends the action to the robot controller, as appropriate. The planner logic is implemented in a separate header file (`graph_datatypes.h`), which contains the Monte Carlo Tree Search implementation and associated data structures (the `GameState` node and the `Tree`). The main ROS node is implemented in `highlevel_planner_node.cpp` and `highlevel_planner_node.h`. Since the project is currently simulated, the ROS node also handles some game logic at an abstract level. For example, this node sets boolean values to denote whether the robot is carrying cargo, rather than performing a dynamic simulation of the action.
The planner node's inputs consist of the variables that define the state of the robot and cargo within the environment:
- The robot's current 2D position $[x_{robot}, y_{robot}]$. This information is received through a `nav_msgs/Odometry` ROS message on the `/base_pose_ground_truth` topic, and the necessary conversions are made by `robotOdomCallback` in `highlevel_planner_node.h`.
- The 2D location where the cargo will be picked up, $[x_{cargo}, y_{cargo}]$. Currently hardcoded in `highlevel_planner_node.h`, but should eventually become an input.
- The 2D location where the cargo will be delivered, $[x_{dest}, y_{dest}]$. Currently hardcoded in `highlevel_planner_node.h`, but should eventually become an input.
- Boolean variables depicting the current location of the cargo: $robotHasCargo$, $cargoAtPickup$, $cargoAtDest$. Currently hardcoded in `highlevel_planner_node.h`, but should eventually become inputs.
The planner node publishes the optimal robot action after running MCTS. Actions can be output in one of two ways:
- Navigation goals $[x_{goal}, y_{goal}]$ are published as a `move_base_msgs/MoveBaseGoal` message on the `/nav_goal` topic, in the `map` frame, by a publisher in `highlevel_planner_node.cpp`.
- Actions such as `PickUp` or `DropOff` are set as variables and processed internally by `highlevel_planner_node.cpp`. Since these capabilities are handled abstractly (i.e., no pickup or dropoff actions are simulated in the testbed), these actions can be replaced by actual movement commands in hardware implementations of the planner.
- To help interpret the planner's intent during development, the planner also publishes an OpenCV image containing the map, the current state belief, and the selected action.
In addition to the state inputs above, the planner relies on the following data and parameters:

- A map of the operating environment. By default, the planner loads the map on startup and converts it to an OpenCV matrix for easier processing. The map is an occupancy grid, obtained by a mapping run of the environment and saved for later use. Sample maps of the hospital level are provided in `hri_game_testbed/maps/*.pgm`; `clean_hospital_map` is used by default.
- ROS map parameters: the map origin location, free/occupied threshold values, and grid resolution are provided in `hri_game_testbed/maps/*.yaml` and enable the spatial and occupancy information to be properly interpreted. The parameters are discussed on the ROS wiki.
- Solver time $dt$: the period of the ROS timer callback, i.e., the amount of time the Monte Carlo Tree Search is allowed to run. Currently set to 3 seconds.
- Cargo distance: the threshold beneath which the robot is considered close enough to pick up or drop off cargo.
- `movementStepSize`: how much the planner advances the robot's position during a "move" action. Currently 1 meter.
`highlevel_planner_node` is the main ROS node, which runs the planner in a low-frequency loop (currently once every 3 seconds, matching the solver time $dt$ above). During each loop iteration, `highlevel_planner_node`:

- Creates a new `GameState` node using the current state information.
- Creates a new `Tree` with the current state as the root.
- Runs a search on the `Tree` object.
- Renders a visual representation of the map, current state, and optimal move in an OpenCV display window.

These steps are implemented in the main loop in `highlevel_planner_node.cpp`.
The MCTS algorithm and solver-specific datatypes are included in `include/graph_datatypes.h`:
The game state is completely characterized by:
- Pixel indices of the robot, cargo pickup site, and dropoff site: the `robotXPix`, `robotYPix`, `pickupXPix`, `pickupYPix`, `destXPix`, and `destYPix` variables. Note that these positions are converted between pixel indices in the OpenCV occupancy grid and $[x, y]$ positions in the `map` frame.
- The current location of the cargo, defined by the boolean variables `robotHasCargo`, `cargoAtPickup`, and `cargoAtDest`.
Each `GameState` node must also retain the following information for MCTS:

- `nTimesVisited`: the number of times the node has been simulated during the tree search.
- `score`: the total objective result from all simulations at the node.
- `parent`: the `GameState` which preceded the current state.
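A minimal sketch of how these fields might be grouped is shown below. The member names follow the document, while the struct layout, the default values, and the children container are assumptions (the actual class lives in `graph_datatypes.h`):

```cpp
#include <vector>

// Hypothetical sketch of the GameState node described above; defaults are
// illustrative only.
struct GameState {
  // Spatial state, stored as pixel indices in the OpenCV occupancy grid.
  int robotXPix = 0, robotYPix = 0;
  int pickupXPix = 0, pickupYPix = 0;
  int destXPix = 0, destYPix = 0;

  // Cargo location flags.
  bool robotHasCargo = false;
  bool cargoAtPickup = false;
  bool cargoAtDest = false;

  // MCTS bookkeeping.
  int nTimesVisited = 0;
  double score = 0.0;
  GameState* parent = nullptr;
  std::vector<GameState*> children;  // assumed container for expanded child nodes
};
```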
When a `GameState` node is initialized, the `getAvailMoves()` subroutine computes the available actions (moves) as follows:

- If the robot is within `cargoDist` of the cargo pickup location and the pickup location has the cargo, the robot can pick up the cargo.
- If the robot is within `cargoDist` of the cargo delivery destination and the robot has the cargo, the robot can drop off the cargo.
- To check for movement availability, pixels along the `PlusX`, `PlusY`, `MinusX`, and `MinusY` directions in the `map` frame are checked to ensure they are below the `freeThreshold`. If each pixel value between the robot's current position and the `movementStepSize` is free, the corresponding movement is a valid action (see the sketch below).
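As a sketch of the free-space check along one direction (assuming an 8-bit, single-channel OpenCV occupancy image and the parameter names used above; the actual implementation in `graph_datatypes.h` may differ):

```cpp
#include <opencv2/core.hpp>

// Returns true if every pixel from the robot's current cell out to stepPix
// cells in the +X direction is below freeThreshold (i.e. free space).
bool plusXIsFree(const cv::Mat& occupancyGrid, int robotXPix, int robotYPix,
                 int stepPix, int freeThreshold) {
  for (int dx = 1; dx <= stepPix; ++dx) {
    const int x = robotXPix + dx;
    if (x >= occupancyGrid.cols) return false;  // would step off the map
    if (occupancyGrid.at<uchar>(robotYPix, x) >= freeThreshold) return false;  // occupied
  }
  return true;
}
```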
The function `propagate(Move)` accepts a `Move` as input and advances the `GameState` as follows:

- Movement moves (`PlusX`, `PlusY`, ...) increment the robot's position (`robotXPix`, `robotYPix`) appropriately.
- `PickUp` and `DropOff` moves set the cargo boolean variables `robotHasCargo`, `cargoAtPickup`, and `cargoAtDest` appropriately.

`propagate(Move)` is also used during MCTS simulation.
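A minimal sketch of this state transition is shown below. The `Move` values and member names follow the document, while the free-function signature and the `stepPix` parameter (the `movementStepSize` expressed in pixels) are assumptions:

```cpp
enum class Move { PlusX, MinusX, PlusY, MinusY, PickUp, DropOff };

// Advance the game state by one move (in graph_datatypes.h this is a
// GameState member; it is written as a free function here for brevity).
void propagate(GameState& state, Move move, int stepPix) {
  switch (move) {
    case Move::PlusX:  state.robotXPix += stepPix; break;
    case Move::MinusX: state.robotXPix -= stepPix; break;
    case Move::PlusY:  state.robotYPix += stepPix; break;
    case Move::MinusY: state.robotYPix -= stepPix; break;
    case Move::PickUp:   // cargo moves from the pickup site to the robot
      state.robotHasCargo = true;
      state.cargoAtPickup = false;
      break;
    case Move::DropOff:  // cargo moves from the robot to the destination
      state.robotHasCargo = false;
      state.cargoAtDest = true;
      break;
  }
}
```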
The following terminal conditions are defined in the `gameResult` function:

- Cargo successfully arriving at the destination results in a win and a score of +1.
- If the simulated game executes `maxNumMoves = 50` moves without delivering the cargo, this results in a loss and a score of 0.

If the game state is not terminal, a value of `-1` is returned to continue simulation.
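A minimal sketch of this scoring logic (the return convention and `maxNumMoves` follow the document; the move-count argument is an assumption about how the function is called):

```cpp
// Returns +1 for a win, 0 for a loss, and -1 for a non-terminal state.
int gameResult(const GameState& state, int numMovesSoFar, int maxNumMoves = 50) {
  if (state.cargoAtDest) return 1;             // cargo delivered: win
  if (numMovesSoFar >= maxNumMoves) return 0;  // move budget exhausted: loss
  return -1;                                   // not terminal: keep simulating
}
```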
The MCTS implementation follows the original algorithm described by Coulom in "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search" (2007), and uses the Upper Confidence bounds applied to Trees (UCT) rule devised by Kocsis & Szepesvári in "Bandit based Monte-Carlo Planning" (2006). For this project, MCTS is implemented in `Tree.Search()`. The steps are:
- Initialization: The search begins at the current state, which is initially set as the root of the tree.
- Selection: The current node selects the child node with the highest value according to the UCT formula until a leaf is found. This is implemented in `GameState.selectNextNode()`, which computes children's UCT values using `GameState.computeChildUctValue`. In the event that multiple child nodes have equal value, a random child node is selected from among the highest-valued children.
Recall that a leaf is a node with unexplored child nodes, as computed by the helper function `GameState.isLeaf()`. The UCT formula computes the value of a child node as

$$UCT_{child} = \frac{score_{child}}{nTimesVisited_{child}} + c\,\sqrt{\frac{\ln\left(nTimesVisited_{parent}\right)}{nTimesVisited_{child}}}$$

where `score` is the child's total simulation score, $nTimesVisited_{child}$ and $nTimesVisited_{parent}$ are the visit counts of the child and parent nodes, and $c$ is the exploration constant (a code sketch of this computation appears after the search steps below).
- Expansion: Once a leaf is found, a child node is added to the leaf. A valid, unexecuted `Move` is selected, and a specialized `GameState` constructor creates the child node from the `GameState` parent and the selected `Move`.
- Simulate: Random, uniform moves are selected from the child's set of available moves until a terminal state is reached. When a terminal state is reached, the simulate step adds the resulting score to the child node's `score` and increments the child node's `nTimesVisited`.
- Backpropagate: The search moves to successive parent nodes, updating `score` and incrementing `nTimesVisited` for each node until the search reaches the root.
The search repeats from the selection step until the MCTS exceeds the planning time $dt$. To exploit the best known move, `Tree.selectBestMove()` selects the root's child with the highest value and stores the corresponding action in `Tree.bestMove`.
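As referenced above, a sketch of the UCT computation is given below. The field names follow the `GameState` description, while the function signature and the default exploration constant are assumptions:

```cpp
#include <cmath>
#include <limits>

// Compute the UCT value of a child node from the parent's and child's
// visit counts and the child's accumulated score.
double computeChildUctValue(const GameState& parent, const GameState& child,
                            double c = std::sqrt(2.0)) {
  if (child.nTimesVisited == 0) {
    // Unvisited children are always preferred during selection.
    return std::numeric_limits<double>::infinity();
  }
  const double exploit = child.score / child.nTimesVisited;
  const double explore =
      c * std::sqrt(std::log(static_cast<double>(parent.nTimesVisited)) /
                    child.nTimesVisited);
  return exploit + explore;
}
```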
Qualitative results and observations from the initial system demonstration are provided here.
During initial trials, the low level controller took 10-40 ms to solve the nonlinear control program, corresponding to a rate of 25-100 Hz.
The system is able to successfully navigate to a specified waypoint goal using the included objective functions:
When a navigation goal behind the robot is supplied, the robot moves backwards toward the goal, instead of rotating to face it and then moving forward:
When the robot moves into a more congested region of the map, the solver converges; however, the optimal solution is not the target goal waypoint. The solution seems to place the robot into a static configuration where it is unable to exit and make progress toward the goal:
During initial trials, the MCTS planner simulated a total of 50,000 - 100,000 nodes during the 3 second planning period. Typical scores at the root node ranged from 0-50.
When integrated with the simulator and controller, the planner is able to compute and send commands to the system effectively, and the simulated robot's controller responds to the planner's waypoints. However, the planner does not appear to converge on a single move, and instead issues commands inconsistently and in opposition to one another. For example, in the animation below, the planner commands the robot back and forth, resulting in minimal progress toward picking up or dropping off cargo.
The initial results of this project provide valuable feedback on the system's design and guide further development. Specific conclusions and future work include:
- The planner-controller system is feasible for real-time usage. Both the controller and planner could run continuously and perform a suitable number of computations for real-time usage. A mobile robot platform possesses the computational resources required to run the controller-planner.
- Modifications to the cost functions in the optimal controller. The robot's motion displayed two undesirable characteristics: it backed toward goals instead of rotating to drive forward toward them, and the robot base stopped when entering a congested area. This suggests that the controller converged onto a solution that is locally optimal but does not reflect the desired motion. To address these issues, a penalty could be implemented for rearward motion. The vehicle obstacle avoidance model from Peters et al. likely needs to be replaced with an obstacle avoidance model designed for a robot with a LiDAR scanning rangefinder.
- Modifications to the MCTS planner. The planner failed to converge on a single solution during planning runs, and published seemingly contradictory actions. This could be a result of a very sparse reward structure in which the `score` is much smaller than the total number of simulations, `nTimesVisited`. As a result, the exploration term in the UCT value is much larger than the exploitation term, which could explain the seemingly random behavior of the planner. To address the issue, modifications to the rewards, actions, and search algorithm should be explored. Namely:
- The objective score from a win can be increased
- The exploration constant $c$ can be adjusted
- The search can be guided toward known, valuable actions using a heuristic function (as in $A^*$ search), or using a model for the value
- Instead of planning the next action at each timestep, an optimal sequence of actions could be planned upon request. This may be more suitable.
In conclusion, although the system did not effect the desired series of robot actions, a game-theoretical approach to robot action planning and execution is feasible. Modifications to the planning and control algorithms implemented in this project could yield desirable behaviors for robots in dynamic environments.
As of December 2022, this repo is meant to be a proof of concept and development environment. Based on the initial project results, potential improvements to the project include:
Controller Algorithm improvements:
- Explore different obstacle avoidance models using LiDAR
- Add penalty for rearward motion
Planner Algorithm improvements:
- Experiment with increasing score for wins
- Add incremental reward/score, e.g. an intermediate goal for picking up cargo
- Guide the search using a heuristic function or model
- Reduce the number of planning steps and intermediate goals by planning entire navigation segments, instead of the next action
Code/implementation improvements:
- Launch localization and mapping; use actual ROS data instead of using the map as an OpenCV matrix
- Implement the optimal controller as a `ros_control` interface/controller
- Separate `graph_datatypes` from the `highlevel_planner_node` and make it more generic and modular; it is currently specific to the cargo pickup/dropoff problem
- Update access modifiers in the high-level, low-level, and MCTS classes; everything is currently public
This assumes you have ROS Noetic installed and are using Ubuntu 20.04.
First, clone the repository and its submodules. At a terminal in your home directory (`~`):
git clone --recursive TODO
Install ROS package dependencies. At a terminal in the `game_theory_ws` directory:
rosdep install --from-paths . --ignore-src -r -y
Set up environment variables for Gazebo. At a terminal:
cd ~/game_theory_ws/src/hri_game_testbed
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:`pwd`/models
export GAZEBO_RESOURCE_PATH=$GAZEBO_RESOURCE_PATH:`pwd`/worlds
Build the workspace. At a terminal:
cd ~/game_theory_ws
catkin_make # or catkin build
Ensure the workspace is properly sourced, and then run the demo using the following commands:
cd ~/game_theory_ws
source devel/setup.bash
roslaunch hierarchical_game_control_ros hospital_demo.launch # For an empty hospital environment
roslaunch hierarchical_game_control_ros hospital_demo.launch level:=hospital # For a populated environment