This is the repository for a course project in Reinforcement Learning (CS747) by Kalpesh Krishna, Vihari Piratla and Varun Bhatt.
We implement five macro-action algorithms on the HFO framework to learn defensive strategies.
- Decision Interval SARSA - Located in di_sarsa, this algorithm repeats the same action for a fixed number of steps before a SARSA update and a change in action.
- FiGAR SARSA - Located in figar_sarsa, this is a SARSA version of the Fine Grained Action Repetition framework, described in an ICLR 2017 paper.
- Reward Regularization - Located in reg_sarsa, this algorithm penalizes an agent whenever it changes its action. This fixed penalty is directly deducted from the received reward.
- Augmented Action Space - Located in action_space_sarsa, this algorithm is a SARSA version of an AAAI paper, Dynamic Action Repetition for Deep Reinforcement Learning.
- Conditional FiGAR SARSA - Located in conditional_figar_sarsa, this algorithm modifies FiGAR SARSA and conditions the dynamic interval decision on the previously taken action.
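The core ideas behind three of these algorithms can be sketched in a few lines of Python. This is a minimal illustration, not the code in this repository: every function name, the penalty value, the repeat counts, the undiscounted reward sum inside a macro-action, and the tabular Q-table are assumptions made for the example.

```python
from collections import defaultdict


def regularized_reward(reward, action, prev_action, penalty=0.1):
    """Reward Regularization: deduct a fixed penalty when the action changes.

    The penalty value is a hypothetical default.
    """
    if prev_action is not None and action != prev_action:
        return reward - penalty
    return reward


def decode_augmented_action(index, num_actions, repeats=(1, 4)):
    """Augmented Action Space: one index encodes (base action, repeat count).

    The index layout and the repeat counts are assumptions for this sketch.
    """
    return index % num_actions, repeats[index // num_actions]


def run_macro_action(step_fn, action, interval):
    """Decision Interval: repeat `action` for up to `interval` steps.

    `step_fn(action)` is a hypothetical environment hook that returns
    (next_state, reward, done). Rewards are summed without discounting.
    """
    total, state, done = 0.0, None, False
    for _ in range(interval):
        state, reward, done = step_fn(action)
        total += reward
        if done:
            break
    return state, total, done


def sarsa_update(q, s, a, r, s2, a2, done, alpha=0.1, gamma=0.99):
    """One tabular SARSA update, applied once per macro-action."""
    target = r if done else r + gamma * q[(s2, a2)]
    q[(s, a)] += alpha * (target - q[(s, a)])


q_table = defaultdict(float)  # maps (state, action) to an estimated value
```

In this sketch the SARSA update is applied to the reward accumulated by `run_macro_action`, so the agent only reconsiders its action once per interval.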
This project uses the Half-Field Offense framework as a testbed. Make sure your system satisfies the requirements needed by this framework. Start in the root directory of this project.
git clone https://github.com/LARG/HFO
cd HFO
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=RelwithDebInfo ..
make -j4
make install
cd ../..
./build.sh
Set server_wait_seconds to 100 in HFO/bin/teams/base/config/player.conf.
If there is an error with rcssserver, clone mhauskn/rcssserver and install using
./configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
make
sudo make install
Then change line 67 in HFO/bin/HFO from serverCommand = os.path.join(binary_dir, SERVER_BIN) to serverCommand = SERVER_BIN.
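The effect of this change can be illustrated with a short Python sketch. The values of SERVER_BIN and binary_dir below are placeholders assumed for the example, not copied from the HFO script: the original line points at a server binary inside the HFO bin directory, while the modified line leaves a bare command name that is looked up on the system PATH, where the separately installed rcssserver should be found.

```python
import os
import shutil

SERVER_BIN = "rcssserver"        # assumed value, for illustration only
binary_dir = "/path/to/HFO/bin"  # hypothetical install location

# Original line 67: an explicit path into the HFO bin directory.
bundled = os.path.join(binary_dir, SERVER_BIN)

# Modified line 67: a bare command name, resolved through PATH at launch.
system_wide = SERVER_BIN


def resolve(command):
    """Return the executable that `command` resolves to, or None if missing."""
    return shutil.which(command)


print("bundled server:", resolve(bundled))
print("system server: ", resolve(system_wide))
```

If the second line prints None, the system-wide rcssserver install is probably not on your PATH yet.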
You can run the bash scripts in the schedulers directory for each of the five algorithms. These scripts run a 2v2 scenario with one defense agent being trained. For example,
mkdir logs
./schedulers/figar_sarsa.sh
You will find the results in the newly created logs directory.
Scripts to run a 3v3 scenario with two defense agents being trained can be found in the schedulers3v3 directory.
You can find our final project report in report/report.pdf. The empirical results have been added to report/report.org and plots to report/plots/.