Thesis report: https://github.com/claytonfk/A_DRL_algorithm_for_SR/blob/master/Thesis.pdf
To run the code, the following requirements should be met. The code has not been tested with earlier or later versions of the software and libraries listed below, so forward and backward compatibility is not guaranteed, although it is expected.
- The simulation software V-REP, educational version 3.5.
- A CUDA-enabled graphics processing unit.
- Python 3.6 with the TensorFlow machine learning library, version 1.3.0.
- The libraries matplotlib (2.2.2), numpy (1.14.2), and scipy (1.1.0).
- A Windows 64-bit, Linux 64-bit, or macOS operating system.
As a first step, before running either the training or the evaluation code, it is necessary to start V-REP and open port 19999 for communication between the Python code and the simulator. This is done by launching V-REP from the console with the following command:
```
Windows: start vrep.exe -gREMOTEAPISERVERSERVICE_19999_FALSE_TRUE
Linux:   ./vrep.sh -gREMOTEAPISERVERSERVICE_19999_FALSE_TRUE
Mac:     ./vrep.app/Contents/MacOS/vrep -gREMOTEAPISERVERSERVICE_19999_FALSE_TRUE
```
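Once V-REP is listening on port 19999, the Python side connects through V-REP's legacy remote API. A minimal connection check, assuming the remote API bindings shipped with V-REP 3.5 (vrep.py, vrepConst.py, and the remoteApi shared library) are on the Python path:

```python
import vrep  # legacy remote API bindings shipped with V-REP 3.5

vrep.simxFinish(-1)  # close any previously opened connections
# Connect to the continuous remote API server started on port 19999
client_id = vrep.simxStart('127.0.0.1', 19999, True, True, 5000, 5)
if client_id == -1:
    raise RuntimeError('Could not connect to V-REP on port 19999')
print('Connected to V-REP, client id:', client_id)
vrep.simxFinish(client_id)  # close the connection when done
```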
Before running the training file train.py, the user needs to define all training options by editing the file and changing the values of its global variables. These global variables are described in the tables below. The train.py file can then be run by executing the command "python train.py" in the console.
| Restore options | | |
|---|---|---|
| Name of global variable | Description | Type |
| restore_model | Initialize the weights of the network from a saved model | boolean |
| restore_em | Initialize the experience replay memory from a saved results file | boolean |
| path_to_model_to_restore | Path to the saved model from which the weights of the network will be restored | string |
| path_to_results_to_restore | Path to the results file from which the experience replay memory will be restored | string |
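As a rough illustration of how restore options of this kind are usually consumed, here is a minimal sketch; the variable names come from the table above, but the use of a TensorFlow checkpoint for the model and a pickled file for the experience replay memory is an assumption about the repository's file formats:

```python
import pickle
import tensorflow as tf

# Illustrative values only; the real ones are set at the top of train.py.
restore_model = True
restore_em = True
path_to_model_to_restore = 'models/model.ckpt'      # hypothetical path
path_to_results_to_restore = 'results/results.pkl'  # hypothetical path

weights = tf.Variable(tf.zeros([4, 4]))  # stand-in for the network weights
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    if restore_model:
        # Load network weights from a saved TensorFlow checkpoint
        saver.restore(sess, path_to_model_to_restore)
    if restore_em:
        # Assumption: the results file is a pickle holding the replay memory
        with open(path_to_results_to_restore, 'rb') as f:
            experience_memory = pickle.load(f)
```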
| Save options | | |
|---|---|---|
| Name of global variable | Description | Type |
| save_model_frequency | Frequency (in episodes) at which the model is saved | integer |
| max_to_keep | Maximum number of saved models kept on disk; once this limit is exceeded, the oldest models are replaced by new ones | integer |
| path_to_model_to_save | Path to the model file saved during training | string |
| path_to_results_to_save | Path to the results file saved at the end of training | string |
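The max_to_keep option corresponds directly to the parameter of the same name on TensorFlow's tf.train.Saver. A minimal sketch of periodic saving, assuming saving is keyed on the episode counter; paths and values are illustrative:

```python
import os
import tensorflow as tf

save_model_frequency = 50                    # illustrative value
max_to_keep = 5                              # illustrative value
path_to_model_to_save = 'models/model.ckpt'  # hypothetical path

weights = tf.Variable(tf.zeros([4, 4]))  # stand-in for the network weights
# The Saver keeps at most max_to_keep checkpoints, deleting the oldest ones
saver = tf.train.Saver(max_to_keep=max_to_keep)

os.makedirs(os.path.dirname(path_to_model_to_save), exist_ok=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for episode in range(1, 201):
        # ... one episode of training ...
        if episode % save_model_frequency == 0:
            saver.save(sess, path_to_model_to_save, global_step=episode)
```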
| Stage options | | |
|---|---|---|
| Name of global variable | Description | Type |
| num_agents | Number of agents used for training | integer |
| num_episodes | Number of episodes for the stage | integer |
| max_discrepancy | Discrepancy above which the episode is finished | integer |
| min_discrepancy | Discrepancy below which the episode is finished | integer |
| steps_limit | Maximum number of steps in each episode | integer |
| desired_distance | Desired distance l in meters | float |
| task | Task ID. 0: dispersion, 1: square formation, 2: aggregation, 3: chain formation | integer |
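Read together, max_discrepancy, min_discrepancy, and steps_limit describe when an episode ends: the discrepancy leaves the allowed band, or the step cap is reached. A minimal sketch of that check, with discrepancy standing in for the task-specific measure defined in the thesis:

```python
def episode_finished(discrepancy, step, min_discrepancy, max_discrepancy, steps_limit):
    """Return True when an episode should end, per the stage options above."""
    if discrepancy > max_discrepancy:  # formation degraded too far
        return True
    if discrepancy < min_discrepancy:  # formation considered achieved
        return True
    return step >= steps_limit         # hard cap on episode length
```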
| Experience replay memory and Boltzmann exploration and exploitation options | | |
|---|---|---|
| Name of global variable | Description | Type |
| em_capacity | Experience replay memory capacity | integer |
| alpha | Exponent alpha | float |
| beta | Initial value of the exponent beta | float |
| final_beta | Final value of the exponent beta | float |
| initial_b_temperature | Initial Boltzmann temperature | float |
| final_b_temperature | Final Boltzmann temperature | float |
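The names alpha and beta suggest the standard exponents of prioritized experience replay (sampling probability proportional to priority^alpha, importance-sampling weights raised to -beta), and the temperature options suggest a Boltzmann (softmax) policy over Q-values, with both beta and the temperature presumably annealed from their initial to their final values over training; that reading is an assumption based on the option names. A sketch of both mechanisms:

```python
import numpy as np

def sample_probabilities(priorities, alpha):
    # Prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha
    # (assumes all priorities are positive)
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    return scaled / scaled.sum()

def importance_weights(probs, beta):
    # Importance-sampling correction w_i = (N * P(i))^(-beta),
    # normalized by the maximum weight for stability
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()

def boltzmann_action(q_values, temperature):
    # Softmax over Q-values; high temperature -> exploration,
    # low temperature -> exploitation
    prefs = np.asarray(q_values, dtype=np.float64) / temperature
    prefs -= prefs.max()  # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return np.random.choice(len(probs), p=probs)
```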
| Network options | | |
|---|---|---|
| Name of global variable | Description | Type |
| training_frequency | Frequency (in time steps) at which the network is trained | integer |
| batch_size | Batch size used to train the network | integer |
| time_steps | Number of time steps used for the LSTM cell | integer |
| lstm_units | Number of units in the LSTM cell | integer |
| num_neurons | Number of neurons per layer of the multi-layer perceptron. Example: [50, 50] means two layers of 50 neurons each | list |
| copy_weights_frequency | Frequency (in time steps) at which the weights are copied from the online network to the target network | integer |
| discount_factor | Discount factor of the deep reinforcement learning algorithm | float |
| learning_rate | Learning rate of the optimization algorithm | float |
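Taken together, the network options describe a recurrent Q-network: an LSTM cell unrolled over time_steps observations, followed by a multi-layer perceptron with one output per action, plus a target network refreshed from the online network every copy_weights_frequency steps. A minimal sketch in the TensorFlow 1.x API the repository targets; layer shapes and activation functions are assumptions, not the repository's exact architecture:

```python
import tensorflow as tf

def build_q_network(scope, state_size, num_actions,
                    time_steps=4, lstm_units=64, num_neurons=(50, 50)):
    """Sketch of an LSTM + MLP Q-network (TF 1.x). Values are illustrative."""
    with tf.variable_scope(scope):
        # Batch of observation sequences, one sequence per sampled transition
        states = tf.placeholder(tf.float32, [None, time_steps, state_size])
        cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)
        outputs, _ = tf.nn.dynamic_rnn(cell, states, dtype=tf.float32)
        x = outputs[:, -1, :]  # use the last time step's LSTM output
        for n in num_neurons:  # e.g. (50, 50) -> two hidden layers of 50
            x = tf.layers.dense(x, n, activation=tf.nn.relu)
        q_values = tf.layers.dense(x, num_actions)  # one Q-value per action
    return states, q_values

def copy_weights_ops(online_scope='online', target_scope='target'):
    # Ops that copy online-network weights into the target network;
    # run them every copy_weights_frequency time steps
    online = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, online_scope)
    target = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, target_scope)
    return [t.assign(o) for o, t in zip(online, target)]
```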
Similarly, before running the evaluation.py file, it is necessary to edit the file and set its global variables (table below) according to how the evaluation should be performed. The evaluation file can then be run by executing "python evaluation.py" in the console.
| Evaluation options | | |
|---|---|---|
| Name of global variable | Description | Type |
| path_to_model_to_restore | Path to the saved model from which the weights of the network will be restored | string |
| time_steps | Number of time steps used for the LSTM cell | integer |
| lstm_units | Number of units in the LSTM cell | integer |
| num_neurons | Number of neurons per layer of the multi-layer perceptron. Example: [50, 50] means two layers of 50 neurons each | list |
| num_agents | Number of agents used for evaluation | integer |
| num_episodes | Number of episodes for the evaluation | integer |
| max_discrepancy | Discrepancy above which the episode is finished | integer |
| min_discrepancy | Discrepancy below which the episode is finished | integer |
| steps_limit | Maximum number of steps in each episode | integer |
| desired_distance | Desired distance l in meters | float |
| task | Task ID. 0: dispersion, 1: square formation, 2: aggregation, 3: chain formation | integer |
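For orientation, a rough skeleton of an evaluation loop built from these options; actions are chosen greedily (argmax over Q-values) rather than by Boltzmann sampling, which is the usual convention at evaluation time, and the environment class below is a hypothetical stand-in for the V-REP wrapper:

```python
import numpy as np

class StubEnv:
    """Hypothetical stand-in for the V-REP environment wrapper."""
    def reset(self):
        return np.zeros(8)                # dummy initial observation
    def step(self, action):
        observation = np.random.rand(8)   # dummy next observation
        discrepancy = float(np.random.rand())
        done = discrepancy < 0.05         # dummy termination signal
        return observation, discrepancy, done

num_episodes = 10   # illustrative value
steps_limit = 500   # illustrative value
env = StubEnv()

for episode in range(num_episodes):
    observation = env.reset()
    for step in range(steps_limit):
        q = np.random.rand(5)       # stand-in for a sess.run of the Q-network
        action = int(np.argmax(q))  # greedy action: no exploration here
        observation, discrepancy, done = env.step(action)
        if done:
            break
```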