This repo replicates this hierarchy for a theoretical AV.

A small video showing the final agent running in Navigation mode:

For a full video comparing the performance to the end-to-end model, see here:
To run the highest hierarchical model right out of the box, do the following:

- Install: follow the instructions in the installation section.
- Run `python hrl/common/run_model.py --env Nav --policy Nav`, or change `Nav` to `Nav_n2n` in both arguments to see the non-hierarchical solution.
- Run `python hrl/common/run_model.py --env Original --policy Original` to run a model very close to state-of-the-art performance.

To see more options and different models you can run, check the [Using run_model](#using-run_model) section.
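For reference, here are the two plug-and-play commands side by side; the second is the non-hierarchical counterpart obtained by swapping `Nav` for `Nav_n2n` in both arguments, as described above:

```bash
# Hierarchical navigation agent
python hrl/common/run_model.py --env Nav --policy Nav

# Non-hierarchical (end-to-end) counterpart
python hrl/common/run_model.py --env Nav_n2n --policy Nav_n2n
```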
This repo mainly builds on two other big repos: stable-baselines and CarRacing_v1 (not CarRacing_v0 from OpenAI Gym).
- If you prefer, create a new conda environment and install tensorflow (`conda create -n HRL tensorflow-gpu`) and activate it (`source activate HRL`).
- Clone this repo with `git clone https://github.com/NotAnyMike/HRL` and `cd` into it with `cd HRL`.
- Install the repo with `pip install -e .`.
- Install the requirements with `pip install -r requirements.txt` from the HRL repo.
- Clone CarRacing_v1 from https://github.com/NotAnyMike/gym with `git clone https://github.com/NotAnyMike/gym`, `cd` into the repo with `cd gym`, and install it with `pip install -e ".[Box2D]"`. If the environment is correctly installed, you should be able to play it by running `python car_racing.py`, driving with the arrow keys and space bar, and quitting by pressing `Q`.
- Inside the HRL repo, clone https://github.com/NotAnyMike/tracks with `git clone https://github.com/NotAnyMike/tracks`. The whole sequence is condensed in the sketch below.
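The installation steps above as a single shell session (a sketch; the environment name and clone locations are assumptions, adjust them to your setup):

```bash
# Create and activate a conda environment with tensorflow
conda create -n HRL tensorflow-gpu
source activate HRL

# Clone and install this repo
git clone https://github.com/NotAnyMike/HRL
cd HRL
pip install -e .
pip install -r requirements.txt

# Clone and install CarRacing_v1 (cloned inside HRL here; location is an assumption)
git clone https://github.com/NotAnyMike/gym
cd gym
pip install -e ".[Box2D]"
cd ..

# Pre-generated tracks, cloned inside the HRL repo
git clone https://github.com/NotAnyMike/tracks
```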
That's all you need. From here you can:

- Check [Plug and play](#plug-and-play) to see interesting models to run.
- Check [Using run_model](#using-run_model) to see what other types of models you can run.
- Check [How to run experiments](#how-to-run-experiments) to see what kind of experiments you can run.
Running experiments supports logging and saving of all the important information about the experiment, using the function `run_experiment` in the source folder. All the files will be saved in a folder inside the `folder` folder, named `<id>_<tag>`, where `<id>` is the id in the experiments id table and `<tag>` is the tag argument.

The function `run_experiment` takes several different arguments; here is a summary of all of them:
Argument | Type of value | Default value | Description |
---|---|---|---|
`env` | str | 'Base' | The name of the environment class to use for the experiment |
`train_steps` | int | 1,000,000 | The total number of steps to train for |
`n_steps` | int | 200 | The number of steps to use in each training step for the optimisation (similar to a buffer) |
`gamma` | float | 0.99 | Discount factor to use in training |
`max_steps` | int | None | The value of `max_steps` for the environment option |
`save_interval` | int | 10,000 | The weights will be saved every `save_interval` steps |
`weights` | str | None | Path of the weights to use |
`n` | int | 0 | The number of steps from which to start counting the next steps (in case of continuing training) |
`not_save` | bool | False | Whether or not to save and log the experiment |
`folder` | str | 'experiments' | The name of the folder where everything is saved; cannot be empty |
`env_num` | int | 4 | Number of parallel environments to use for training |
`tag` | str | None | Tag to use in the name of the folder where everything about the experiment will be saved |
`description` | str | None | A small description of the experiment; will be saved in the csv file created |
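As an illustration of how the function could be called from Python (a sketch; it assumes `run_experiment` is importable from `hrl.common.run_experiment` and accepts the keyword arguments listed above):

```python
# A sketch, assuming run_experiment is exposed by hrl/common/run_experiment.py
# and accepts the keyword arguments from the table above.
from hrl.common.run_experiment import run_experiment

run_experiment(
    env='Keep_lane',        # environment class to train on
    train_steps=1_000_000,  # total training steps
    n_steps=200,            # steps per optimisation batch
    gamma=0.99,             # discount factor
    save_interval=10_000,   # save weights every 10,000 steps
    env_num=4,              # parallel environments
    folder='experiments',   # results go in a folder inside this one
    tag='keep_lane_test',   # results folder is named <id>_<tag>
    description='Keep_lane baseline with default hyperparameters',
)
```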
Running an experiment outside docker has the advantage of being able to activate the frame buffer to see how well the model is performing live, and then deactivate the buffer again to make training faster.

Running `python hrl/common/run_experiment.py --folder experiments_folder --env_num 1` will run a simple experiment using the `Base` environment. You can change the environment with the `--env` parameter. Using hierarchical environments consumes more memory: the higher the environment is in the hierarchy, the more models have to be loaded, and thus the more memory is needed.
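For example, to continue training from previously saved weights (a sketch; the flag names assume each `run_experiment` argument maps to a command-line flag of the same name, and the weights path is a placeholder):

```bash
# Continue a previous run: load saved weights and keep counting steps from 500,000.
# Flag names assume a one-to-one mapping with the run_experiment arguments above;
# the weights path is a placeholder.
python hrl/common/run_experiment.py \
    --env Keep_lane \
    --weights experiments/2_base/weights_final \
    --n 500000 \
    --env_num 1 \
    --tag keep_lane_continued
```

The environments you can pass to `--env` are listed below.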
Hierarchical | Non hierarchical | Description |
---|---|---|
Base | | Basic environment, nothing special |
Original | | Simulates the original CarRacing_v0 |
Keep_lane | | To train agents capable of keeping the same lane and navigating efficiently |
NWOO, NWOO_B/C | NWOO_n2n, NWOO_B_n2n | Navigation WithOut Obstacles |
NWO | NWO_n2n | Navigation With Obstacles |
Turn_v2 | Turn_v2_n2n | Turn in an intersection to the left or to the right |
X_v2 | X_v2_n2n | X intersections (left, right, center) |
Take_center_v2 | | Knows how to take the middle lane in an intersection |
Turn_side_v2, Turn_right/left_v2 | | Knows how to take the left/right lane in an intersection |
Change_lane_A/B | Change_lane_n2n | Change from the left to the right lane (and vice-versa) |
Change_to_left/right | | Changes to one specific lane |
Recovery/_delayed/direct | | In case of drift |
Nav | Nav_n2n | The main model |
You should be able to play any of the environments above. You can play any non-hierarchical environment without any extra models or weights, but in order to play a hierarchical environment you will need the options it uses to be available (they are included in this repo).

`python hrl/envs/env.py -e Nav_n2n`

The non-hierarchical environments use the normal arrow keys and space bar. The hierarchical environments use numbers, so `1` is the first option available to the model, and so on. Hierarchical environments will wait for an input before rendering the next frame. You can also open a live view of the hierarchy by pressing `B`, and close it by pressing `B` again.
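For example, to drive a hierarchical environment by hand (a sketch; it assumes the same `-e` flag selects hierarchical environments and that the bundled option weights are picked up automatically):

```bash
# Play the hierarchical Navigation environment; number keys pick the option
# to execute at each step, and B toggles the live view of the hierarchy.
python hrl/envs/env.py -e Nav
```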
If the experiment is not going well and it is not useful to save all that data, after pressing `ctrl+c` you will be prompted whether you want to keep the data or remove it.
I suggest using docker to run experiments; it makes the experiment consume less memory. The docker images are available in the `docker` folder. You can run the model by creating a virtual frame buffer with `xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" --`. Docker does not work for watching the agent in the environment because there is no real frame buffer to render it.
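Putting the two commands together, a headless experiment inside docker could be launched like this (the folder name is a placeholder):

```bash
# Wrap the experiment in a virtual frame buffer so it can run headless.
xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- \
    python hrl/common/run_experiment.py --folder experiments_folder --env_num 1
```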
TODO: Complete readme; meanwhile you can use the bash file `run_experiment_docker.sh` in the root folder.
One of the most expensive operations is generating maps. To alleviate this you can pre-generate as many maps as you want, so the model does not have to generate maps while training. This also gives you the option of filtering the maps you want to train on, and it speeds up training significantly.

The easiest way is to clone https://github.com/NotAnyMike/tracks inside HRL. In the tracks repo, check out one of the commits which has more maps, although the last commit should be enough.

TODO: complete readme. Meanwhile there is a `track_generator.py` file inside `hrl.common` which can be used to generate tracks easily.
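A minimal sketch of both options (the last invocation is an assumption; the generator's command-line options are not documented here):

```bash
# Option 1: use the pre-generated tracks repo (cloned inside the HRL repo)
git clone https://github.com/NotAnyMike/tracks

# Option 2: generate tracks yourself with the bundled generator
# (invocation is an assumption; check hrl/common/track_generator.py for its options)
python hrl/common/track_generator.py
```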
`run_model` runs the specified model in the specified environment. The only important arguments to give are the weights of the model (`--full_path`) and the correct environment (`--env`); that will run the saved weights with the environment. Instead of `--full_path` you can use `--policy` to define which policy of the hierarchy you want to run.

`python hrl/common/run_model.py --env Nav --policy Nav`

`run_model` can also be used to measure the performance of the models and store the results using tensorboard.
Not all the following parameters are mandatory, and certain options require other options to be specified.

Argument | Type | Default value | Description |
---|---|---|---|
`env` | str | 'Base' | The name of the environment class to run on; if not specified, Base is used |
`policy` | str | None | The name of the policy to load; it must have a folder in the hrl.policies folder with at least one weight inside, and the newest will be run. If this option is given, `-f`, `-e`, `-w` and `--full_path` are ignored |
`full_path` | str | None | The full path of the weights, relative or absolute; if this is given then `-f`, `-e`, `-w` are ignored |
`folder`, `-f` | str | 'experiments' | The folder of the experiments; also used to save the logs if `-tb` is passed |
`experiment`, `-e` | str | | The name of the folder of the experiment, for example '2_base'. Used to construct `full_path` if that is not given |
`weights`, `-w` | str | None | The name of the weights to load; if not specified, the last one or 'weights_final' is run |
`n_steps`, `-n` | int | 200 | The number of steps to run for; if not specified, infinite |
`n_ep` | int | 0.99 | The number of episodes to run for; if not specified, infinite |
`tensorboard`, `-tb` | bool | True | A flag to register the score with tensorboard |
`tag`, `-t` | str | None | The tag for the folder when the tensorboard flag is used |
`no_render` | bool | False | In case you want to log some info but do not care about rendering on screen |
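A few invocation sketches (the experiment folder and weight names are placeholders, and the short flag spellings follow the table above):

```bash
# Run saved weights directly by path (the path is a placeholder)
python hrl/common/run_model.py --env Nav --full_path experiments/2_base/weights_final

# Equivalent, letting run_model build the path from folder/experiment/weights
python hrl/common/run_model.py --env Nav -f experiments -e 2_base -w weights_final

# Evaluate a policy and log the score to tensorboard under a tag
python hrl/common/run_model.py --env Nav --policy Nav -tb -t nav_eval
```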
You can check what policies are available in `hrl.weights`. Here is a list of the pre-defined policies available out of the box in this repo. The format is `<id> (<small description>)`.
Hierarchical | Non hierarchical |
---|---|
base | |
CLeft/Right (Change left/right) | |
D (direct recovery) | |
De (delayed recovery) | |
Keep_lane | |
Nav | Nav_n2n |
Change_lane | |
NWO (Navigate With Obstacles) | |
NWOO (Navigate WithOut Obstacles) | |
Recovery | Recovery |
Take_center | |
Turn_left/right | Trun_n2n |
X | |
Special non-hierarchical policies:

- Original: a model with very high performance, very close to SOTA. To run this model use `python hrl/common/run_model.py --env Original --policy Original`.