This repository is a fork of Nathan Sprague's implementation of the deep Q-learning algorithm described in:
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing Atari with Deep Reinforcement Learning." arXiv preprint arXiv:1312.5602 (2013).
and
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
We use the DQN algorithm to learn strategies for Atari games from the RAM state of the machine.

The following dependencies are required:

- A reasonably modern NVIDIA GPU
- OpenCV
- Theano (https://github.com/Theano/Theano)
- Lasagne (https://github.com/Lasagne/Lasagne)
- Pylearn2 (https://github.com/lisa-lab/pylearn2)
- Arcade Learning Environment (https://github.com/mgbellemare/Arcade-Learning-Environment)
The script dep_script.sh can be used to install all dependencies under Ubuntu.
We've run a number of experiments with models that use the RAM state. Because the models don't fully share code, each lives in its own branch. To re-run them, use our scripts, located in the main directory of the repository. The available network types are:
- just_ram - a network that takes only the RAM state as input, passes it through two ReLU layers of 128 units each, and scales the output to the appropriate size (see the sketch after this list)
- big_ram - the analogous network, but with four hidden layers
- mixed_ram - a network taking both the RAM and the screen as input
- big_mixed_ram - a deeper version of mixed_ram
- ram_dropout - the just_ram network with dropout applied to all layers except the output
- big_dropout - the big_ram network with dropout
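For concreteness, here is a minimal Lasagne sketch of the just_ram architecture (big_ram is the same with four hidden layers). The function name, batch size, and input/output shapes are illustrative assumptions; only the layer structure comes from the description above.

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import rectify, linear

def build_just_ram(num_actions, batch_size=32):
    # Atari 2600 RAM is 128 bytes; we assume it is fed in as one flat vector.
    network = InputLayer(shape=(batch_size, 128))
    # Two fully connected ReLU layers with 128 units each.
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    # Linear output layer of the appropriate size: one Q-value per action.
    return DenseLayer(network, num_units=num_actions, nonlinearity=linear)
```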
To evaluate a model with a different frame skip, run:
./frameskip.sh <rom name> <network type> <frameskip>, e.g.:
./frameskip.sh breakout just_ram 8
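For readers unfamiliar with the parameter: frame skip means the agent selects an action only once every <frameskip> emulator frames, repeating it in between and accumulating the reward. A rough sketch of the idea against the ALE Python interface (the ale variable standing for an ALEInterface instance is a hypothetical name):

```python
def act_with_frameskip(ale, action, frameskip):
    # Repeat the chosen action for `frameskip` frames, summing the reward.
    reward = 0
    for _ in range(frameskip):
        reward += ale.act(action)  # ALE returns the reward earned on this frame
        if ale.game_over():
            break
    return reward
```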
We added dropout to the two RAM-only networks. You can run them as:
./dropout.sh <rom name> ram_dropout
or
./dropout.sh <rom name> big_dropout
ram_dropout is a network with two dense hidden layers; big_dropout has four.
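A sketch of how ram_dropout differs from just_ram, assuming Lasagne's DropoutLayer; the dropout probability p=0.5 is an illustrative default, not a value taken from this repository:

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
from lasagne.nonlinearities import rectify, linear

def build_ram_dropout(num_actions, batch_size=32, p=0.5):  # p is assumed
    network = InputLayer(shape=(batch_size, 128))
    network = DropoutLayer(network, p=p)   # dropout on the input layer
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    network = DropoutLayer(network, p=p)   # dropout on the first hidden layer
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    network = DropoutLayer(network, p=p)   # dropout on the second hidden layer
    # The output layer is left untouched, as described above.
    return DenseLayer(network, num_units=num_actions, nonlinearity=linear)
```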
You can try the models with l2-regularization using:
./weight-decay.sh <rom name> <network type>, e.g.:
./weight-decay.sh breakout big_ram
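In Lasagne, l2 weight decay amounts to adding a penalty on the network parameters to the training loss. A minimal sketch; the decay coefficient is an illustrative assumption, not the value used by the script:

```python
from lasagne.regularization import regularize_network_params, l2

def add_weight_decay(loss, network, decay=1e-4):  # decay value is assumed
    # Sum of squared weights over all trainable parameters of the network.
    penalty = regularize_network_params(network, l2)
    return loss + decay * penalty
```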
The models with a decreased learning rate can be trained with:
./learningrate.sh <rom name> <network type>, e.g.:
./learningrate.sh breakout big_ram
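The learning rate enters through the optimizer's update rule; DQN-style training typically uses RMSProp, so a decreased rate would be plugged in roughly as below. The concrete value is not stated in this README, so the 0.001 default here is purely illustrative:

```python
import lasagne

def build_updates(loss, network, learning_rate=0.001):  # rate is illustrative
    params = lasagne.layers.get_all_params(network, trainable=True)
    # RMSProp updates on the DQN loss with the lowered learning rate.
    return lasagne.updates.rmsprop(loss, params, learning_rate=learning_rate)
```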
You need to put the ROMs in the roms subdirectory. Their names should be spelled with lowercase letters, e.g. breakout.bin.
See also:

- https://github.com/spragunr/deep_q_rl - the original Nathan Sprague implementation of DQN
- https://sites.google.com/a/deepmind.com/dqn - the code DeepMind used for the Nature paper. The license only permits the code to be used for "evaluating and reviewing" the claims made in the paper.
- https://github.com/muupan/dqn-in-the-caffe - a working Caffe-based implementation. (I haven't tried it, but there is a video of the agent playing Pong successfully.)
- https://github.com/kristjankorjus/Replicating-DeepMind - defunct? As far as I know, this package was never fully functional. The project is described here: http://robohub.org/artificial-general-intelligence-that-plays-atari-video-games-how-did-deepmind-do-it/
- https://github.com/brian473/neural_rl - an almost-working implementation developed during Spring 2014 by my student Brian Brown. I haven't reused his code, but Brian and I worked together to puzzle through some of the blank areas of the original paper.