This repository is a fork of Nathan Sprague's implementation of the deep Q-learning algorithm described in:
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing Atari with Deep Reinforcement Learning." arXiv preprint arXiv:1312.5602 (2013).
and
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
We use the DQN algorithm to learn strategies for Atari games from the RAM state of the machine.

The following dependencies are required:

- A reasonably modern NVIDIA GPU
- OpenCV
- Theano (https://github.com/Theano/Theano)
- Lasagne (https://github.com/Lasagne/Lasagne)
- Pylearn2 (https://github.com/lisa-lab/pylearn2)
- Arcade Learning Environment (https://github.com/mgbellemare/Arcade-Learning-Environment)
The script dep_script.sh can be used to install all dependencies under Ubuntu.
We've run a number of experiments with models that use the RAM state. Because the models don't fully share code, each lives in its own branch. To re-run them, use our scripts, located in the main directory of the repository. The available network types are:
- just_ram - a network that takes only the RAM state as input, passes it through two ReLU layers of 128 units each, and scales the output to the appropriate size (see the sketch after this list)
- big_ram - the analogous network, but with four hidden layers
- mixed_ram - a network taking both the RAM and the screen as input
- big_mixed_ram - a deeper version of mixed_ram
- ram_dropout - the just_ram network with dropout applied to all layers except the output
- big_dropout - the big_ram network with dropout
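For concreteness, here is a minimal Lasagne sketch of the just_ram architecture (big_ram is the same with four hidden layers). The function name, batch size, and input/output shapes are illustrative assumptions; only the layer structure comes from the description above.

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import rectify, linear

def build_just_ram(num_actions, batch_size=32):
    # Atari 2600 RAM is 128 bytes; we assume it is fed in as one flat vector.
    network = InputLayer(shape=(batch_size, 128))
    # Two fully connected ReLU layers with 128 units each.
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    # Linear output layer of the appropriate size: one Q-value per action.
    return DenseLayer(network, num_units=num_actions, nonlinearity=linear)
```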
To evaluate a model with a different frame skip, run:
./frameskip.sh <rom name> <network type> <frameskip>, e.g.:
./frameskip.sh breakout just_ram 8
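For readers unfamiliar with the parameter: frame skip means the agent selects an action only once every <frameskip> emulator frames, repeating it in between and accumulating the reward. A rough sketch of the idea against the ALE Python interface (the ale variable standing for an ALEInterface instance is a hypothetical name):

```python
def act_with_frameskip(ale, action, frameskip):
    # Repeat the chosen action for `frameskip` frames, summing the reward.
    reward = 0
    for _ in range(frameskip):
        reward += ale.act(action)  # ALE returns the reward earned on this frame
        if ale.game_over():
            break
    return reward
```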
We added dropout to the two RAM-only networks. You can run them as:
./dropout.sh <rom name> ram_dropout
or
./dropout.sh <rom name> big_dropout
ram_dropout is a network with two dense hidden layers; big_dropout has four.
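A sketch of how ram_dropout differs from just_ram, assuming Lasagne's DropoutLayer; the dropout probability p=0.5 is an illustrative default, not a value taken from this repository:

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
from lasagne.nonlinearities import rectify, linear

def build_ram_dropout(num_actions, batch_size=32, p=0.5):  # p is assumed
    network = InputLayer(shape=(batch_size, 128))
    network = DropoutLayer(network, p=p)   # dropout on the input layer
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    network = DropoutLayer(network, p=p)   # dropout on the first hidden layer
    network = DenseLayer(network, num_units=128, nonlinearity=rectify)
    network = DropoutLayer(network, p=p)   # dropout on the second hidden layer
    # The output layer is left untouched, as described above.
    return DenseLayer(network, num_units=num_actions, nonlinearity=linear)
```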
You can try the models with l2-regularization using:
./weight-decay.sh <rom name> <network type>, e.g.:
./weight-decay.sh breakout big_ram
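In Lasagne, l2 weight decay amounts to adding a penalty on the network parameters to the training loss. A minimal sketch; the decay coefficient is an illustrative assumption, not the value used by the script:

```python
from lasagne.regularization import regularize_network_params, l2

def add_weight_decay(loss, network, decay=1e-4):  # decay value is assumed
    # Sum of squared weights over all trainable parameters of the network.
    penalty = regularize_network_params(network, l2)
    return loss + decay * penalty
```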
The models with a decreased learning rate can be trained with:
./learningrate.sh <rom name> <network type>, e.g.:
./learningrate.sh breakout big_ram
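The learning rate enters through the optimizer's update rule; DQN-style training typically uses RMSProp, so a decreased rate would be plugged in roughly as below. The concrete value is not stated in this README, so the 0.001 default here is purely illustrative:

```python
import lasagne

def build_updates(loss, network, learning_rate=0.001):  # rate is illustrative
    params = lasagne.layers.get_all_params(network, trainable=True)
    # RMSProp updates on the DQN loss with the lowered learning rate.
    return lasagne.updates.rmsprop(loss, params, learning_rate=learning_rate)
```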
You need to put the ROMs in the roms subdirectory. Their names should be spelled with lowercase letters, e.g. breakout.bin.
See also:

- https://github.com/spragunr/deep_q_rl - the original Nathan Sprague implementation of DQN
- https://sites.google.com/a/deepmind.com/dqn - the code DeepMind used for the Nature paper. The license only permits the code to be used for "evaluating and reviewing" the claims made in the paper.
- https://github.com/muupan/dqn-in-the-caffe - a working Caffe-based implementation. (I haven't tried it, but there is a video of the agent playing Pong successfully.)
- https://github.com/kristjankorjus/Replicating-DeepMind - defunct? As far as I know, this package was never fully functional. The project is described here: http://robohub.org/artificial-general-intelligence-that-plays-atari-video-games-how-did-deepmind-do-it/
- https://github.com/brian473/neural_rl - an almost-working implementation developed during Spring 2014 by my student Brian Brown. I haven't reused his code, but Brian and I worked together to puzzle through some of the blank areas of the original paper.