This repo represents my attempt to reproduce the DeepMind Atari-playing agent described in the recent Nature paper.
While the DeepMind implementation is built in Lua with Torch7, this implementation uses TensorFlow. Like DeepMind, it also depends on the Arcade Learning Environment (technically I believe DeepMind uses their Xitari fork of ALE).
I have been focused on attempting to match DeepMind's performance on Space Invaders, which in their publication is 1976+/-800, though I do not know exactly how they compute those results. For my results I compute the average and standard deviation over the final 20 evaluations of the training run. I did a run with the DeepMind code (results here) and by this measure saw 1428+/-189. My current results fall far short at 1139+/-138 (a random agent scores ~150). Thus far I have not found anyone who has reproduced the DeepMind results using the approach described in the Nature paper. If you have done it, particularly with TensorFlow, let me know!
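For reference, here is a minimal sketch of how that summary statistic could be computed from a list of per-evaluation average scores (the function and variable names are illustrative, not from the repo):

```python
# Illustrative helper: report mean / stdev over the last 20 evaluations.
import numpy as np

def summarize_final_evals(eval_scores, n=20):
    # eval_scores: one average episode score per periodic evaluation, in order
    tail = np.asarray(eval_scores[-n:], dtype=np.float64)
    return tail.mean(), tail.std()

# Example usage with made-up numbers:
# mean, std = summarize_final_evals([1002.5, 1130.0, ..., 1210.0])
```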
I have also tried Breakout and got a score of 284+/-78, but that was with an older version of the code that used the wrong target network update frequency. (DeepMind reported 400+/-30 using their eval method.)
I have also experimented with compressing the experience replay memory so it can hold more than the usual 1M entries. Breakout and Space Invaders both show roughly 10% improvement with 4M and 3M capacities, respectively.
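To illustrate the idea (this is a sketch, not necessarily the repo's exact code), each stored screen can be blosc-compressed when written to the replay memory and decompressed on the fly when sampled; grayscale Atari frames compress well, which is what makes the larger capacities practical:

```python
# Sketch of blosc-compressed frame storage for a larger replay memory.
import blosc
import numpy as np

def compress_frame(frame):
    # frame: uint8 grayscale screen, e.g. shape (84, 84)
    return frame.shape, blosc.compress(frame.tobytes(), typesize=1)

def decompress_frame(shape, packed):
    return np.frombuffer(blosc.decompress(packed), dtype=np.uint8).reshape(shape)

# Round-trip a fake 84x84 screen to show frames survive compression intact.
frame = np.random.randint(0, 256, size=(84, 84)).astype(np.uint8)
shape, packed = compress_frame(frame)
assert np.array_equal(frame, decompress_frame(shape, packed))
```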
A publicly viewable Google spreadsheet has results for various experiments I have run.
- Get Python and TensorFlow running, preferably on a GPU (see notes on AWS setup).
- Install the Arcade Learning Environment (see the wiki); a quick sanity-check sketch appears after this list.
- Install dqn-atari specific dependencies, currently just:

      sudo pip install blosc

- Download a game ROM and name it appropriately, e.g. space_invaders.bin (all lower case, ending in .bin -- the name must match what ALE expects).
- Get the repo:

      git clone https://github.com/gtoubassi/dqn-atari.git

- Run it! The default parameters attempt to mimic the Nature paper configuration:

      cd dqn-atari
      python ./play_atari.py ~/space_invaders.bin | tee train.log

- Periodically check progress:

      ./logstats.sh train.log

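As mentioned in the ALE step above, here is a quick sanity check that the install works, assuming the classic `ale_python_interface` bindings (names can differ slightly across ALE versions):

```python
# Verify ALE can load a ROM and step the emulator for a few frames.
from ale_python_interface import ALEInterface

ale = ALEInterface()
ale.loadROM('space_invaders.bin')       # path to the ROM downloaded earlier
actions = ale.getMinimalActionSet()
print('Minimal action set:', actions)

ale.reset_game()
total_reward = 0
for _ in range(1000):
    if ale.game_over():
        break
    total_reward += ale.act(actions[0])  # trivially repeat the first action
print('Reward after up to 1000 frames:', total_reward)
```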
The following were very helpful: