Memoire (pronounced "mem-wah-r") is a distributed replay memory for reinforcement learning. Industrial applications of reinforcement learning usually require a large amount of computation, both for environment exploration and for neural network training. Our goal is to make it easier to write high-performance distributed reinforcement learning algorithms.
The distributed reinforcement learning platform consists of two types of workers: Actors and Learners.
An actor is responsible for exploring the environment and generating data for the learners. In its main loop, it works as follows (a sketch is given after the list):
- Get the latest model (policy) from the learners.
- Act in the environment according to the current policy.
- Put the generated experience into the client side of the replay memory.
- The client pushes sampled transitions to the server.
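As an illustration, an actor's main loop might look like the Python sketch below. The object and method names (`policy_client`, `replay_client`, `new_episode`, `add_entry`, `close_episode`, `push_cache`) are placeholders chosen for this example rather than the exact Memoire API; see the API page for the real signatures.

```python
def actor_loop(env, policy_client, replay_client, num_episodes):
    """Hedged sketch of an actor's main loop; all method names are illustrative."""
    for _ in range(num_episodes):
        policy = policy_client.get_latest_model()        # 1. get the latest model (policy) from the learners
        replay_client.new_episode()                      # create space for a new trajectory
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)                     # 2. act according to the current policy
            next_obs, reward, done, _ = env.step(action)
            replay_client.add_entry(obs, action, reward) # 3. store the transition in the local replay memory
            obs = next_obs
        replay_client.close_episode()                    # TD-lambda returns and priorities are updated here
        replay_client.push_cache()                       # 4. push sampled transitions to the server
```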
A learner is responsible for updating the model with batches of data. In its main loop, it works as follows (again, a sketch follows the list):
- Get a batch of samples from the server side of the replay memory.
- Update the model with the batch, according to the chosen algorithm.
- Publish the latest model to the actors.
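Similarly, a learner's main loop could be sketched as below; `replay_server.get_batch`, `model.train_step`, and `publisher.publish` are again illustrative placeholders, not the exact Memoire API.

```python
def learner_loop(replay_server, model, publisher, batch_size, num_updates, publish_every=100):
    """Hedged sketch of a learner's main loop; all method names are illustrative."""
    for step in range(num_updates):
        batch = replay_server.get_batch(batch_size)   # 1. sample a batch from the pushed caches
        model.train_step(batch)                       # 2. update the model (algorithm-specific)
        if step % publish_every == 0:
            publisher.publish(model)                  # 3. broadcast the latest model to the actors
```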
We can distribute actors and learners across CPU and GPU clusters to fully utilize heterogeneous computing resources. Typical characteristics of the two worker types are summarized below.
|                    | Actor   | Learner            |
|--------------------|---------|--------------------|
| Computing resource | CPU     | GPU                |
| DNN operation      | Forward | Forward / Backward |
| Number of workers  | ~300    | ~10                |
| Memory usage       | ~10G    | ~1G                |
| Bandwidth usage    | ~1G     | ~20G               |
The client side of the replay memory stores recent trajectories generated by the local actor. The size of the local replay memory is limited both by the total number of steps (transitions) and by the total number of episodes. We provide three methods to create space for a new episode, add a transition to the current episode, and close a terminated episode. When an episode is closed, the TD-lambda return for each step is calculated automatically, and its sampling priority is updated. We also provide a method that samples the current trajectories to form a cache and pushes it to the learner.
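Memoire performs this TD-lambda computation internally when an episode is closed, so no user code is required; the sketch below only illustrates the backward recursion for a single scalar reward. The function name and argument layout are assumptions for this example, and the library's internal computation may differ in details such as multi-dimensional rewards.

```python
def td_lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    """Illustrative TD(lambda) return computation for one closed episode.

    rewards[t]     : reward received at step t
    next_values[t] : value estimate V(s_{t+1}); use 0.0 when s_{t+1} is terminal
    Recursion: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
    """
    returns = [0.0] * len(rewards)
    g = next_values[-1]                    # bootstrap value after the final step
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Example: a 3-step episode that ends in a terminal state.
print(td_lambda_returns(rewards=[1.0, 0.0, 1.0], next_values=[0.5, 0.3, 0.0]))
```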
The server side receives the pushed caches from the clients automatically. When a batch of samples is needed for training, we can draw one from these pushed caches with another phase of sampling. The two phases of sampling, at the client side and at the server side, are designed to be (roughly) equivalent to sampling from the whole replay memory across all actors.
A complete list of supported methods and options can be found on the API page.
Note that in this framework, only the sampled transitions, rather than all generated trajectories, are pushed to the learner for model updating. When there is an enormous number of actors, this design reduces both the bandwidth and memory burden on the learner. At the same time, the learner still receives high-priority samples and can update the model efficiently thanks to prioritized sampling.
- Prioritized Sampling

  Prioritized experience replay [1] is a method of selecting high-priority samples for training. It is arguably the most effective technique for achieving good performance in (distributed) reinforcement learning [2] [3]. A generic sampling sketch is given after this list.
- Framework Independence

  The replay memory module is separated from the training of the neural network, making it independent of the deep learning framework used to implement the network (e.g. TensorFlow, PyTorch, etc.). We hope the modular design can provide more flexibility for deep learning practitioners.
- Frame Stacking, N-Step Learning, Multi-Dimensional Reward, and TD-Lambda Return Computation

  These are common and useful components of practical reinforcement learning, implemented in our module for convenience.
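To make the prioritized-sampling idea concrete, the sketch below draws indices with probability proportional to a power of their priority, in the spirit of proportional prioritized experience replay [1]. It is a generic illustration, not Memoire's internal sampling code, and the function name and parameters are assumptions for this example.

```python
import numpy as np

def prioritized_sample(priorities, batch_size, alpha=0.6, rng=None):
    """Draw transition indices with probability proportional to priority**alpha (generic sketch)."""
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    return rng.choice(len(probs), size=batch_size, p=probs, replace=True)

# Example: transitions with larger priority (e.g. larger TD error) are drawn more often.
indices = prioritized_sample([0.1, 2.0, 0.5, 1.2], batch_size=32)
```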
See the `example/` directory for examples.
The module is based on pybind11. We recommend cloning the latest version of pybind11 from GitHub and setting `PYBIND11_ROOT` properly in the `Makefile`. We use version 2.3.dev0.
```bash
pip uninstall pybind11  # Remove old version
cd pybind11
pip install -e .        # Install from source
```
or install from GitHub (recommended):
```bash
pip install git+https://github.com/pybind/pybind11.git
```
We also use new features of google-protobuf. To install or update your protobuf to the latest version, you can build it from source from protobuf-release with the following commands. See also the installation-from-source instructions in protobuf's C++ Installation documentation.
```bash
yum erase protobuf          # Remove old version
cd protobuf-3.6.1/
./configure CXXFLAGS=-fPIC  # Compile with -fPIC
make -j                     # Compile
make install                # Install system-wide
```
We support different versions of Python. You can choose your Python version in the `Makefile`:

```makefile
PYINC=$(PY27INC)
```
Then execute:

```bash
make -j
```
The generated `memoire.so` can be imported directly in Python:

```python
import memoire
```
Other dependencies: ZeroMQ, google-test, and libbfd (for debugging). They can be installed with:

```bash
yum install zeromq-devel binutils-devel gtest-devel
```
See the API page for reference.