This package implements a series of contextual MDPs based on the DeepMind Control Suite. The original environments are modified so that the context of the MDP specifies the dynamics and reward settings.
If you use our code, please cite our AAAI 2023 paper:
@article{rezaei2022hypernetworks,
  title={Hypernetworks for Zero-shot Transfer in Reinforcement Learning},
  author={Rezaei-Shoshtari, Sahand and Morissette, Charlotte and Hogan, Francois Robert and Dudek, Gregory and Meger, David},
  journal={arXiv preprint arXiv:2211.15457},
  year={2022}
}
- Install the following libraries needed for MuJoCo and DeepMind Control Suite:
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
- We recommend using a conda virtual environment to run the code. Create the virtual environment:
conda create -n contextual_env python=3.9
conda activate contextual_env
pip install --upgrade pip
- Install MuJoCo and DeepMind Control Suite following the official instructions.
- Clone this package and install its dependencies:
pip install -r requirements.txt
- Finally, install the contextual_control_suite package with pip:
pip install -e .
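As a quick sanity check after the steps above, a short script along these lines can confirm that the installed packages are discoverable from the active conda environment. This is only a sketch: `contextual_control_suite` is the import name used in the usage example of this README, while `dm_control` is assumed to be the import name of your DeepMind Control Suite installation, and `check_installation` is a hypothetical helper that is not part of this package.

```python
import importlib.util

# Import names to look for. "dm_control" is an assumption about how the
# DeepMind Control Suite was installed; adjust the tuple if yours differs.
REQUIRED_MODULES = ("dm_control", "contextual_control_suite")


def check_installation(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    # Print one line per module so a missing dependency is easy to spot.
    for name, found in check_installation().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it does not verify that MuJoCo rendering works on your machine.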
- A demo script showing how to use the contexts is available here.
- All environments are implemented based on the original DeepMind Control environments. Contexts of the MDP, either rewards or dynamics, are passed through a dictionary to the environment:
from contextual_control_suite import suite

# Reward parameters
reward_kwargs = {
    'ALL': {
        'sigmoid': 'linear',
        'margin': 10,
    },
}

# Dynamics parameters
dynamics_kwargs = {
    'length': 0.5
}

task_kwargs = {
    'reward_kwargs': reward_kwargs,
    'dynamics_kwargs': dynamics_kwargs
}

# Create the environment with the custom task parameters
env = suite.load('cheetah', 'run', task_kwargs=task_kwargs)
- Note: reward_kwargs and dynamics_kwargs are environment dependent; see each environment for its specific parameters.
- To train RL agents on a wide range of environments sampled from contexual_dm_control, see hyperzero.
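When several contexts are needed, the `task_kwargs` dictionaries can be built programmatically in the same format as the cheetah example. The sketch below is illustrative only: `make_task_kwargs` and `load_envs` are hypothetical helpers, not part of this package, and the `length` and `margin` values are arbitrary choices. Whether a given environment accepts these particular values is an assumption, so check each environment for its supported parameters.

```python
import itertools


def make_task_kwargs(length, margin, sigmoid="linear"):
    """Build one task_kwargs dict in the format used by the cheetah example."""
    return {
        "reward_kwargs": {"ALL": {"sigmoid": sigmoid, "margin": margin}},
        "dynamics_kwargs": {"length": length},
    }


# Hypothetical context values, chosen only for illustration.
lengths = [0.3, 0.5, 0.7]
margins = [5, 10]

# One context per (length, margin) pair: a small grid of contexts.
contexts = [
    make_task_kwargs(length, margin)
    for length, margin in itertools.product(lengths, margins)
]


def load_envs(contexts, domain="cheetah", task="run"):
    """Load one environment per context (requires the package to be installed)."""
    # Imported lazily so the context grid can be built without the package.
    from contextual_control_suite import suite

    return [suite.load(domain, task, task_kwargs=kw) for kw in contexts]
```

Calling `load_envs(contexts)` then yields one environment per context, each created through the same `suite.load` call shown earlier.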