title

abstract

keywords

layout

series

id

month

tex_title

firstpage

lastpage

page

order

cycles

bibtex_author

author

date

address

publisher

container-title

volume

genre

issued

pdf

extras

Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.

Model-based Policy Search, Exploration, Sparse Reward

inproceedings

Proceedings of Machine Learning Research

kaushik18a

0

Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

839

855

839-855

839

false

Kaushik, Rituraj and Chatzilygeroudis, Konstantinos and Mouret, Jean-Baptiste

given	family
Rituraj	Kaushik
Konstantinos	Chatzilygeroudis
Jean-Baptiste	Mouret

2018-10-23

PMLR

Proceedings of The 2nd Conference on Robot Learning

87

inproceedings

date-parts

2018

10

23

http://proceedings.mlr.press/v87/kaushik18a/kaushik18a.pdf

label	link
Supplementary video	https://youtu.be/9ZLwUxAAq6M
Source code	https://github.com/resibots/kaushik_2018_multi-dex
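
The abstract describes optimizing three objectives (novelty, cumulative reward, model accuracy) with a Pareto-based algorithm. As a rough illustration of what "Pareto-based" means here, the sketch below filters a set of candidate policies down to the non-dominated ones. It is not the authors' implementation: the score tuples, the names `dominates` and `pareto_front`, and the example values are all hypothetical, and every objective is assumed to be maximized.

```python
from typing import List, Sequence, Tuple

# Hypothetical score layout, all to be maximized:
# (novelty, expected_reward, model_confidence)
Scores = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Scores]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for four candidate policies (illustrative only).
candidates = [
    (0.9, 0.0, 0.4),  # very novel trajectory, no reward found yet
    (0.2, 1.0, 0.8),  # rewarding and in a well-modelled region
    (0.1, 0.0, 0.3),  # worse than the first candidate on every objective
    (0.5, 0.5, 0.9),  # a balanced trade-off
]
front = pareto_front(candidates)
print(front)
```

With sparse rewards, most candidates score zero on the reward objective, so a filter of this kind still keeps the novel ones alive instead of discarding them; that is the intuition behind treating novelty as its own objective rather than folding it into a single weighted sum.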
