2022-06-28-zhang22a.md

File metadata and controls

53 lines (53 loc) · 2.17 KB
---
abstract: "Recent advances in Reinforcement Learning have allowed automated agents (for short, agents) to achieve a high level of performance across a wide range of tasks, which, when supplemented with human feedback, has led to faster and more robust decision-making. The current literature, in large part, focuses on the human's role during the learning phase: human trainers possess a priori knowledge that could help an agent to accelerate its learning when the environment is not fully known. In this paper, we study an interactive reinforcement learning setting where the agent and the human have different sensory capabilities, disagreeing, therefore, on how they perceive the world (observed states) while sharing the same reward and transition functions. We show that agents are bound to learn sub-optimal policies if they do not take human advice into account — perhaps surprisingly, even when the human's decisions are less accurate than their own. We propose the counterfactual agent, which proactively considers the intended actions of the human operator, and prove that this strategy dominates standard approaches in terms of performance. Finally, we formulate a novel reinforcement learning task maximizing the performance of an autonomous system subject to a budget constraint on the available amount of human advice."
booktitle: First Conference on Causal Learning and Reasoning
title: Can Humans Be out of the Loop?
year: '2022'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: zhang22a
month: 0
tex_title: Can Humans Be out of the Loop?
firstpage: 1010
lastpage: 1025
page: 1010-1025
order: 1010
cycles: false
bibtex_author: Zhang, Junzhe and Bareinboim, Elias
author:
- given: Junzhe
  family: Zhang
- given: Elias
  family: Bareinboim
date: 2022-06-28
address:
container-title: Proceedings of the First Conference on Causal Learning and Reasoning
volume: '177'
genre: inproceedings
issued:
  date-parts:
  - 2022
  - 6
  - 28
pdf:
extras:
---