2022-06-28-lu22a.md

File metadata and controls

56 lines (56 loc) · 2.42 KB
---
abstract: We introduce causal Markov Decision Processes (C-MDPs), a new formalism
  for sequential decision making which combines the standard MDP formulation with
  causal structures over state transition and reward functions. Many contemporary
  and emerging application areas such as digital healthcare and digital marketing
  can benefit from modeling with C-MDPs due to the causal mechanisms underlying
  the relationship between interventions and states/rewards. We propose the causal
  upper confidence bound value iteration (C-UCBVI) algorithm that exploits the
  causal structure in C-MDPs and improves the performance of standard reinforcement
  learning algorithms that do not take causal knowledge into account. We prove
  that C-UCBVI satisfies an $\tilde{O}(HS\sqrt{ZT})$ regret bound, where $T$ is
  the total number of time steps, $H$ is the episodic horizon, and $S$ is the
  cardinality of the state space. Notably, our regret bound does not scale with
  the size of actions/interventions ($A$), but only scales with a causal graph
  dependent quantity $Z$ which can be exponentially smaller than $A$. By extending
  C-UCBVI to the factored MDP setting, we propose the causal factored UCBVI
  (CF-UCBVI) algorithm, which further reduces the regret exponentially in terms
  of $S$. Furthermore, we show that RL algorithms for linear MDP problems can
  also be incorporated in C-MDPs. We empirically show the benefit of our causal
  approaches in various settings to validate our algorithms and theoretical results.
booktitle: First Conference on Causal Learning and Reasoning
title: Efficient Reinforcement Learning with Prior Causal Knowledge
year: '2022'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: lu22a
month: 0
tex_title: Efficient Reinforcement Learning with Prior Causal Knowledge
firstpage: 526
lastpage: 541
page: 526-541
order: 526
cycles: false
bibtex_author: Lu, Yangyi and Meisami, Amirhossein and Tewari, Ambuj
author:
- given: Yangyi
  family: Lu
- given: Amirhossein
  family: Meisami
- given: Ambuj
  family: Tewari
date: 2022-06-28
address:
container-title: Proceedings of the First Conference on Causal Learning and Reasoning
volume: '177'
genre: inproceedings
issued:
  date-parts:
  - 2022
  - 6
  - 28
pdf:
extras:
---
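The abstract above builds on UCBVI-style optimistic planning. As a rough, hypothetical illustration (not the paper's method), the sketch below shows the standard tabular UCBVI skeleton: value iteration over empirical estimates plus a Hoeffding-style exploration bonus. C-UCBVI's contribution, per the abstract, is a causal-graph-dependent bonus whose scale depends on $Z$ rather than $A$; that construction is not reproduced here. All names and parameters are illustrative assumptions.

```python
import numpy as np

def ucbvi_plan(counts, rew_sum, trans_counts, H, S, A, delta=0.05):
    """Optimistic value iteration from empirical estimates (standard UCBVI).

    counts[s, a]        -- visit counts N(s, a)
    rew_sum[s, a]       -- accumulated rewards, so r_hat = rew_sum / N
    trans_counts[s, a]  -- next-state counts, so p_hat = trans_counts / N

    NOTE: this uses the generic sqrt(log(.) / N(s,a)) Hoeffding bonus.
    The C-UCBVI algorithm in the paper replaces it with a causal bonus
    that removes the dependence on the action-space size A.
    """
    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):          # backward induction over the horizon
        for s in range(S):
            for a in range(A):
                n = max(counts[s, a], 1)
                r_hat = rew_sum[s, a] / n
                p_hat = trans_counts[s, a] / n
                bonus = H * np.sqrt(np.log(2 * S * A * H / delta) / (2 * n))
                # Optimistic Q-value, clipped at the maximum possible return H
                Q[h, s, a] = min(H, r_hat + p_hat @ V[h + 1] + bonus)
            V[h, s] = Q[h, s].max()
    return Q, V
```

With no data, every bonus is large, so the optimistic values saturate at the horizon `H`, which is exactly the behavior that drives early exploration in optimism-based algorithms.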