| Field | Value |
|---|---|
| abstract | We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an |
| booktitle | First Conference on Causal Learning and Reasoning |
| title | Efficient Reinforcement Learning with Prior Causal Knowledge |
| year | 2022 |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | lu22a |
| month | 0 |
| tex_title | Efficient Reinforcement Learning with Prior Causal Knowledge |
| firstpage | 526 |
| lastpage | 541 |
| page | 526-541 |
| order | 526 |
| cycles | false |
| bibtex_author | Lu, Yangyi and Meisami, Amirhossein and Tewari, Ambuj |
| author | |
| date | 2022-06-28 |
| address | |
| container-title | Proceedings of the First Conference on Causal Learning and Reasoning |
| volume | 177 |
| genre | inproceedings |
| issued | |
| extras | |