In this project, we use the maximum entropy principle in inverse reinforcement learning to learn soft constraints from demonstrations of an agent interacting with a non-deterministic MDP. In the second part of the project, we implement several strategies (orchestrators) for mixing conflicting policies (e.g., pragmatic vs. ethical). In one …
Updated Jan 13, 2022 - Python