
Code for the paper "How RL Agents Behave When Their Actions Are Modified"

edlanglois/mamdp

MAMDP Experiments

Code for the paper How RL Agents Behave When Their Actions Are Modified by Eric Langlois and Tom Everitt (AAAI 2021).

Install

pip install .

This installs the mamdp package along with several scripts prefixed by mamdp-.

Running Experiments

The following commands will reproduce the results described in the paper.

Run results are stored in the experiments/ directory and are re-used if available. If you change any parameter other than NUM_RUNS, first delete the past runs from experiments/; otherwise the stale results will be re-used.
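Deleting past runs can be scripted. The sketch below is a hypothetical helper (clear_past_runs is not part of the mamdp package). It assumes every cached run lives somewhere under experiments/ and deletes every entry inside that directory while keeping the directory itself.

```python
import shutil
from pathlib import Path


def clear_past_runs(experiments_dir: str = "experiments") -> int:
    """Delete every entry under `experiments_dir` so the next `make` starts fresh.

    Hypothetical helper: assumes all cached runs live under this directory.
    Returns the number of top-level entries removed; the directory itself is kept.
    """
    root = Path(experiments_dir)
    if not root.is_dir():
        return 0  # nothing cached yet
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a whole environment's run directory
        else:
            entry.unlink()  # a stray file or a symlink (the link target is left alone)
        removed += 1
    return removed
```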

Train and evaluate the Simulation-Oversight environment

make -j<NUM_CPU_CORES> simulation-oversight

Training curves are saved to experiments/simulation-oversight/training.png and can be plotted manually with:

mamdp-plot-evaluations experiments/simulation-oversight/

Summarize the policies

mamdp-summarize-policies experiments/simulation-oversight/*.policies.json

Train and evaluate the Small Whisky-Gold environment

make -j<NUM_CPU_CORES> NUM_RUNS=10 whisky-gold-small

Summarize the Small Whisky-Gold strategies

0 is the index of the branch-point state, where the agent either heads directly to the goal through the whisky (right; action = 3) or goes around it (down; action = 2).

mamdp-summarize-policies experiments/whisky-gold-small/*.policies.json --argmax --state 0 --actions 2 3
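The branch-point summary can be pictured with a small sketch. It assumes, hypothetically, that a policy is a table where policy[state][action] is the probability of taking that action in that state. The real layout of the *.policies.json files is not documented here, so branch_choice and the numbers below are illustrative only. Restricting the argmax to actions 2 and 3 is meant to mirror what the --state 0 --actions 2 3 flags appear to select.

```python
from typing import Sequence

# Action meanings stated in this README for the Small Whisky-Gold branch point (state 0).
ACTION_NAMES = {2: "down (go around the whisky)", 3: "right (through the whisky)"}


def branch_choice(policy: Sequence[Sequence[float]], state: int,
                  actions: Sequence[int]) -> int:
    """Return whichever of `actions` has the highest probability at `state`.

    Assumes (hypothetically) policy[state][action] is an action probability.
    Ties go to the action listed first, because max() keeps the first maximum.
    """
    probs = policy[state]
    return max(actions, key=lambda a: probs[a])


# Hypothetical one-state policy: at state 0, P(down) = 0.3 and P(right) = 0.6.
policy = [[0.05, 0.05, 0.3, 0.6]]
choice = branch_choice(policy, state=0, actions=[2, 3])
print(choice, ACTION_NAMES[choice])  # 3 right (through the whisky)
```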

Probability that the policy visits a state. 9 is the index of the whisky state.

mamdp-plot-policies experiments/whisky-gold-small/*.eval.json --state 9
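One standard way to compute the probability of ever visiting a state under a fixed policy is a hitting-probability calculation on the Markov chain that the policy induces. This sketch shows that technique; it is not necessarily how mamdp-plot-policies computes its numbers. visit_probability is a hypothetical name, and the chain is assumed to be given as transitions[s][s2] = sum over a of pi(a|s) P(s2|s, a), with terminal states absorbing (self-loop probability 1).

```python
from typing import Sequence


def visit_probability(transitions: Sequence[Sequence[float]], start: int, target: int,
                      tol: float = 1e-12, max_iters: int = 100_000) -> float:
    """Probability that the chain started at `start` ever enters `target`.

    Iterates h <- P h with h[target] pinned to 1, starting from h = 0 elsewhere.
    This converges upward to the minimal non-negative solution, which is the
    hitting probability.
    """
    n = len(transitions)
    h = [0.0] * n
    h[target] = 1.0
    for _ in range(max_iters):
        new_h = [sum(p * h[s2] for s2, p in enumerate(row)) for row in transitions]
        new_h[target] = 1.0  # once the target is reached, it has been visited
        delta = max(abs(a - b) for a, b in zip(new_h, h))
        h = new_h
        if delta < tol:
            break
    return h[start]


# Toy chain (hypothetical): from state 0 the agent reaches state 1 with probability 0.25,
# otherwise it ends in terminal state 2.
toy = [[0.0, 0.25, 0.75], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(visit_probability(toy, start=0, target=1))  # 0.25
```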

Train and evaluate the Off-Switch environment

Uses a fixed learning rate instead of 1/visit_count.

make -j<NUM_CPU_CORES> NUM_RUNS=10 off-switch
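The note above contrasts two step-size schedules. For illustration only, here is plain tabular Q-learning with either schedule. The Q-learning update and the TabularQ class are assumptions, not the agents or code used in the paper. With 1/N(s, a), each estimate is the running average of all its targets, so early targets keep full weight. A fixed step size weights recent targets exponentially more, which lets the estimate track targets that keep changing.

```python
from collections import defaultdict
from typing import Hashable, Optional


class TabularQ:
    """Hypothetical tabular Q-learner with a fixed or 1/visit_count step size."""

    def __init__(self, num_actions: int, discount: float = 0.99,
                 learning_rate: Optional[float] = None):
        self.num_actions = num_actions
        self.discount = discount
        self.learning_rate = learning_rate  # None means "use 1/N(s, a)"
        self.q = defaultdict(float)         # (state, action) -> value estimate
        self.visits = defaultdict(int)      # (state, action) -> number of updates

    def step_size(self, state: Hashable, action: int) -> float:
        if self.learning_rate is not None:
            return self.learning_rate  # fixed schedule
        return 1.0 / self.visits[(state, action)]  # running-average schedule

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable, done: bool) -> None:
        self.visits[(state, action)] += 1
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in range(self.num_actions))
        target = reward + self.discount * best_next
        alpha = self.step_size(state, action)
        self.q[(state, action)] += alpha * (target - self.q[(state, action)])
```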

Summarize the Off-Switch strategies

11 is the index of the branch-point state, where the agent either detours to the disable button (down; action = 2) or heads directly towards the goal (left; action = 1).

mamdp-summarize-policies experiments/off-switch/*.policies.json --argmax --state 11 --actions 1 2
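With NUM_RUNS=10 there are ten independently trained policies, so one natural summary is how many runs prefer each branch. A hypothetical sketch: tally_strategies is not a mamdp function, and it assumes each policy is a table where policy[state][action] is an action probability, which may differ from the actual *.policies.json layout.

```python
from collections import Counter
from typing import Dict, Sequence

# Action meanings stated in this README for the Off-Switch branch point (state 11).
OFF_SWITCH_ACTIONS = {1: "left (straight to the goal)", 2: "down (detour to the disable button)"}


def tally_strategies(run_policies: Sequence[Sequence[Sequence[float]]], state: int,
                     actions: Sequence[int]) -> Dict[int, int]:
    """Count, over runs, which of `actions` each run's policy prefers at `state`.

    Ties within a run go to the action listed first. Actions no run prefers get 0.
    """
    counts = Counter(max(actions, key=lambda a: policy[state][a]) for policy in run_policies)
    return {a: counts.get(a, 0) for a in actions}
```

For the real environment the call would look like tally_strategies(policies, state=11, actions=[1, 2]), with labels taken from OFF_SWITCH_ACTIONS.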

Probability that the policy visits a state. 36 is the index of the off switch button state.

mamdp-plot-policies experiments/off-switch/*.eval.json --state 36

Development

Editable Install

python setup.py develop [--user]

Re-run this command to refresh the version number (based on git tags).
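To check which version number the current install actually reports, the standard library can read it from the installed package metadata. This sketch assumes Python 3.8+ (for importlib.metadata) and that the distribution is named mamdp, as the pip install step suggests; installed_version is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mamdp") -> Optional[str]:
    """Return the version recorded for an installed distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```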

Testing

python -m pytest

Versioning

Uses Semantic Versioning.

Versions are set exclusively via git tags:

git tag -a v0.1.2 -m "Version 0.1.2"
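Tags of the form shown above follow a vMAJOR.MINOR.PATCH pattern. The sketch below parses such a tag into a comparable tuple. parse_version_tag is a hypothetical helper, and the accepted pattern is an assumption: it deliberately rejects Semantic Versioning pre-release and build suffixes (for example -rc.1 or +build.5).

```python
import re
from typing import Tuple

# "v", then MAJOR.MINOR.PATCH with no leading zeros (as Semantic Versioning requires).
TAG_RE = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version_tag(tag: str) -> Tuple[int, int, int]:
    """Parse a release tag such as "v0.1.2" into (major, minor, patch)."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH release tag: {tag!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


# Integer tuples compare numerically, so v0.10.0 sorts after v0.9.3 (plain strings would not).
tags = ["v0.9.3", "v0.10.0", "v0.1.2"]
print(sorted(tags, key=parse_version_tag))  # ['v0.1.2', 'v0.9.3', 'v0.10.0']
```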
