Mirror Descent for Gridworld MDP

This repository tests various mirror descent stepping schemes for solving tabular MDPs.

There is a single tabular MDP, implemented in GridworldMDP.py. In principle, the algorithms below only need access to the sizes of the state and action spaces, the transition probabilities, the rewards and the discount factor, so it would be easy to abstract this.
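As a sketch of how small that abstraction could be, the snippet below bundles exactly those ingredients into one container and evaluates a fixed policy by solving the Bellman linear system. The class name `TabularMDP`, the function `evaluate_policy`, the array layout and the two-state toy problem are all assumptions made for illustration; they are not taken from GridworldMDP.py.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TabularMDP:
    """Hypothetical container for the data a tabular solver needs."""

    P: np.ndarray  # transition probabilities, shape (S, A, S)
    r: np.ndarray  # rewards, shape (S, A)
    gamma: float   # discount factor in [0, 1)

    @property
    def n_states(self) -> int:
        return self.P.shape[0]

    @property
    def n_actions(self) -> int:
        return self.P.shape[1]


def evaluate_policy(mdp: TabularMDP, pi: np.ndarray) -> np.ndarray:
    """Solve (I - gamma * P_pi) V = r_pi for a policy pi of shape (S, A)."""
    P_pi = np.einsum("sa,sat->st", pi, mdp.P)  # state-to-state kernel under pi
    r_pi = np.sum(pi * mdp.r, axis=1)          # expected one-step reward under pi
    return np.linalg.solve(np.eye(mdp.n_states) - mdp.gamma * P_pi, r_pi)


# Toy problem (an assumption): action a always moves to state a,
# and action 1 pays reward 1 in every state.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
r = np.array([[0.0, 1.0], [0.0, 1.0]])
mdp = TabularMDP(P=P, r=r, gamma=0.9)

uniform = np.full((2, 2), 0.5)
V = evaluate_policy(mdp, uniform)
print(V)
```

Under the uniform policy the expected reward is 0.5 per step in every state, so the value in both states is 0.5 / (1 - 0.9) = 5.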

Algorithms implemented:

Policy iteration algorithm (PIA) in PIA.py
Policy iteration algorithm on softmax policies in softmax_PIA.py
Value iteration algorithm on softmax policies in softmax_PIA.py
Mirror descent algorithms in mirror_descent.py:

vanilla explicit Euler stepping (which is precisely mirror descent)
midpoint stepping (2nd order method)
RK4 stepping (4th order method)
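The three stepping schemes can be sketched as generic one-step integrators applied to a mirror descent vector field on policy logits. The sketch assumes a reward-maximising, entropy-regularised setup with temperature `tau` and exact policy evaluation, in which an explicit Euler step of size `h` on the logits reproduces the multiplicative update pi_new ∝ pi * exp(h * (Q - tau * log pi)). All function names, the random toy MDP and the parameter values are hypothetical and are not taken from mirror_descent.py.

```python
import numpy as np


def log_softmax(theta):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = theta - theta.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def vector_field(theta, P, r, gamma, tau):
    """Regularised advantage Q - V - tau * log(pi) for pi = softmax(theta)."""
    log_pi = log_softmax(theta)
    pi = np.exp(log_pi)
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.sum(pi * (r - tau * log_pi), axis=1)  # reward plus entropy bonus
    V = np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V
    return Q - V[:, None] - tau * log_pi


def euler_step(f, theta, h):
    """Explicit Euler: one mirror descent update with step size h."""
    return theta + h * f(theta)


def midpoint_step(f, theta, h):
    """Explicit midpoint rule (2nd order)."""
    return theta + h * f(theta + 0.5 * h * f(theta))


def rk4_step(f, theta, h):
    """Classical Runge-Kutta scheme (4th order)."""
    k1 = f(theta)
    k2 = f(theta + 0.5 * h * k1)
    k3 = f(theta + 0.5 * h * k2)
    k4 = f(theta + h * k3)
    return theta + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


# Small random MDP (an assumption, not the gridworld from this repository).
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((3, 2))
gamma, tau, h, n_steps = 0.5, 1.0, 0.2, 400


def f(theta):
    return vector_field(theta, P, r, gamma, tau)


results = {}
for name, step in [("euler", euler_step), ("midpoint", midpoint_step), ("rk4", rk4_step)]:
    theta = np.zeros((3, 2))  # uniform initial policy
    for _ in range(n_steps):
        theta = step(f, theta, h)
    results[name] = theta
    print(name, np.abs(f(theta)).max())
```

Each printed number is the largest absolute entry of the vector field at the final logits; a value near zero indicates that the policy is close to a stationary point of the flow. Writing the steppers against a generic `f` keeps the policy evaluation separate from the integrator, so swapping schemes is a one-line change.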

Running this:

To run the code, you will need to install Python 3, NumPy and Matplotlib. If you're using Poetry, you can just run poetry install.
Run main.py. Its purpose is to produce plots showing how the choice of stepping scheme affects the performance of mirror descent.

Results:

There are several experiments in main.py, but currently it runs only those presented in the paper "A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces".

Repository: deterministicdavid/mirror_descent_for_gworld_mdp
