A framework for numerical experiments that simulate the multi-armed bandit problem in a stochastic stationary environment with delayed feedback. Part of the paper Bernoulli multi-armed bandit problem under delayed feedback (Journal).
Policies adapted to delays are evaluated on the publicly available dataset The International Stroke Trial. See this notebook for the analysis and simulation.
Structure of the project and currently implemented algorithms:
Files | |
---|---|
Environments | Protocol |
Bernoulli MAB | |
Policies | Protocol |
Uniform Random | |
Explore-First | |
Epsilon-Greedy | |
Upper Confidence Bound | |
Thompson Sampling (Beta distribution) | |
Experiments | Bernoulli MAB under delayed feedback |
Tests | Test module |
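
To make the setting concrete, the following is a minimal, self-contained sketch of a Bernoulli bandit with delayed feedback driven by Thompson Sampling with Beta posteriors. It is not the code from this repository: the function name `simulate_delayed_thompson`, its parameters, and the constant delay are assumptions chosen for illustration only.

```python
import random
from collections import deque


def simulate_delayed_thompson(means, horizon, delay, seed=0):
    """Thompson Sampling on Bernoulli arms where each reward arrives `delay` rounds late.

    Hypothetical helper for illustration; returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    successes = [0] * n_arms  # observed rewards equal to 1, per arm
    failures = [0] * n_arms   # observed rewards equal to 0, per arm
    pulls = [0] * n_arms
    pending = deque()         # (arrival_round, arm, reward), ordered by arrival
    total_reward = 0

    for t in range(horizon):
        # Reveal feedback whose delay has elapsed before choosing the next arm.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            successes[arm] += reward
            failures[arm] += 1 - reward

        # Sample a plausible mean for each arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)

        # The environment draws the reward now, but the policy observes it later.
        reward = 1 if rng.random() < means[arm] else 0
        pending.append((t + delay, arm, reward))
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


pulls, total_reward = simulate_delayed_thompson([0.2, 0.8], horizon=2000, delay=10, seed=1)
print(pulls, total_reward)
```

A constant delay keeps the pending queue ordered by arrival round, so a plain `deque` is enough here. With random delays, a priority queue keyed on the arrival round would be the natural replacement.
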
To run experiments on the Bernoulli MAB, see the available options:
python delayed_bandit/experiments.py --help
You may want to run a large number of experiments and aggregate the results by removing outliers and averaging. The sampled delays can be kept fixed over the horizon.
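
One simple way to remove outliers before averaging is a trimmed mean, sketched below. This is an assumption about what such an aggregation could look like, not the procedure used in this project or the paper. The function `trimmed_mean`, the `trim_fraction` parameter, and the regret values are all hypothetical.

```python
from statistics import mean


def trimmed_mean(values, trim_fraction=0.1):
    """Average `values` after dropping the lowest and highest `trim_fraction` of them."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of runs dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Hypothetical final regrets from ten independent runs; one run is an outlier.
regrets = [12.0, 11.5, 13.0, 12.5, 11.0, 12.0, 13.5, 12.5, 11.5, 60.0]
print(trimmed_mean(regrets, trim_fraction=0.1))  # 12.3125
```

Dropping the same share from both tails keeps the estimate symmetric. Other rules, such as filtering by interquartile range, would fit the same place in the pipeline.
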
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
./pychecks.sh
MIT License
Copyright (c) 2023 Andrii Dzhoha