This repository contains the code for our survey and benchmark of high-dimensional Bayesian optimization of discrete sequences using poli and poli-baselines.
Check our leaderboards on our project website.
We expect contributions to this benchmark to be implemented as solvers in poli-baselines. Follow the documentation therein.
In a few words, we expect you to provide the following folder structure:
# In poli-baselines' solvers folder
solvers
├── your_solver_name
│ ├── __init__.py
│ ├── environment.your_solver_name.yml
│ └── your_solver_name.py
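The layout above can be created by hand, but as a minimal sketch, the following script scaffolds it with empty files. The function name `scaffold_solver` and the use of a temporary directory are illustrative assumptions, not part of poli-baselines; point `solvers_dir` at your local poli-baselines checkout when using it for real.

```python
from pathlib import Path
import tempfile


def scaffold_solver(solvers_dir: Path, solver_name: str) -> list[Path]:
    """Create the three expected files for a solver, leaving existing ones untouched."""
    solver_dir = solvers_dir / solver_name
    solver_dir.mkdir(parents=True, exist_ok=True)
    files = [
        solver_dir / "__init__.py",
        solver_dir / f"environment.{solver_name}.yml",
        solver_dir / f"{solver_name}.py",
    ]
    for path in files:
        # touch() creates an empty file and does not truncate an existing one.
        path.touch(exist_ok=True)
    return files


if __name__ == "__main__":
    # Demonstration in a throwaway directory (hypothetical location).
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_solver(Path(tmp) / "solvers", "your_solver_name"):
            print(path.relative_to(tmp))
```

The files are created empty; the solver implementation and the environment description still have to be written.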
We expect environment.your_solver_name.yml to create a conda environment in which your_solver_name.py can be imported. See a template here:
name: poli__your_solver_name
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
    - your
    - dependencies
    - here
    - "git+https://github.com/MachineLearningLifeScience/poli.git@dev"
    - "git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main"
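If you prefer to generate this file rather than copy it, the sketch below renders the template for a given solver name using only the standard library. The helper `render_environment` is a hypothetical name introduced for illustration; the generated content mirrors the template above and nothing more.

```python
def render_environment(solver_name: str, pip_dependencies: list[str]) -> str:
    """Return the text of environment.<solver_name>.yml following the template."""
    pip_entries = [
        *pip_dependencies,
        # The two git dependencies are taken verbatim from the template.
        '"git+https://github.com/MachineLearningLifeScience/poli.git@dev"',
        '"git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main"',
    ]
    lines = [
        f"name: poli__{solver_name}",
        "channels:",
        "  - defaults",
        "dependencies:",
        "  - python=3.10",
        "  - pip",
        "  - pip:",
        *[f"    - {entry}" for entry in pip_entries],
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "numpy" stands in for your solver's own dependencies.
    print(render_environment("your_solver_name", ["numpy"]), end="")
```

Write the returned string to environment.your_solver_name.yml inside your solver's folder.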
Provide said code as a pull request to poli-baselines. Afterwards, we will register it, run it, and add its reports to our ongoing benchmarks.
If you are eager to test it on our problems, you can prepare for local testing here. We provide a requirements.txt/environment.yml you can use to create an environment for running the benchmarks. Afterwards, install this package:
conda create -n hdbo_benchmark python=3.10
conda activate hdbo_benchmark
pip install -r requirements.txt
pip install -e .
Change the WANDB_PROJECT and WANDB_ENTITY in src/hdbo_benchmark/utils/constants.py.
After implementing a solver in poli-baselines, you can register it in src/hdbo_benchmark/utils/experiments/load_solvers.py.
The scripts used to run the benchmarks can be found in src/hdbo_benchmark/experiments. For example, to run albuterol_similarity from the PMO benchmark:
conda run -n hdbo_benchmark python src/hdbo_benchmark/experiments/benchmark_on_pmo/run.py \
    --function-name=albuterol_similarity \
    --solver-name=line_bo \
    --latent-dim=128 \
    --max-iter=300
This assumes that hdbo_benchmark is an environment in which you can run your solver and in which this package is installed. Examples of environments in which solvers have been tested can be found in poli-baselines.
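To launch several such runs, the command can be assembled programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, and it uses only the flags shown in the command for albuterol_similarity and line_bo; any other function or solver names you pass are assumed to be accepted by run.py.

```python
import shlex

RUN_SCRIPT = "src/hdbo_benchmark/experiments/benchmark_on_pmo/run.py"


def build_command(
    function_name: str,
    solver_name: str,
    latent_dim: int = 128,
    max_iter: int = 300,
    env_name: str = "hdbo_benchmark",
) -> list[str]:
    """Assemble the benchmark command as an argument list."""
    return [
        "conda", "run", "-n", env_name,
        "python", RUN_SCRIPT,
        f"--function-name={function_name}",
        f"--solver-name={solver_name}",
        f"--latent-dim={latent_dim}",
        f"--max-iter={max_iter}",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(command, check=True) to execute it for real.
    for solver_name in ["line_bo"]:
        print(shlex.join(build_command("albuterol_similarity", solver_name)))
```

Building an argument list rather than a single string avoids shell quoting issues when the list is handed to subprocess.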
We use torchdrug to download the dataset. It has very picky dependencies, but you should be able to install it by running
conda env create --file environment.data_preprocessing.yml
and following the scripts in src/hdbo_benchmark/data_preprocessing/zinc250k inside that env (conda activate hdbo__data_preprocessing).
...TODO: write.
Depending on the black box you use within poli, we expect you to cite a set of references. Check the documentation of the black box for a list (including bibtex).