A survey and benchmark of High-Dimensional Bayesian Optimization for discrete sequence optimization

This repository contains the code for our survey and benchmark of high-dimensional Bayesian optimization of discrete sequences using poli and poli-baselines.

Checking ongoing results

Check the leaderboards on our project website.

Adding a new solver

Adding necessary files

We expect contributions to this benchmark to be implemented as solvers in poli-baselines. Follow the documentation therein.

In a few words, we expect you to provide the following folder structure:

# In poli-baselines' solvers folder
solvers
├── your_solver_name
│   ├── __init__.py
│   ├── environment.your_solver_name.yml
│   └── your_solver_name.py
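
As an illustration, the folder structure above could be scaffolded with a few lines of Python. This is only a sketch: `scaffold_solver` is a hypothetical helper that is not part of poli-baselines, and the placeholder file contents are assumptions you should replace with your actual solver and environment file.

```python
from pathlib import Path


def scaffold_solver(solvers_dir, solver_name):
    """Create the three expected files for `solver_name` inside `solvers_dir`.

    Hypothetical helper for illustration; poli-baselines does not ship it.
    """
    solver_dir = Path(solvers_dir) / solver_name
    solver_dir.mkdir(parents=True, exist_ok=True)

    # File names follow the tree shown above; contents are placeholders.
    files = {
        "__init__.py": "",
        f"environment.{solver_name}.yml": "# Fill in your conda environment here.\n",
        f"{solver_name}.py": "# Implement your solver here.\n",
    }
    for filename, content in files.items():
        path = solver_dir / filename
        if not path.exists():  # never overwrite existing work
            path.write_text(content)

    # Return the created file names so the caller can check the layout.
    return sorted(p.name for p in solver_dir.iterdir())
```

Calling `scaffold_solver("solvers", "your_solver_name")` from the root of a poli-baselines checkout would create the layout shown above, assuming the solvers folder lives at that relative path.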

We expect environment.your_solver_name.yml to create a conda environment in which your_solver_name.py can be imported. See a template here:

name: poli__your_solver_name
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
      - your
      - dependencies
      - here
      - "git+https://github.com/MachineLearningLifeScience/poli.git@dev"
      - "git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main"

Provide this code as a pull request to poli-baselines. Afterwards, we will register it, run it, and add its reports to our ongoing benchmarks.

(Optional) Running your solver locally

If you want to test your solver on our problems, you can set up local testing in this repository. We provide a requirements.txt and an environment.yml that you can use to create an environment for running the benchmarks. Create the environment, then install this package:

conda create -n hdbo_benchmark python=3.10
conda activate hdbo_benchmark
pip install -r requirements.txt
pip install -e .

Change the WANDB_PROJECT and WANDB_ENTITY in src/hdbo_benchmark/utils/constants.py.
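
The actual layout of constants.py is not shown here, so the following is only a sketch of one way the two values could be defined without hard-coding them. The function name `load_wandb_settings` and the default strings are hypothetical placeholders; the environment variable names match the ones Weights & Biases itself reads.

```python
import os


def load_wandb_settings(env=None):
    """Return the W&B project and entity, preferring environment variables.

    Hypothetical helper: the defaults below are placeholders, not real values.
    """
    env = os.environ if env is None else env
    return {
        "WANDB_PROJECT": env.get("WANDB_PROJECT", "my-hdbo-project"),
        "WANDB_ENTITY": env.get("WANDB_ENTITY", "my-wandb-username"),
    }


settings = load_wandb_settings()
WANDB_PROJECT = settings["WANDB_PROJECT"]
WANDB_ENTITY = settings["WANDB_ENTITY"]
```

With this pattern, exporting WANDB_PROJECT and WANDB_ENTITY in your shell overrides the defaults without editing the file again.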

After implementing a solver in poli-baselines, you can register it in src/hdbo_benchmark/utils/experiments/load_solvers.py.

The scripts used to run the benchmarks can be found in src/hdbo_benchmark/experiments. For example, to run albuterol_similarity from the PMO benchmark, you can run:

conda run -n hdbo_benchmark python src/hdbo_benchmark/experiments/benchmark_on_pmo/run.py \
    --function-name=albuterol_similarity \
    --solver-name=line_bo \
    --latent-dim=128 \
    --max-iter=300

This assumes that hdbo_benchmark is an environment in which your solver runs and in which this package is installed. Examples of environments in which solvers have been tested can be found in poli-baselines.
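
If you want to launch the same script for several functions or solvers, the command can be assembled programmatically. The sketch below is an assumption about how you might do this, not part of this repository: `build_command` and `run_sweep` are hypothetical helpers, and they only use the script path and flags that appear in the command above.

```python
import itertools
import subprocess

RUN_SCRIPT = "src/hdbo_benchmark/experiments/benchmark_on_pmo/run.py"


def build_command(function_name, solver_name, latent_dim=128, max_iter=300,
                  env_name="hdbo_benchmark"):
    """Build the argument list for one benchmark run."""
    return [
        "conda", "run", "-n", env_name, "python", RUN_SCRIPT,
        f"--function-name={function_name}",
        f"--solver-name={solver_name}",
        f"--latent-dim={latent_dim}",
        f"--max-iter={max_iter}",
    ]


def run_sweep(function_names, solver_names, dry_run=True):
    """Run (or, with dry_run=True, only print) every function/solver pair."""
    commands = [
        build_command(function_name, solver_name)
        for function_name, solver_name in itertools.product(function_names, solver_names)
    ]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sweep at the first failing run.
            subprocess.run(command, check=True)
    return commands
```

For example, `run_sweep(["albuterol_similarity"], ["line_bo"])` prints the single command shown above; pass `dry_run=False` to actually execute it from the repository root.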

Replicating the data preprocessing for downloading zinc250k

We use torchdrug to download the dataset. It has very picky dependencies, but you should be able to install it by running

conda env create --file environment.data_preprocessing.yml

and following the scripts in src/hdbo_benchmark/data_preprocessing/zinc250k inside that env (conda activate hdbo__data_preprocessing).

Training a simple autoencoder for protein sequences

...TODO: write.

Citing all the relevant work

Depending on the black box you use within poli, we expect you to cite a set of references. Check the documentation of the black box for a list (including bibtex).
