BiomedGPT

BiomedGPT is pre-trained and fine-tuned on multi-modal, multi-task biomedical datasets. The datasets are described in datasets.md. If you have any questions, feel free to contact us or open an issue.

Please check out this Colab notebook for Fairseq-free inference. Warning: we have not run extensive experiments with transformers, so we cannot confirm that the results from transformers and fairseq are fully aligned.

Checkpoints

We provide pretrained checkpoints of BiomedGPT (Dropbox), which can be placed in the scripts/ folder for further development. For fine-tuned checkpoints, please refer to checkpoints.md.

Note:

We emphasize that BiomedGPT, including its files, code, and checkpoints, is strictly for academic research purposes. Commercial and clinical uses are strictly prohibited for three key reasons: First, BiomedGPT is based on the OFA framework, which carries a non-commercial license that we have inherited. Second, our model is not licensed for use in healthcare settings. Finally, we have not implemented sufficient security measures, and the current model cannot guarantee the accuracy required for medical diagnoses.

Installation

git clone https://github.com/taokz/BiomedGPT
# biomedgpt.yml is in the repository root, so enter the clone first
cd BiomedGPT
conda env create -f biomedgpt.yml
python -m pip install pip==21.2.4
pip install fairseq
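
After installation, a quick import check can confirm that the environment resolved as expected. The following is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the module list (`fairseq`, `torch`) is an assumption about what the environment should contain.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module list; adjust it to match biomedgpt.yml.
    missing = missing_modules(["fairseq", "torch"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the conda environment created above; an empty result means both modules are importable.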

Implementation

We provide preprocessing, pretraining, fine-tuning, and inference scripts in the scripts/ folder. You can follow the directory layout below:

BiomedGPT/
├── checkpoints/
├── datasets/
│   ├── pretraining/
│   ├── finetuning/
│   └── ...
├── scripts/
│   ├── preprocess/
│   │   ├── pretraining/
│   │   └── finetuning/
│   ├── pretrain/
│   ├── vqa/
│   └── ...
└── ...
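
If the folders do not exist yet, the layout above can be created with a short script. This is a sketch under stated assumptions: `create_layout` and `LAYOUT` are hypothetical names, and only the directories named explicitly in the tree are included (the `...` entries are left out).

```python
from pathlib import Path

# Directories taken from the tree above; entries shown as "..." are omitted.
LAYOUT = [
    "checkpoints",
    "datasets/pretraining",
    "datasets/finetuning",
    "scripts/preprocess/pretraining",
    "scripts/preprocess/finetuning",
    "scripts/pretrain",
    "scripts/vqa",
]


def create_layout(root):
    """Ensure every directory in LAYOUT exists under `root`; return their paths."""
    root = Path(root)
    paths = []
    for rel in LAYOUT:
        path = root / rel
        # exist_ok=True leaves directories that are already present untouched.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `create_layout(".")` from the repository root only adds missing folders; existing ones and their contents are not modified.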

Pretraining

Please follow datasets.md to prepare the pretraining datasets, which consist of four TSV files (vision_language.tsv, text.tsv, image.tsv and detection.tsv) in the ./datasets/pretraining/ directory.

cd scripts/pretrain
bash pretrain_tiny.sh

Feel free to modify the hyperparameters in the bash script to suit your requirements or ablation studies.
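
Before launching pretraining, it can be useful to confirm that the four TSV files are in place. The check below is a minimal sketch: `missing_pretraining_files` is a hypothetical helper, and it only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names as listed in the Pretraining section.
PRETRAINING_FILES = ("vision_language.tsv", "text.tsv", "image.tsv", "detection.tsv")


def missing_pretraining_files(data_dir="./datasets/pretraining"):
    """Return the expected TSV file names that are absent from `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in PRETRAINING_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # The default path is relative, so run this from the repository root.
    missing = missing_pretraining_files()
    print("missing:", ", ".join(missing) if missing else "none")
```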

Downstreams

We provide run scripts for fine-tuning and inference. Log files are written during execution. Before fine-tuning or inference, please refer to

Visual Question Answering

cd scripts/vqa
# for fine-tuning
bash train_vqa_rad_beam.sh
# for inference
bash evaluate_vqa_rad_beam.sh

Image Captioning

cd scripts/caption
# for fine-tuning
bash train_peir_gross.sh
# for inference
bash evaluate_peir_gross.sh

Text Summarization

cd scripts/text_sum
# for fine-tuning
bash train_meqsum.sh
# for inference
bash evaluate_meqsum.sh

Natural Language Inference

cd scripts/mednli
# for fine-tuning
bash train_mednli.sh
# for inference
bash evaluate_mednli.sh

Image Classification

cd scripts/image_cls
# for fine-tuning: a template is provided; set different hyperparameters for each MedMNIST dataset if required.
bash train_medmnist.sh
# for inference: a template
bash evaluate_medmnist.sh
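
The downstream commands above all follow the same pattern: change into the task folder, then run a train or evaluate script. A small lookup table can make that pattern explicit, for example when scripting several runs. This is a sketch, not a repository utility: `TASKS` and `build_command` are hypothetical names, the short task keys are chosen here for illustration, and the directories and script names are copied from the commands above.

```python
import shlex

# task key -> (script directory, fine-tuning script, inference script)
TASKS = {
    "vqa": ("scripts/vqa", "train_vqa_rad_beam.sh", "evaluate_vqa_rad_beam.sh"),
    "caption": ("scripts/caption", "train_peir_gross.sh", "evaluate_peir_gross.sh"),
    "text_sum": ("scripts/text_sum", "train_meqsum.sh", "evaluate_meqsum.sh"),
    "mednli": ("scripts/mednli", "train_mednli.sh", "evaluate_mednli.sh"),
    "image_cls": ("scripts/image_cls", "train_medmnist.sh", "evaluate_medmnist.sh"),
}


def build_command(task, mode):
    """Return the shell command line for `task` in `mode` ("train" or "evaluate")."""
    if mode not in ("train", "evaluate"):
        raise ValueError(f"unknown mode: {mode!r}")
    directory, train_script, eval_script = TASKS[task]
    script = train_script if mode == "train" else eval_script
    # Quote both parts so the string is safe to paste into a shell.
    return f"cd {shlex.quote(directory)} && bash {shlex.quote(script)}"
```

The function only builds the command string; it does not execute anything, so the result can be inspected or logged before being run from the repository root.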

Related Codebase

Citation

If you use the BiomedGPT model or our code in a publication, please cite 🤗:

@article{zhang2024generalist,
  title={A generalist vision--language foundation model for diverse biomedical tasks},
  author={Zhang, Kai and Zhou, Rong and Adhikarla, Eashan and Yan, Zhiling and Liu, Yixin and Yu, Jun and Liu, Zhengliang and Chen, Xun and Davison, Brian D and Ren, Hui and others},
  journal={Nature Medicine},
  pages={1--13},
  year={2024},
  publisher={Nature Publishing Group US New York}
}
