Code for "Self-supervised Co-learning of Uncurated Images and Reports Enables Oversight AI in Radiology"
Paper link: https://arxiv.org/abs/2208.05140
[Paper] | Official PyTorch code
Medical X-VL: Medical Domain X-attention Vision-Language model
Medical X-VL is a vision-language pre-training model tailored to the intrinsic properties of medical domain data. For the demo, we provide Python code to run vision-language pre-training, fine-tune and evaluate the model on each downstream task, and visualize the cross-attention between words and visual semantics.
- Ubuntu 20.04
- Python 3.8 (tested)
- Conda
- PyTorch 1.8.0 (tested)
- CUDA 11.3 (tested)
- CPU or GPU supporting CUDA, cuDNN, and PyTorch 1.8
- Tested on a GeForce RTX 3090
- We recommend more than 32 GB of RAM.
- Install PyTorch and the other dependencies. They can be installed easily from the requirements.txt file (an optional environment check is sketched below the command).
> pip install -r requirements.txt
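After installation, the following optional snippet checks that the tested PyTorch/CUDA setup is visible to Python; it uses only standard torch calls and nothing from this repository.

```python
# Optional environment sanity check before running the scripts below.
import torch

print(f"PyTorch version: {torch.__version__}")          # tested with 1.8.0
print(f"CUDA available:  {torch.cuda.is_available()}")  # tested with CUDA 11.3
if torch.cuda.is_available():
    print(f"GPU device:      {torch.cuda.get_device_name(0)}")  # tested on a GeForce RTX 3090
```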
The open-source datasets used in the paper can be obtained from the following links.
- We follow MedViLL to preprocess and split the MIMIC-CXR and VQA-RAD datasets. See this link for details.
- COVID-19 and normal data can be downloaded from the Brixia and NIH databases.
Other parts of the institutional data used in this study are not publicly available due to patient privacy obligations. Interested users can request access to these data for research purposes by contacting the corresponding author, J.C.Y. ([email protected]).
You can download the weights pretrained on the CheXpert dataset from the links below; place them at the checkpoint paths used by the commands that follow (e.g., /PATH/TO/PRETRAIN/).
https://drive.google.com/file/d/1RKowiRjRCIj6WUlzhFsJsgaA33g9K9l2/view?usp=sharing
https://drive.google.com/file/d/1Y9uc_eVgp0irNE0BUka9_0qbY5urdS6_/view?usp=sharing
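To verify a downloaded checkpoint before use, a minimal inspection sketch is shown below; the "model" key is an assumption about the checkpoint layout, so adjust it to whatever your file actually contains.

```python
# Minimal sketch: inspect a downloaded checkpoint on CPU before training/evaluation.
import torch

checkpoint = torch.load("/PATH/TO/PRETRAIN/checkpoint.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # "model" key is an assumption; inspect your file
print(f"Loaded {len(state_dict)} entries from the checkpoint")
```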
First, download the ImageNet-pretrained weights for the visual encoder from this link. We utilized the pre-trained ViT-S/16 model as the visual encoder. Vision-language pre-training can then be run as below.
> --config ./configs/Pretrain.yaml --output_dir ./output/
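For illustration, a minimal sketch of instantiating a ViT-S/16 backbone via timm is shown below; timm and the model name "vit_small_patch16_224" are assumptions for the sketch, and the actual training script initializes the visual encoder from the downloaded weights through its own config.

```python
# Minimal sketch (assumption: timm) of a ViT-S/16 visual encoder.
import timm
import torch

visual_encoder = timm.create_model("vit_small_patch16_224", pretrained=True)
visual_encoder.eval()

dummy = torch.randn(1, 3, 224, 224)                 # placeholder image batch
with torch.no_grad():
    feats = visual_encoder.forward_features(dummy)  # token features; exact shape depends on the timm version
print(feats.shape)
```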
Our model supports zero-shot image-to-text and text-to-image retrieval without any fine-tuning step, as below.
> --config ./configs/Retrieval.yaml --output_dir ./output/ --checkpoint /PATH/TO/PRETRAIN/ --evaluate
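Conceptually, zero-shot retrieval ranks reports (or images) by embedding similarity. The sketch below is a minimal illustration with placeholder embeddings; `retrieve` is a hypothetical helper, not a function from this repository.

```python
# Minimal sketch of zero-shot image-to-text retrieval by cosine similarity.
import torch
import torch.nn.functional as F

def retrieve(image_embeds: torch.Tensor, text_embeds: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return the indices of the top-k reports for each image."""
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    sim = image_embeds @ text_embeds.t()          # [num_images, num_texts] similarity matrix
    return sim.topk(k, dim=-1).indices

# Usage with random placeholders (replace with embeddings from the pre-trained model).
topk = retrieve(torch.randn(8, 256), torch.randn(100, 256))
print(topk.shape)  # torch.Size([8, 5])
```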
From the VLP weights, the model can be fine-tuned for the report generation task as below.
> --config ./configs/Generation.yaml --output_dir ./output/ --checkpoint /PATH/TO/PRETRAIN/
After fine-tuning, inference can be done as below.
> --config ./configs/Generation.yaml --output_dir ./output/ --checkpoint /PATH/TO/FINETUNE/ --evaluate
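As a quick offline sanity check on generated reports, n-gram overlap can be computed against the reference reports. The sketch below uses NLTK's BLEU as an assumption; it is not necessarily the metric implementation used in the paper's evaluation.

```python
# Minimal sketch: score generated reports against references with smoothed BLEU-4.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [["no acute cardiopulmonary process".split()]]   # one list of references per sample
candidates = ["no acute cardiopulmonary abnormality".split()]  # generated report tokens

bleu4 = corpus_bleu(
    references, candidates,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {bleu4:.3f}")
```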
From the VLP weights, the model can be fine-tuned for the VQA task as below.
> --config ./configs/VQA.yaml --output_dir ./output/ --checkpoint /PATH/TO/PRETRAIN/
After fine-tuning, inference can be done as below.
> --config ./configs/VQA.yaml --output_dir ./output/ --checkpoint /PATH/TO/FINETUNE/ --evaluate
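For closed-type questions (e.g., yes/no in VQA-RAD), evaluation reduces to picking the highest-scoring candidate answer and measuring accuracy. The sketch below uses placeholder logits and labels; the actual scores come from the fine-tuned checkpoint.

```python
# Minimal sketch of closed-set VQA accuracy with placeholder model outputs.
import torch

answer_candidates = ["yes", "no"]                 # e.g. closed-type VQA-RAD answers
logits = torch.tensor([[2.3, -0.7],               # placeholder per-question answer scores
                       [-1.1, 0.4]])
labels = torch.tensor([0, 1])                     # ground-truth answer indices

predictions = logits.argmax(dim=-1)
accuracy = (predictions == labels).float().mean().item()
print([answer_candidates[i] for i in predictions.tolist()], f"accuracy={accuracy:.2f}")
```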
Human errors (e.g., patient mismatch, orientation confusion) can be detected without any fine-tuning step, since the model is already trained to correlate the image and report during pre-training. Run detection as below.
> --config ./configs/Detection.yaml --output_dir ./output/ --checkpoint /PATH/TO/PRETRAIN/ --evaluate
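Conceptually, the detection relies on how well the pre-trained model judges an image and a report to match: low similarity suggests a possible human error. The sketch below is a minimal illustration with placeholder embeddings; `flag_mismatch` and the threshold are hypothetical, not part of this repository.

```python
# Minimal sketch: flag an image-report pair whose similarity falls below a threshold.
import torch
import torch.nn.functional as F

def flag_mismatch(image_embed: torch.Tensor, text_embed: torch.Tensor,
                  threshold: float = 0.3) -> bool:
    """Return True if the image-report pair looks inconsistent (possible human error)."""
    score = F.cosine_similarity(image_embed, text_embed, dim=-1).item()
    return score < threshold

# Usage with random placeholders (replace with embeddings from the pre-trained model).
print(flag_mismatch(torch.randn(256), torch.randn(256)))
```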
Successful visualization will show the cross-attention between the words and the visual semantics (image patches), as below.
> --config ./configs/Pretrain.yaml --output_dir ./output/ --checkpoint /PATH/TO/PRETRAIN/ --evaluate
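A minimal sketch of overlaying one word's attention over the 14x14 patch grid of a ViT-S/16 input (224x224) is shown below; the attention weights here are random placeholders, while the visualization script extracts the real cross-attention maps from the model.

```python
# Minimal sketch: overlay a word-to-patch cross-attention map on the input image.
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F

image = np.random.rand(224, 224)                      # placeholder chest X-ray
attn = torch.rand(14 * 14)                            # placeholder attention of one word over 14x14 patches
attn_map = F.interpolate(attn.reshape(1, 1, 14, 14),  # upsample the patch grid to the image size
                         size=(224, 224), mode="bilinear", align_corners=False)

plt.imshow(image, cmap="gray")
plt.imshow(attn_map.squeeze().numpy(), cmap="jet", alpha=0.4)  # heatmap overlay
plt.axis("off")
plt.savefig("cross_attention.png", bbox_inches="tight")
```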