This issue summarizes initial experiments with self-supervised learning (SSL) pre-training and fine-tuning using Vision Transformers (ViT). The experiments are based on this MONAI tutorial; the code is available in the branch jv/vit_unetr_ssl.
The idea is to do self-supervised pre-training on unlabeled images and then do supervised fine-tuning for a specific task, e.g., DCM lesion segmentation.
For simplicity, all experiments so far have been single-channel, using only the T2w contrast.
Pre-training
The pre-training is done on spine-generic multi-subject T2w images using the ViTAutoEnc model via the script vit_unetr_ssl/train.py.
First, two augmented views are created from each original training image (see lines here). Then, a contrastive loss pulls the two augmented views closer together when they come from the same patch; otherwise, it maximizes their disagreement.
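As a rough illustration of this step, here is a minimal sketch patterned on the MONAI SSL tutorial, not the exact code in jv/vit_unetr_ssl; the dictionary keys, hole counts, and the loss weighting are assumptions:

```python
import torch
from monai.losses import ContrastiveLoss
from monai.transforms import (
    Compose, CopyItemsd, OneOf, RandCoarseDropoutd, RandCoarseShuffled,
)

# Duplicate each image into two views, then corrupt each view independently
# (transform choices and hole sizes here are illustrative assumptions).
make_views = Compose([
    CopyItemsd(keys=["image"], times=2, names=["image_1", "image_2"]),
    OneOf([
        RandCoarseDropoutd(keys=["image_1"], holes=6, spatial_size=8, prob=1.0),
        RandCoarseShuffled(keys=["image_1"], holes=6, spatial_size=8, prob=1.0),
    ]),
    OneOf([
        RandCoarseDropoutd(keys=["image_2"], holes=6, spatial_size=8, prob=1.0),
        RandCoarseShuffled(keys=["image_2"], holes=6, spatial_size=8, prob=1.0),
    ]),
])

recon_loss = torch.nn.L1Loss()
contrast_loss = ContrastiveLoss(temperature=0.05)

def ssl_loss(model, batch):
    # ViTAutoEnc returns (reconstruction, hidden states): reconstruct the
    # clean image from view 1 and contrast the two corrupted views.
    out_v1, _ = model(batch["image_1"])
    out_v2, _ = model(batch["image_2"])
    r = recon_loss(out_v1, batch["image"])
    c = contrast_loss(out_v1.flatten(start_dim=1), out_v2.flatten(start_dim=1))
    return r + c * r  # weighting used in the MONAI tutorial
```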
So far, I have used a spatial size of (64, 256, 256):
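For reference, instantiating the ViTAutoEnc at this spatial size might look as follows; the patch size and output channels are assumptions, not taken from the branch:

```python
from monai.networks.nets import ViTAutoEnc

# Single-channel T2w input at the (64, 256, 256) spatial size used above;
# the (16, 16, 16) patch size is an assumption and must divide each dimension.
model = ViTAutoEnc(
    in_channels=1,
    img_size=(64, 256, 256),
    patch_size=(16, 16, 16),
    out_channels=1,
)
```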
The pre-training (500 epochs, batch size of 2) on 236/29 train/val images (T2w resampled to 1 mm isotropic) took ~50 hours on a single GPU on romane. I had to set the number of workers to 0 due to RuntimeError: Pin memory thread exited unexpectedly; with more workers, the training would likely be faster.
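The workaround amounts to disabling worker processes in the loader; a sketch with placeholder names (train_files and train_transforms stand in for the branch's own file list and transform pipeline):

```python
from monai.data import DataLoader, Dataset

# train_files and train_transforms are hypothetical placeholders.
train_ds = Dataset(data=train_files, transform=train_transforms)

# num_workers=0 avoids the pin-memory thread crash observed above, at the
# cost of slower, single-process data loading.
train_loader = DataLoader(
    train_ds, batch_size=2, shuffle=True, num_workers=0, pin_memory=True
)
```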
[Figure: training & validation curves for SSL pre-training]
Fine-tuning
The fine-tuning is done on dcm-zurich-lesion patients as a supervised task (i.e., providing T2w images and lesion labels) using the script vit_unetr_ssl/finetune.py. The pre-trained weights are loaded into the UNETR model.
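The weight transfer can be sketched roughly as in the MONAI tutorial: copy the checkpoint entries that match UNETR's ViT backbone. The checkpoint path, its state_dict key, and the UNETR settings below are assumptions:

```python
import torch
from monai.networks.nets import UNETR

# Single-channel T2w input, background + lesion output classes; the image
# size must match the fine-tuning patch size (assumed here).
model = UNETR(in_channels=1, out_channels=2, img_size=(64, 256, 256))

# Keep only the checkpoint entries that exist in UNETR's ViT backbone.
ckpt = torch.load("pretrained_vit.pt")  # hypothetical checkpoint path
vit_state = model.vit.state_dict()
pretrained = {k: v for k, v in ckpt["state_dict"].items() if k in vit_state}
vit_state.update(pretrained)
model.vit.load_state_dict(vit_state)
```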