Region-based nnUNet training on PSIR and STIR images #68

Open
plbenveniste opened this issue Oct 24, 2023 · 8 comments

@plbenveniste
Collaborator

In this issue, I detail the process used to train several region-based nnUNet models:

  • a 3d_fullres region-based nnUNet trained on PSIR and STIR images
  • a 2d region-based nnUNet trained on PSIR and STIR images
  • a 3d_fullres region-based nnUNet trained on STIR images and PSIR images multiplied by -1 (only PSIR are multiplied by -1)
  • a 2d region-based nnUNet trained on STIR images and PSIR images multiplied by -1 (only PSIR are multiplied by -1)

Each model is trained on 5 folds.
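As a rough illustration of what "region-based" means on the nnUNet side, the sketch below writes out a `dataset.json` in the nnUNetv2 region-based style. The label names, the assumption that the lesion region is nested inside the spinal cord region, and the channel name are all hypothetical; the actual file produced by the conversion scripts may differ.

```python
import json

# Hypothetical region-based dataset description (nnUNetv2 style).
# Assumption: label 1 = spinal cord, label 2 = lesion, and the lesion lies
# inside the cord, so the "spinal_cord" region covers both labels.
dataset_json = {
    "channel_names": {"0": "PSIR_or_STIR"},  # hypothetical channel name
    "labels": {
        "background": 0,
        "spinal_cord": [1, 2],  # region made of labels 1 and 2
        "lesion": 2,
    },
    # Order in which regions are written back into an integer label map.
    "regions_class_order": [1, 2],
    "numTraining": 336,  # number of training images reported in this issue
    "file_ending": ".nii.gz",
}

print(json.dumps(dataset_json, indent=4))
```

With two regions declared, nnUNet reports two Dice values per epoch, which would match the two-element pseudo Dice lists in the results below.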

The CanProCo dataset was split into a training set and a testing set (80% and 20% respectively). For the first two models, the dataset was formatted to the nnUNet format using convert_BIDS_to_nnunet.py. For the last two models, the dataset was formatted to the nnUNet format using convert_BIDS_to_nnunet_with_mul_PSIR.py so that the PSIR images are multiplied by -1.

There are 336 images for training and 89 images for testing. The split was done so that around 80% of each site is in the training dataset and 20% of each site is in the testing dataset.
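A per-site 80/20 split like the one described above could be sketched as follows. This is not the code used for this issue: the function name, the subject and site labels, and the choice to split at the subject level are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_site(subject_sites, train_fraction=0.8, seed=0):
    """Split subjects into train/test lists, one site at a time.

    subject_sites maps a subject ID to its site label (hypothetical names).
    """
    by_site = defaultdict(list)
    for subject, site in subject_sites.items():
        by_site[site].append(subject)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for site in sorted(by_site):
        subjects = sorted(by_site[site])
        rng.shuffle(subjects)
        n_train = round(train_fraction * len(subjects))
        train.extend(subjects[:n_train])
        test.extend(subjects[n_train:])
    return train, test


# Toy example: 3 made-up sites with 10 subjects each.
toy = {f"sub-{site}{i:02d}": site for site in ("aaa", "bbb", "ccc") for i in range(10)}
train_subjects, test_subjects = split_by_site(toy)
print(len(train_subjects), len(test_subjects))
```

Splitting within each site keeps the site proportions roughly equal in both sets, which is the property described above.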

@plbenveniste plbenveniste self-assigned this Oct 24, 2023
@plbenveniste
Collaborator Author

3d_fullres region-based nnUNet on STIR and PSIR images

The results are the following:

  • fold 0:
    - Pseudo dice [0.9429, 0.566] : Best EMA pseudo Dice: 0.7457
    This split has 268 training and 68 validation cases.
    Mean Validation Dice: 0.6730917778799717
  • fold 1:
    - Pseudo dice [0.9414, 0.5409] : Best EMA pseudo Dice: 0.7332
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.651732335911188
  • fold 2:
    - Pseudo dice [0.9312, 0.56] : Best EMA pseudo Dice: 0.7328
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.0
  • fold 3:
    - Pseudo dice [0.9131, 0.5324]: Best EMA pseudo Dice: 0.7124
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.46880420926269767
  • fold 4:
    - Pseudo dice [0.9322, 0.5886]: Best EMA pseudo Dice: 0.7455
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6747336332287965
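To read the per-fold numbers at a glance, a small summary like the one below can help. The values are copied from the list above; the dictionary layout and variable names are only illustrative. A Mean Validation Dice of exactly 0.0 (fold 2) is flagged because it is probably worth re-checking rather than taking at face value.

```python
from statistics import mean

# Per-fold values copied from the results listed above.
folds = {
    0: {"best_ema": 0.7457, "val_dice": 0.6730917778799717},
    1: {"best_ema": 0.7332, "val_dice": 0.651732335911188},
    2: {"best_ema": 0.7328, "val_dice": 0.0},
    3: {"best_ema": 0.7124, "val_dice": 0.46880420926269767},
    4: {"best_ema": 0.7455, "val_dice": 0.6747336332287965},
}

# Average of the best EMA pseudo Dice across the 5 folds.
mean_ema = mean(m["best_ema"] for m in folds.values())

# Folds whose final validation Dice is exactly zero deserve a second look.
suspicious = [fold for fold, m in folds.items() if m["val_dice"] == 0.0]

print(f"mean best EMA pseudo Dice: {mean_ema:.4f}")
print(f"folds to re-check: {suspicious}")
```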

@plbenveniste
Collaborator Author

3d_fullres region-based nnUNet on STIR images and PSIR images multiplied by -1

The results are the following:

  • fold 0:
    - Pseudo dice [0.9428, 0.5676] : Best EMA pseudo Dice: 0.755
    This split has 268 training and 68 validation cases.
    Mean Validation Dice: 0.6779295464497137
  • fold 1:
    - Pseudo dice [0.9353, 0.538] : Best EMA pseudo Dice: 0.7123
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.4774722938365922
  • fold 2:
    - Pseudo dice [0.9328, 0.5936] : Best EMA pseudo Dice: 0.7526
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6470207480843974
  • fold 3:
    - Pseudo dice [0.9391, 0.5824]: Best EMA pseudo Dice: 0.74
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6659129637572001
  • fold 4:
    - Pseudo dice [0.9439, 0.5875]: Best EMA pseudo Dice: 0.7625
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6889081888133888

@plbenveniste
Collaborator Author

To compare the two models more thoroughly, we ran inference on the test set and computed metrics for each of them. To do so, we used:

  • CUDA_VISIBLE_DEVICES=2 nnUNetv2_predict -i {test_set} -o {output_folder} -d {model_ID} -c 3d_fullres --save_probabilities -chk checkpoint_best.pth -f {FOLD} : this allowed us to compute predictions
  • convert_predictions_to_BIDS.py : this converted the predictions back to the BIDS format
  • evaluate_lesion_seg_prediction.py : this evaluated each prediction using the Anima library
  • nnUNet_inference_analysis.ipynb : this notebook plots the results of the inference for each model and for each file
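Since the prediction command is run once per fold, the calls can be generated in a loop. The sketch below only builds and prints the commands, using the flags shown above; the paths and the dataset ID `111` are placeholders, and the helper name is hypothetical.

```python
import shlex


def build_predict_command(test_set, output_folder, model_id, fold,
                          configuration="3d_fullres"):
    """Return the nnUNetv2_predict call for one fold as an argument list."""
    return [
        "nnUNetv2_predict",
        "-i", test_set,
        "-o", output_folder,
        "-d", str(model_id),
        "-c", configuration,
        "--save_probabilities",
        "-chk", "checkpoint_best.pth",
        "-f", str(fold),
    ]


# Placeholder paths and dataset ID; replace with the real ones.
commands = [
    build_predict_command("/path/to/test_set", f"/path/to/output/fold_{fold}", 111, fold)
    for fold in range(5)
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True, env=...)`, with `CUDA_VISIBLE_DEVICES` set in the environment to select the GPU.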

This analysis led us to choose fold 2 of the model trained on STIR images and PSIR images multiplied by -1. The following table compares the performance:
Screen Shot 2023-10-23 at 8 47 31 PM

TO DO:

  • Evaluate the performance of both 2d nnUNet models (however, it already seems that their performance is not as good)
  • Perform inference on M12 data

@plbenveniste
Collaborator Author

Performance of 2d nnUNet model trained on PSIR and STIR images

Model: Dataset111_RegionBasedLesionSeg/nnUNetTrainer__nnUNetPlans__2d
The results are the following:

  • fold 0:
    - Pseudo dice [0.9311, 0.5816] : Best EMA pseudo Dice: 0.7513
    This split has 268 training and 68 validation cases.
    Mean Validation Dice: 0.6527715781119139
  • fold 1:
    - Pseudo dice [0.9226, 0.5727] : Best EMA pseudo Dice: 0.7396
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6398525521319649
  • fold 2:
    - Pseudo dice [0.9241, 0.5784] : Best EMA pseudo Dice: 0.7482
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6648134761121531
  • fold 3:
    - Pseudo dice [0.9296, 0.5914]: Best EMA pseudo Dice: 0.7515
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6349010124251724
  • fold 4:
    - Pseudo dice [0.9278, 0.6169]: Best EMA pseudo Dice: 0.7672
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6628310853142407

@plbenveniste
Collaborator Author

plbenveniste commented Oct 24, 2023

Performance of 2d nnUNet model trained on STIR images and PSIR images multiplied by -1

Model: Dataset222_RegionBasedLesionSeg/nnUNetTrainer__nnUNetPlans__2d
The results are the following:

  • fold 0:
    - Pseudo dice [0.9305, 0.6015] : Best EMA pseudo Dice: 0.7579
    This split has 268 training and 68 validation cases.
    Mean Validation Dice: 0.6658469693278798
  • fold 1:
    - Pseudo dice [0.9291, 0.5816] : Best EMA pseudo Dice: 0.7426
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6476568597544837
  • fold 2:
    - Pseudo dice [0.9273, 0.5853] : Best EMA pseudo Dice: 0.7509
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6632300056996099
  • fold 3:
    - Pseudo dice [0.9272, 0.6061] : Best EMA pseudo Dice: 0.7536
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6413213015908468
  • fold 4:
    - Pseudo dice [0.9238, 0.6295] : Best EMA pseudo Dice: 0.7684
    This split has 269 training and 67 validation cases.
    Mean Validation Dice: 0.6811857036644431

@plbenveniste
Collaborator Author

Model choice

In the file nnUNet_inference_analysis.ipynb, we performed an extensive analysis of the models' performance over the test set (89 images: 20% of each site). We used Anima to compute:

  • PPVL, the positive predictive value for lesions
  • SensL, the lesion detection sensitivity
  • F1 score, computed from PPVL and SensL
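For reference, an F1 score that combines PPVL and SensL can be computed as below. This assumes the usual harmonic-mean definition; Anima's exact formula should be checked in its own documentation, and the function name is hypothetical.

```python
def lesion_f1(ppvl, sensl):
    """Harmonic mean of lesion-wise precision (PPVL) and sensitivity (SensL).

    Assumption: this is the standard F1 definition, not necessarily
    the exact computation performed by Anima.
    """
    if ppvl + sensl == 0:
        return 0.0  # no lesion detected and none correct: define F1 as 0
    return 2 * ppvl * sensl / (ppvl + sensl)


# Made-up example values, not results from this issue.
example_f1 = lesion_f1(0.5, 1.0)
print(f"{example_f1:.4f}")
```

Because it is a harmonic mean, the score stays low when either PPVL or SensL is low, so a model cannot score well by over-segmenting or under-segmenting lesions.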

The two best-performing model folds came from:

  • the 3d nnUNet trained on STIR and multiplied by -1 PSIR
  • the 2d nnUNet trained on STIR and multiplied by -1 PSIR

The performance comparison can be seen in the following plots:


The following table displays the performance value:
Screen Shot 2023-10-25 at 10 43 13 AM

Also, after visually comparing 10 inferences, the 2d nnUNet seems to perform better than the 3d nnUNet.

@valosekj @jcohenadad Any feedback?

@jcohenadad
Member

Indeed, the 2D seems to give better performance on paper, but I find that losing the 3D information is problematic for segmentation tasks involving very small objects. I would still go with the 3D, I think.

@plbenveniste
Collaborator Author

Thanks for the feedback! Even though, from what I have seen in the inferences, the 3d aspect of the lesions cannot really be seen, I also think that it makes more sense to use a 3d model since it segments the spinal cord as well as the lesions.

Predictions for the M12 time point were made with both the 2d and 3d models and converted to BIDS format. They are available here:

  • duke.neuro.polymtl.ca/temp/plben/inference_results_555_2d_BIDS
  • duke.neuro.polymtl.ca/temp/plben/inference_results_555_3d_BIDS
