
BAMF FDG-Avid Breast Tumor #87

Open · wants to merge 24 commits into base: main

Conversation

jithenece

AIMI1 - Pretrained model for 3D semantic image segmentation of the FDG-avid lesions from PT/CT scans

@jithenece (Author)

sample:
  idc_version: Version 2: Updated 2020/01/10
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.8162.7003.200887946066796652452097013479
    aws_url: s3://idc-open-data/7a375cfa-8708-46f0-b056-248dbaca851e/*
    path: case1/ct
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.8162.7003.182230913012492714502249429083
    aws_url: s3://idc-open-data/739449f3-286c-4511-86a9-39172a50dc95/*
    path: case1/pt

reference:
  url: https://drive.google.com/file/d/1afkRFJqwgii1tUSMiQV_dPlDsUBl6_0g/view?usp=sharing

@jithenece (Author)

Archive.zip
Slicer screenshots added

@jithenece jithenece marked this pull request as ready for review July 2, 2024 13:44
@LennyN95 (Member) left a comment

Thank you for this implementation, this appears to be a more elaborate pipeline. Well done! I added some questions and discussion points in my review.

models/bamf_pet_ct_breast_tumor/utils/Registration.py (outdated; resolved)

execute:
- FileStructureImporter
- SitkNiftiConverter
Member

Why do we need this converter? Cannot we use the NiftiConverter module? There you can configure the backend to use dcm2niix which offers high performance.

Author

I have changed the engine to dcm2niix. This required upgrading to dcm2niix==1.0.20220715 to fix a conversion issue on a few CT scans.
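For reference, a minimal sketch of what the updated converter configuration might look like; the `engine` key and `dcm2niix` value follow the discussion above, everything else is illustrative:

```yaml
# Illustrative fragment only - option names should be checked against
# the NiftiConverter module documentation.
execute:
- FileStructureImporter
- NiftiConverter

NiftiConverter:
  in_datas: dicom:mod=pt|ct
  engine: dcm2niix   # switched backend per review, requires dcm2niix==1.0.20220715
```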

Comment on lines 44 to 128
def export_prob_mask(self, nnunet_out_dir: str, ref_file: InstanceData, output_dtype: str = 'float32', structure_list: Optional[List[str]] = None):
    """
    Convert softmax probability maps to NRRD.
    Arguments:
        nnunet_out_dir : required - path to the folder where the inferred segmentation masks are stored.
        ref_file       : required - InstanceData object of the generated segmentation mask used as reference file.
        output_dtype   : optional - output data type. Data type float16 is not supported by the NRRD standard,
                         so the choice should be between uint8, uint16 or float32 (the default).
        structure_list : optional - list of the structures whose probability maps are stored in the
                         first channel of the `.npz` file (output from the nnU-Net pipeline
                         when `export_prob_maps` is set to True).
    Outputs:
        This function [...]
    """

    # initialize structure list
    if structure_list is None:
        if self.roi is not None:
            structure_list = self.roi.split(',')
        else:
            structure_list = []

    # sanity check user inputs
    assert output_dtype in ["uint8", "uint16", "float32"]

    # input file containing the raw information
    pred_softmax_fn = 'VOLUME_001.npz'
    pred_softmax_path = os.path.join(nnunet_out_dir, pred_softmax_fn)

    # parse the reference image - we will use it to populate the header of the
    # NRRD mask we are going to create from the inferred segmentation mask
    sitk_ct = sitk.ReadImage(ref_file.abspath)

    # generate bundle for prob masks
    # TODO: we really have to create folders (or add this as an option that defaults to true) automatically
    prob_masks_bundle = ref_file.getDataBundle('prob_masks')
    if not os.path.isdir(prob_masks_bundle.abspath):
        os.mkdir(prob_masks_bundle.abspath)

    # load softmax probability maps
    pred_softmax_all = np.load(pred_softmax_path)["softmax"]

    # iterate all channels
    for channel in range(0, len(pred_softmax_all)):

        structure = structure_list[channel] if channel < len(structure_list) else f"structure_{channel}"
        pred_softmax_segmask = pred_softmax_all[channel].astype(dtype=np.float32)

        if output_dtype == "float32":
            # no rescale needed - the values will be between 0 and 1
            sitk_dtype = sitk.sitkFloat32
        elif output_dtype == "uint8":
            # rescale to [0, 255] and quantize
            pred_softmax_segmask = (255 * pred_softmax_segmask).astype(np.int32)
            sitk_dtype = sitk.sitkUInt8
        elif output_dtype == "uint16":
            # rescale to [0, 65535] and quantize (65536 would overflow uint16)
            pred_softmax_segmask = (65535 * pred_softmax_segmask).astype(np.int32)
            sitk_dtype = sitk.sitkUInt16
        else:
            raise ValueError("Invalid output data type. Please choose between uint8, uint16 or float32.")

        pred_softmax_segmask_sitk = sitk.GetImageFromArray(pred_softmax_segmask)
        pred_softmax_segmask_sitk.CopyInformation(sitk_ct)
        pred_softmax_segmask_sitk = sitk.Cast(pred_softmax_segmask_sitk, sitk_dtype)

        # generate data
        prob_mask = InstanceData(f'{structure}.nrrd', DataType(FileType.NRRD, {'mod': 'prob_mask', 'structure': structure}), bundle=prob_masks_bundle)

        # export file
        writer = sitk.ImageFileWriter()
        writer.UseCompressionOn()
        writer.SetFileName(prob_mask.abspath)
        writer.Execute(pred_softmax_segmask_sitk)

        # check if the file was written
        if os.path.isfile(prob_mask.abspath):
            self.v(f" > prob mask for {structure} saved to {prob_mask.abspath}")
            prob_mask.confirm()
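As an aside, the rescale-and-cast step above can be exercised in isolation; the following standalone sketch (not part of the PR) mirrors that logic with plain numpy:

```python
import numpy as np

# Hypothetical standalone version of the rescale/quantize step from
# export_prob_mask: softmax probabilities in [0, 1] are scaled to the
# integer range of the requested dtype before casting.
def quantize_probs(probs: np.ndarray, output_dtype: str = "uint8") -> np.ndarray:
    if output_dtype == "float32":
        return probs.astype(np.float32)           # values stay in [0, 1]
    elif output_dtype == "uint8":
        return (255 * probs).astype(np.uint8)     # [0, 1] -> [0, 255]
    elif output_dtype == "uint16":
        return (65535 * probs).astype(np.uint16)  # [0, 1] -> [0, 65535]
    raise ValueError("choose uint8, uint16 or float32")

probs = np.array([0.0, 0.5, 1.0])
print(quantize_probs(probs, "uint8"))  # -> [  0 127 255]
```

Note that casting truncates (0.5 maps to 127, not 128); rounding first would be a design choice to discuss.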
Member

Did you test the probability map export under the modified circumstances?

- SitkNiftiConverter
- Registration
- NNUnetPETCTRunner
- TotalSegmentatorMLRunner
Member

Why is TotalSegmentator run as part of this model?

Comment on lines 34 to 36
TotalSegmentatorMLRunner:
  in_data: nifti:mod=ct:registered=true
  use_fast_mode: false
Member

See above.

Member

I'd assume that fast mode would work just as well but should significantly speed up the process.

Comment on lines 24 to 26
SitkNiftiConverter:
  in_datas: dicom:mod=pt|ct
  allow_multi_input: true
Member

See above.

Comment on lines 10 to 12
# Install nnunet
# Install TotalSegmentator
RUN pip3 install TotalSegmentator==1.5.7 nnunet==1.6.6
Member

See above (why do we need TS for this model).

Comment on lines +15 to +124
'pancreas': 'PANCREAS',
'adrenal_gland_right': 'RIGHT_ADRENAL_GLAND',
'adrenal_gland_left': 'LEFT_ADRENAL_GLAND',
'lung_upper_lobe_left': 'LEFT_UPPER_LUNG_LOBE',
'lung_lower_lobe_left': 'LEFT_LOWER_LUNG_LOBE',
'lung_upper_lobe_right': 'RIGHT_UPPER_LUNG_LOBE',
'lung_middle_lobe_right': 'RIGHT_MIDDLE_LUNG_LOBE',
'lung_lower_lobe_right': 'RIGHT_LOWER_LUNG_LOBE',
'vertebrae_L5': 'VERTEBRAE_L5',
'vertebrae_L4': 'VERTEBRAE_L4',
'vertebrae_L3': 'VERTEBRAE_L3',
'vertebrae_L2': 'VERTEBRAE_L2',
'vertebrae_L1': 'VERTEBRAE_L1',
'vertebrae_T12': 'VERTEBRAE_T12',
'vertebrae_T11': 'VERTEBRAE_T11',
'vertebrae_T10': 'VERTEBRAE_T10',
'vertebrae_T9': 'VERTEBRAE_T9',
'vertebrae_T8': 'VERTEBRAE_T8',
'vertebrae_T7': 'VERTEBRAE_T7',
'vertebrae_T6': 'VERTEBRAE_T6',
'vertebrae_T5': 'VERTEBRAE_T5',
'vertebrae_T4': 'VERTEBRAE_T4',
'vertebrae_T3': 'VERTEBRAE_T3',
'vertebrae_T2': 'VERTEBRAE_T2',
'vertebrae_T1': 'VERTEBRAE_T1',
'vertebrae_C7': 'VERTEBRAE_C7',
'vertebrae_C6': 'VERTEBRAE_C6',
'vertebrae_C5': 'VERTEBRAE_C5',
'vertebrae_C4': 'VERTEBRAE_C4',
'vertebrae_C3': 'VERTEBRAE_C3',
'vertebrae_C2': 'VERTEBRAE_C2',
'vertebrae_C1': 'VERTEBRAE_C1',
'esophagus': 'ESOPHAGUS',
'trachea': 'TRACHEA',
'heart_myocardium': 'MYOCARDIUM',
'heart_atrium_left': 'LEFT_ATRIUM',
'heart_ventricle_left': 'LEFT_VENTRICLE',
'heart_atrium_right': 'RIGHT_ATRIUM',
'heart_ventricle_right': 'RIGHT_VENTRICLE',
'pulmonary_artery': 'PULMONARY_ARTERY',
'brain': 'BRAIN',
'iliac_artery_left': 'LEFT_ILIAC_ARTERY',
'iliac_artery_right': 'RIGHT_ILIAC_ARTERY',
'iliac_vena_left': 'LEFT_ILIAC_VEIN',
'iliac_vena_right': 'RIGHT_ILIAC_VEIN',
'small_bowel': 'SMALL_INTESTINE',
'duodenum': 'DUODENUM',
'colon': 'COLON',
'rib_left_1': 'LEFT_RIB_1',
'rib_left_2': 'LEFT_RIB_2',
'rib_left_3': 'LEFT_RIB_3',
'rib_left_4': 'LEFT_RIB_4',
'rib_left_5': 'LEFT_RIB_5',
'rib_left_6': 'LEFT_RIB_6',
'rib_left_7': 'LEFT_RIB_7',
'rib_left_8': 'LEFT_RIB_8',
'rib_left_9': 'LEFT_RIB_9',
'rib_left_10': 'LEFT_RIB_10',
'rib_left_11': 'LEFT_RIB_11',
'rib_left_12': 'LEFT_RIB_12',
'rib_right_1': 'RIGHT_RIB_1',
'rib_right_2': 'RIGHT_RIB_2',
'rib_right_3': 'RIGHT_RIB_3',
'rib_right_4': 'RIGHT_RIB_4',
'rib_right_5': 'RIGHT_RIB_5',
'rib_right_6': 'RIGHT_RIB_6',
'rib_right_7': 'RIGHT_RIB_7',
'rib_right_8': 'RIGHT_RIB_8',
'rib_right_9': 'RIGHT_RIB_9',
'rib_right_10': 'RIGHT_RIB_10',
'rib_right_11': 'RIGHT_RIB_11',
'rib_right_12': 'RIGHT_RIB_12',
'humerus_left': 'LEFT_HUMERUS',
'humerus_right': 'RIGHT_HUMERUS',
'scapula_left': 'LEFT_SCAPULA',
'scapula_right': 'RIGHT_SCAPULA',
'clavicula_left': 'LEFT_CLAVICLE',
'clavicula_right': 'RIGHT_CLAVICLE',
'femur_left': 'LEFT_FEMUR',
'femur_right': 'RIGHT_FEMUR',
'hip_left': 'LEFT_HIP',
'hip_right': 'RIGHT_HIP',
'sacrum': 'SACRUM',
'face': 'FACE',
'gluteus_maximus_left': 'LEFT_GLUTEUS_MAXIMUS',
'gluteus_maximus_right': 'RIGHT_GLUTEUS_MAXIMUS',
'gluteus_medius_left': 'LEFT_GLUTEUS_MEDIUS',
'gluteus_medius_right': 'RIGHT_GLUTEUS_MEDIUS',
'gluteus_minimus_left': 'LEFT_GLUTEUS_MINIMUS',
'gluteus_minimus_right': 'RIGHT_GLUTEUS_MINIMUS',
'autochthon_left': 'LEFT_AUTOCHTHONOUS_BACK_MUSCLE',
'autochthon_right': 'RIGHT_AUTOCHTHONOUS_BACK_MUSCLE',
'iliopsoas_left': 'LEFT_ILIOPSOAS',
'iliopsoas_right': 'RIGHT_ILIOPSOAS',
'urinary_bladder': 'URINARY_BLADDER'
}

# from totalsegmentator.map_to_binary import class_map
# ROI = ','.join(mapping[class_map['total'][ci]] for ci in range(1, 105))
ROI = 'SPLEEN,RIGHT_KIDNEY,LEFT_KIDNEY,GALLBLADDER,LIVER,STOMACH,AORTA,INFERIOR_VENA_CAVA,PORTAL_AND_SPLENIC_VEIN,PANCREAS,RIGHT_ADRENAL_GLAND,LEFT_ADRENAL_GLAND,LEFT_UPPER_LUNG_LOBE,LEFT_LOWER_LUNG_LOBE,RIGHT_UPPER_LUNG_LOBE,RIGHT_MIDDLE_LUNG_LOBE,RIGHT_LOWER_LUNG_LOBE,VERTEBRAE_L5,VERTEBRAE_L4,VERTEBRAE_L3,VERTEBRAE_L2,VERTEBRAE_L1,VERTEBRAE_T12,VERTEBRAE_T11,VERTEBRAE_T10,VERTEBRAE_T9,VERTEBRAE_T8,VERTEBRAE_T7,VERTEBRAE_T6,VERTEBRAE_T5,VERTEBRAE_T4,VERTEBRAE_T3,VERTEBRAE_T2,VERTEBRAE_T1,VERTEBRAE_C7,VERTEBRAE_C6,VERTEBRAE_C5,VERTEBRAE_C4,VERTEBRAE_C3,VERTEBRAE_C2,VERTEBRAE_C1,ESOPHAGUS,TRACHEA,MYOCARDIUM,LEFT_ATRIUM,LEFT_VENTRICLE,RIGHT_ATRIUM,RIGHT_VENTRICLE,PULMONARY_ARTERY,BRAIN,LEFT_ILIAC_ARTERY,RIGHT_ILIAC_ARTERY,LEFT_ILIAC_VEIN,RIGHT_ILIAC_VEIN,SMALL_INTESTINE,DUODENUM,COLON,LEFT_RIB_1,LEFT_RIB_2,LEFT_RIB_3,LEFT_RIB_4,LEFT_RIB_5,LEFT_RIB_6,LEFT_RIB_7,LEFT_RIB_8,LEFT_RIB_9,LEFT_RIB_10,LEFT_RIB_11,LEFT_RIB_12,RIGHT_RIB_1,RIGHT_RIB_2,RIGHT_RIB_3,RIGHT_RIB_4,RIGHT_RIB_5,RIGHT_RIB_6,RIGHT_RIB_7,RIGHT_RIB_8,RIGHT_RIB_9,RIGHT_RIB_10,RIGHT_RIB_11,RIGHT_RIB_12,LEFT_HUMERUS,RIGHT_HUMERUS,LEFT_SCAPULA,RIGHT_SCAPULA,LEFT_CLAVICLE,RIGHT_CLAVICLE,LEFT_FEMUR,RIGHT_FEMUR,LEFT_HIP,RIGHT_HIP,SACRUM,FACE,LEFT_GLUTEUS_MAXIMUS,RIGHT_GLUTEUS_MAXIMUS,LEFT_GLUTEUS_MEDIUS,RIGHT_GLUTEUS_MEDIUS,LEFT_GLUTEUS_MINIMUS,RIGHT_GLUTEUS_MINIMUS,LEFT_AUTOCHTHONOUS_BACK_MUSCLE,RIGHT_AUTOCHTHONOUS_BACK_MUSCLE,LEFT_ILIOPSOAS,RIGHT_ILIOPSOAS,URINARY_BLADDER'
Member

Can be removed, right?
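For context, the commented-out derivation in the excerpt would expand to something like the following standalone sketch; `class_map` here is a truncated stand-in for `totalsegmentator.map_to_binary`'s `class_map['total']` (only three entries shown):

```python
# Truncated stand-in for totalsegmentator's class_map['total'].
class_map = {1: 'spleen', 2: 'kidney_right', 3: 'kidney_left'}

# Truncated stand-in for the TotalSegmentator -> MHub label mapping above.
mapping = {
    'spleen': 'SPLEEN',
    'kidney_right': 'RIGHT_KIDNEY',
    'kidney_left': 'LEFT_KIDNEY',
}

# Join the mapped labels in class-index order, as the commented code does.
ROI = ','.join(mapping[class_map[ci]] for ci in range(1, len(class_map) + 1))
print(ROI)  # -> SPLEEN,RIGHT_KIDNEY,LEFT_KIDNEY
```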

Comment on lines +74 to +98
tumor_seg_path = in_tumor_data.abspath
total_seg_path = in_total_seg_data.abspath

ts_data = sitk.GetArrayFromImage(sitk.ReadImage(total_seg_path))
ts_abdominal = sitk.GetArrayFromImage(sitk.ReadImage(total_seg_path))
ts_data[ts_data > 1] = 1
lesions = sitk.GetArrayFromImage(sitk.ReadImage(tumor_seg_path))
tumor_label = 9
lesions[lesions != tumor_label] = 0
lesions[lesions == tumor_label] = 1

op_data = np.zeros(ts_data.shape)
ref = sitk.ReadImage(in_ct_data.abspath)
ct_data = sitk.GetArrayFromImage(ref)

op_data[lesions == 1] = 1
th = np.min(ct_data)
op_data[ct_data == th] = 0  # remove predictions where CT data is not available
# Use the coordinates of the bounding box to crop the 3D numpy array.
ts_abdominal[ts_abdominal > 4] = 0
ts_abdominal[ts_abdominal > 1] = 1
if ts_abdominal.max() > 0:
    x1, x2, y1, y2, z1, z2 = self.bbox2_3D(ts_abdominal)
    # Create a structuring element with ones in the middle and zeros around it
    structuring_element = np.ones((3, 3))
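The snippet calls self.bbox2_3D, which is not shown in this excerpt. A common numpy implementation of a 3D bounding box looks like this (a sketch, not necessarily the author's version):

```python
import numpy as np

def bbox2_3D(mask: np.ndarray):
    """Return the inclusive bounding-box extents (x1, x2, y1, y2, z1, z2)
    of the nonzero voxels in a 3D array."""
    # Collapse each pair of axes to find which slices contain any foreground.
    x = np.any(mask, axis=(1, 2))
    y = np.any(mask, axis=(0, 2))
    z = np.any(mask, axis=(0, 1))
    # First and last True index along each axis.
    x1, x2 = np.where(x)[0][[0, -1]]
    y1, y2 = np.where(y)[0][[0, -1]]
    z1, z2 = np.where(z)[0][[0, -1]]
    return tuple(int(v) for v in (x1, x2, y1, y2, z1, z2))

m = np.zeros((5, 5, 5), dtype=np.uint8)
m[1:3, 2:4, 0:2] = 1
print(bbox2_3D(m))  # -> (1, 2, 2, 3, 0, 1)
```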
Member

Isn't running TS a bit overkill for the task? What about taking abdominal segmentation masks as an input to the pipeline? Then a user could run any model (e.g. TotalSegmentator from MHub) and provide the dicomseg file alongside the input dicom files.

I'm asking this to evaluate the situation and to get the best possible user experience.

Contributor

The simplest user experience would be DICOM in -> DICOM out, and not to have a segmentation also be an input.

Author

@LennyN95 please let us know if you are fine with this.

Member

It'd be nice to provide an alternative workflow (config file) then, where you start with a DicomImporter expecting TotalSegmentator DICOMSEG files (e.g. as generated when running the MHub TotalSegmentator model) and use a DsegExtractor in the execute chain to extract the segmentations.
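A rough sketch of how such an alternative workflow config could be laid out; the module names DicomImporter and DsegExtractor come from this thread, while the remaining entries are assumptions to be checked against the MHub module documentation:

```yaml
# Hypothetical alternative workflow: expects TotalSegmentator DICOMSEG files
# alongside the PT/CT DICOM inputs, so TotalSegmentator itself is skipped.
execute:
- DicomImporter
- DsegExtractor        # recover TotalSegmentator segmentations from DICOMSEG
- NNUnetPETCTRunner
- DsegConverter
- DataOrganizer
```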

Contributor

@jithenece an alternative workflow sounds great, then we have both options

Author

@LennyN95 I have added a multi-flow workflow that takes TotalSegmentator files as input. Please check and let me know of any changes.

Author

I have changed the alternative workflow (config file) to composite. Please check

@jithenece (Author)

/test

sample:
  idc_version: "Data Release 2.0 January 10, 2020"
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.8162.7003.200887946066796652452097013479
    aws_url: s3://idc-open-data/7a375cfa-8708-46f0-b056-248dbaca851e/*
    path: 'case_study1/ct'
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.8162.7003.182230913012492714502249429083
    aws_url: s3://idc-open-data/739449f3-286c-4511-86a9-39172a50dc95/*
    path: 'case_study1/pt'

reference:
  url: https://drive.google.com/file/d/1yivXqTBMXslsmj3uD8gE9LojLCKh-BqV/view?usp=sharing


LennyN95 commented Aug 26, 2024

@jithenece I wanted to run our test routine on all ready-for-testing PRs; however, I noticed the sample reference URL you provided won't work with wget -O /tmp/ref.zip $RESSOURCE_URL. This might be specific to using a Google Drive link, but I generally advise you to attach your data to the PR directly (e.g., by uploading them to a comment) and to use the permanent link as a reference.

Please note, this needs to be updated on all affected PRs before testing can proceed.


jithenece commented Sep 9, 2024

/test

Not many collections are available for breast-pet-ct.

Attaching segmentation output:
output.zip

sample: 
  idc_version: 17.0
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.8162.7003.539267076861125410814830191835
    aws_url: s3://idc-open-data/84c9b972-76e1-4fa3-a7ea-19b6500e497a/*
    path: case_study1/ct
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.8162.7003.196821690630879561473146713439
    aws_url: s3://idc-open-data/33f0bd14-4bf5-469c-83db-183e1ab96f02/*
    path: case_study1/pt

reference:
  url: https://github.com/user-attachments/files/16927110/output.zip

@github-actions github-actions bot added INVALID TEST REQUEST The contributor requested a test but the test block is not valid. TEST REQUESTED and removed TEST REQUESTED INVALID TEST REQUEST The contributor requested a test but the test block is not valid. labels Sep 9, 2024
@LennyN95 (Member)

Please note, we updated our base image. All MHub dependencies are now installed in a virtual environment under /app/.venv running Python 3.11. Python, the virtual environment, and dependencies are now managed with uv. If required, you can create custom virtual environments, e.g., uv venv -p 3.8 .venv38, use uv pip install -p .venv38 package-name to install dependencies, and uv run -p .venv38 python script.py to run a Python script.

We also simplified our test routine. Sample and reference data now have to be uploaded to Zenodo and provided in a mhub.toml file at the project root. The process for creating and providing these sample data is explained in the updated testing-phase article of our documentation. Under doi.org/10.5281/zenodo.13785615 we provide sample data as a reference.

Status: In Progress · 3 participants