Match GTs and predictions based on participant_id, acq_id, and run_id #17

Merged: 11 commits into main on Dec 11, 2024

Conversation

@valosekj (Member) commented on Dec 6, 2024

So far, we have been matching GTs and predictions based on simple list sorting, which is error-prone. Additionally, this approach worked only when the number of GTs and predictions was the same (which is not ideal; for example, we sometimes got no prediction when a model failed to segment a lesion).

I thus improved the matching by using BIDS-compatible keys (participant_id, ses_id, acq_id, chunk_id, and run_id) and pandas dataframes. This approach is more robust, and it also works for different numbers of GTs and predictions (after merging the dataframes for GTs and predictions, I simply drop NaNs).
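To illustrate the idea, here is a minimal sketch of the matching logic; the function names fetch_entities and match_files and the exact entity list are assumptions for this sketch, not the actual code in compute_metrics_reloaded.py:

```python
import os
import re
import pandas as pd

# Entity set assumed for this sketch; the real script may use more/fewer keys.
ENTITIES = ['sub', 'ses', 'acq', 'run']

def fetch_entities(filename):
    """Return a dict of BIDS entities found in a filename; absent entities become ''."""
    base = os.path.basename(filename)
    out = {}
    for e in ENTITIES:
        m = re.search(rf'{e}-([a-zA-Z0-9]+)', base)
        out[f'{e}_id'] = m.group(1) if m else ''    # '' when the entity is not in the filename
    return out

def match_files(gt_files, pred_files):
    """Pair GT and prediction files on their shared BIDS entities."""
    df_gt = pd.DataFrame([{**fetch_entities(f), 'gt': f} for f in gt_files])
    df_pred = pd.DataFrame([{**fetch_entities(f), 'pred': f} for f in pred_files])
    keys = [f'{e}_id' for e in ENTITIES]
    # The outer merge keeps unmatched rows as NaN; dropna() then removes GTs
    # without a prediction (and vice versa), so the two lists need not have
    # the same length.
    return pd.merge(df_gt, df_pred, on=keys, how='outer').dropna()
```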

@valosekj (Member, Author) commented on Dec 9, 2024

Hmm, after resetting main to 76dbb55 (context: #18 (comment)), I need to remove 54ead73 and 2bad1f2 commits from this branch.

UPDATE: Resolved in person with @mguaypaq using git rebase -i main and dropping the two commits.

@valosekj force-pushed the jv/match_gt_and_pred_based_on_participant_id branch from 3b9d62e to ce16ec8 on December 9, 2024, 19:21
@valosekj (Member, Author) commented

Just for the record, I removed the failing CI and unit tests from the original MetricsReloaded repo in 21c8619.

@naga-karthik (Member) commented on Dec 10, 2024

> Additionally, this approach worked only when the number of GTs and predictions was the same (which is not ideal; for example, sometimes, we got no prediction when a model failed to segment a lesion).

Yeah, true! We almost always tried region-based segmentations, and the models always segment the SC, so we never had a failed case with this. But I agree, this should be more robust!

> I thus improved the matching by using participant_id (and session_id and run_id)

Quick question: does it also work when the filename does not have the acq suffix? In other words, does pd.merge also handle the else case (i.e., else "" when acq and run ids are not present)?

@naga-karthik (Member) commented

Might be unrelated, but since we're on the topic, I am documenting it here.

I think we should compute the metrics individually for each class. That is, if we have a region-based seg/lesion label, the current script automatically computes metrics for both of them sequentially in a for loop. But for the bavaria-quebec project, I have had issues where, even if the prediction has only 1 class, I see two classes in the output CSV file. When metrics are aggregated across all subjects, this results in incorrect scores. All I'm trying to say is that the for loop iterating across unique labels is not robust. In the end, I had to separate the SC and lesion labels and then compute the metrics for SC and lesions independently (to be sure).

How about we (rough sketch below):

  1. create temporary masks for each available class in the prediction mask,
  2. run the metrics on these temporary (single-class) masks, and
  3. delete the temporary masks?
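To make the proposal concrete, a minimal sketch of steps 1-3, assuming nibabel-readable NIfTI masks, background label 0, and reuse of the script's existing compute_metrics_single_subject(prediction_file, reference_file, metrics) entry point; the helper name metrics_per_class is made up:

```python
import tempfile

import nibabel as nib
import numpy as np

# Existing per-subject entry point in compute_metrics_reloaded.py.
from compute_metrics_reloaded import compute_metrics_single_subject

def metrics_per_class(pred_path, ref_path, metrics):
    """Score each class separately using temporary single-class masks."""
    pred_img, ref_img = nib.load(pred_path), nib.load(ref_path)
    pred, ref = pred_img.get_fdata(), ref_img.get_fdata()
    results = {}
    for label in np.unique(pred):
        if label == 0:                                # assume 0 is background
            continue
        # Temporary single-class masks; the directory (and the masks) are
        # deleted automatically when the 'with' block exits.
        with tempfile.TemporaryDirectory() as tmp:
            pred_tmp = f'{tmp}/pred_label-{int(label)}.nii.gz'
            ref_tmp = f'{tmp}/ref_label-{int(label)}.nii.gz'
            nib.save(nib.Nifti1Image((pred == label).astype(np.uint8), pred_img.affine), pred_tmp)
            nib.save(nib.Nifti1Image((ref == label).astype(np.uint8), ref_img.affine), ref_tmp)
            results[int(label)] = compute_metrics_single_subject(pred_tmp, ref_tmp, metrics)
    return results
```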

@valosekj (Member, Author) commented

> Quick question: does it also work when the filename does not have the acq suffix? In other words, does pd.merge also handle the else case (i.e., else "" when acq and run ids are not present)?

Good question! Yes, pd.merge works even if an entity (e.g., acq) is not included in the filenames. Please see the recently pushed unit tests for details: 0c61d12
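A toy illustration of that behaviour (filenames and column names below are made up): when acq and run are absent on both sides, both entity columns fall back to "", so the merge still pairs the rows.

```python
import pandas as pd

# Neither filename contains acq- or run-, so both entity columns are ''.
df_gt = pd.DataFrame([{'participant_id': 'sub-001', 'acq_id': '', 'run_id': '',
                       'gt': 'sub-001_T2w_lesion-manual.nii.gz'}])
df_pred = pd.DataFrame([{'participant_id': 'sub-001', 'acq_id': '', 'run_id': '',
                         'pred': 'sub-001_T2w_pred.nii.gz'}])

merged = pd.merge(df_gt, df_pred, on=['participant_id', 'acq_id', 'run_id'], how='outer').dropna()
print(merged[['participant_id', 'gt', 'pred']])   # one matched row
```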

@valosekj (Member, Author) commented

@naga-karthik, I opened a separate issue (#19) for your comment above, as it is indeed not directly related to this PR.

@valosekj (Member, Author) commented

Otherwise, I think that the PR is now ready to be merged. @naga-karthik, do you mind testing on some of your data (maybe bavaria-quebec)? Just to make sure everything works. Thanks!

@naga-karthik (Member) commented

will check tomorrow!

@naga-karthik (Member) commented

Hey @valosekj, I tried running it on the bavaria-quebec dataset and ran into this error because of a GT-prediction pairing mismatch:

Processing:
        Prediction: tumMSChunksNeuropolyAxialRegion_sub-m492109_ses-20140516_preds_stack.nii.gz
        Reference: tumMSChunksNeuropolyAxialRegion_sub-m492109_ses-20191211_refs_stack.nii.gz
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/GRAMES.POLYMTL.CA/u114716/.conda/envs/venv_nnunetv2/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/GRAMES.POLYMTL.CA/u114716/.conda/envs/venv_nnunetv2/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/GRAMES.POLYMTL.CA/u114716/tum-poly/MetricsReloaded/compute_metrics_reloaded.py", line 266, in process_subject
    return compute_metrics_single_subject(prediction_file, reference_file, metrics)
  File "/home/GRAMES.POLYMTL.CA/u114716/tum-poly/MetricsReloaded/compute_metrics_reloaded.py", line 183, in compute_metrics_single_subject
    raise ValueError(f'The prediction and reference (ground truth) images must have the same shape. '
ValueError: The prediction and reference (ground truth) images must have the same shape. The prediction image has shape (352, 352, 37, 3) and the ground truth image has shape (384, 384, 35, 3).
"""

@naga-karthik (Member) commented

I added session-based pairing in commit d45b330 but still got an error because of a mismatch in chunk numbers:

Processing:
        Prediction: tumMSChunksNeuropolyAxialRegion_sub-m293717_ses-20130227_acq-ax_chunk-3_T2w_314_0000_lesionseg.nii.gz
        Reference: tumMSChunksPolyNYUAxialRegion_sub-m293717_ses-20130227_acq-ax_chunk-1_T2w_315.nii.gz

@naga-karthik (Member) commented

I updated the fetch function to also get the chunk-id (which exists for the bavaria dataset) in commit 17f1c39. With this, the script runs as expected:

....
Processing:
        Prediction: tumMSChunksNeuropolyAxialRegion_sub-m617186_ses-20140314_acq-ax_chunk-1_T2w_355_0000_lesionseg.nii.gz
        Reference: tumMSChunksPolyNYUAxialRegion_sub-m617186_ses-20140314_acq-ax_chunk-1_T2w_355.nii.gz

Processing:
        Prediction: tumMSChunksNeuropolyAxialRegion_sub-m055157_ses-20191104_acq-ax_chunk-2_T2w_050_0000_lesionseg.nii.gz
        Reference: tumMSChunksPolyNYUAxialRegion_sub-m055157_ses-20191104_acq-ax_chunk-2_T2w_050.nii.gz
Saved metrics to /home/GRAMES.POLYMTL.CA/u114716/tum-poly/metrics_final_temp.csv.
Saved mean and standard deviation of metrics across all subjects to /home/GRAMES.POLYMTL.CA/u114716/tum-poly/metrics_final_temp_mean.csv.
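Regarding the chunk-id addition: in terms of the illustrative sketch earlier in this thread (not the actual fetch function changed in 17f1c39), this would just mean extending the assumed entity list.

```python
# Hypothetical extension of the earlier sketch: also parse chunk-<N>
# (e.g. 'chunk-1' in sub-m293717_ses-20130227_acq-ax_chunk-1_T2w.nii.gz).
ENTITIES = ['sub', 'ses', 'acq', 'chunk', 'run']
```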

If my changes look fine, then please merge @valosekj !

@valosekj (Member, Author) commented

Thanks for the review and for improving the code, Naga! I added some extra tests covering ses_id and chunk_id. I believe we can merge.

@valosekj merged commit b3f354e into main on Dec 11, 2024
1 check passed
@valosekj deleted the jv/match_gt_and_pred_based_on_participant_id branch on December 11, 2024, 10:41