fix metric calculation with multiple GPUs for semantic segmentation #175

Open
Shanci-Li wants to merge 3 commits into master

Conversation


@Shanci-Li commented Nov 26, 2024

What does this PR do?

This PR gathers the confusion matrix computed on the unique mini-batch assigned to each GPU when training with DDP and then calculates the metrics based on the updated confusion matrix.

Fixes #165
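
To illustrate the idea, here is a minimal sketch of the gathering step (not the exact code of this PR; `reduce_confusion_matrix` is a made-up name, and the `torch.distributed` all-reduce shown below is just one way to do it):

```python
import torch
import torch.distributed as dist


def reduce_confusion_matrix(cm: torch.Tensor) -> torch.Tensor:
    """Sum per-rank confusion matrices so every process holds the global one.

    `cm` is a [num_classes, num_classes] tensor accumulated on the
    mini-batches seen by the local rank only. It is assumed to live on the
    device expected by the backend (e.g. CUDA for NCCL). Outside of DDP the
    tensor is returned unchanged, so single-device behavior is unaffected.
    """
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(cm, op=dist.ReduceOp.SUM)
    return cm
```

Computing mIoU from the reduced matrix, rather than averaging per-rank mIoU values, is what keeps val_miou and best_val_miou consistent across devices.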

Before submitting

  • Did you make sure the title is self-explanatory and the description concisely explains the PR?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you test your PR locally with pytest command?
  • Did you run pre-commit hooks with pre-commit run -a command?

Did you have fun?

Make sure you had fun coding 🙃

@drprojects (Owner) left a comment

Thanks for this PR! I went through your code and I have concerns regarding the order in which you gather the metrics from each process and the calls to .reset(). I have not tested your modifications on a machine, but I suspect some of these changes break the single-device behavior. Have you checked that the behavior and metrics are still the same in the single-device scenario? Please see my comments in the code for more details.

Also, if you feel capable, would you mind having a shot at the distributed panoptic segmentation metrics too (i.e. src/models/semantic.py)? I did not look much into it, but I presume you should be able to accumulate the internal states of the PanopticQuality3D and MeanAveragePrecision3D objects from multiple devices by leveraging the .update() method of each of these classes (see their respective implementations to understand what is happening under the hood).
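
One possible pattern for that (a rough sketch only; the exact state layout and .update() signatures of PanopticQuality3D and MeanAveragePrecision3D need to be checked in their implementations, and `gather_metric_inputs` / `local_inputs` below are hypothetical names):

```python
import torch.distributed as dist


def gather_metric_inputs(local_inputs):
    """Collect every rank's metric inputs on all processes.

    `local_inputs` stands for whatever the metric's .update() expects for
    the mini-batches seen by this rank. all_gather_object handles arbitrary
    picklable objects, at the cost of a CPU round-trip.
    """
    if not (dist.is_available() and dist.is_initialized()):
        return [local_inputs]
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_inputs)
    return gathered


# Hypothetical epoch-end usage: feed every rank's inputs into one metric
# object so its internal state covers all devices before .compute().
# for inputs in gather_metric_inputs(local_inputs):
#     panoptic_metric.update(*inputs)
# pq = panoptic_metric.compute()
```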

(Inline review threads on src/models/semantic.py, resolved.)
@Shanci-Li (Author)

Yes, you are right! I did not pay attention to the effect of my code on single-device training. My project requires vectorization and modeling after the semantic segmentation, so I moved on to that task once I had an acceptable model. I have pushed a new commit that fixes these concerns.

As for the panoptic segmentation metrics, I would like to help, but I am not sure when I will have time to work on that. I will try to do it as soon as possible.

@drprojects (Owner)

Super, thanks for your prompt modifications! Rest assured, I really appreciate the help!

Have you tested the single-GPU scenario on your machine? Did you check whether anything breaks and whether the metrics are coherent before and after the modifications?

Regarding the panoptic segmentation metrics, may I leave this PR open for now to give you a bit of time to have a shot at it?

@Shanci-Li (Author)

I tested training in both the DDP and single-GPU scenarios last night, and it works well. The metrics are now logged coherently in both cases. With the default configuration file on the KITTI-360 dataset (200 epochs), single-GPU training achieves a best val mIoU of 54.43 while DDP stops at 52.11. I suppose this is caused by the difference in batch size and learning rate. But I am sure the inconsistency between val_miou and best_val_miou no longer exists.
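
For context on that guess: with DDP each of the N GPUs sees its own mini-batch, so the effective batch size is N times the single-GPU one, and a learning rate tuned for one GPU may no longer be optimal. A minimal sketch of the common linear-scaling heuristic (a hypothetical helper, not part of this repository; `base_lr`, `per_device_batch_size` and `reference_batch_size` are made-up names):

```python
import torch.distributed as dist


def scaled_lr(base_lr: float, per_device_batch_size: int, reference_batch_size: int) -> float:
    """Linear learning-rate scaling heuristic (hypothetical helper).

    Keeps lr / effective_batch_size roughly constant when moving from a
    single-GPU run (reference_batch_size) to DDP, where the effective
    batch size is per_device_batch_size * world_size.
    """
    world_size = dist.get_world_size() if dist.is_available() and dist.is_initialized() else 1
    effective_batch_size = per_device_batch_size * world_size
    return base_lr * effective_batch_size / reference_batch_size
```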

As for the panoptic segmentation metrics, I suggest closing this PR if you do not mind. Once I finish that modification I will open a new one, as so far I cannot guarantee I will finish it in a short time.

@drprojects (Owner)

Hi @Shanci-Li, OK, please give me a few days to also test your code on my machine before I can merge your PR. Thanks for your contribution 😉

Successfully merging this pull request may close these issues:

val/miou_best metric goes wrong when training with multiple GPU (#165)