Inquiry Regarding Reproducing mIoU Results on VOC2012 Dataset #4

master-Shix opened this issue Aug 26, 2024 · 7 comments

@master-Shix

First and foremost, I would like to express my sincere gratitude for your outstanding work. It has been incredibly insightful and inspiring. However, I am currently facing some challenges in reproducing the segmentation results on the VOC2012 dataset as described in your paper.

In particular, I noticed that the highest mIoU result of 87.11 was achieved using the BLIP40 caption. However, my experiments have only yielded an mIoU of 84.66. I would greatly appreciate it if you could provide some additional details regarding the hardware setup you used. Specifically, could you let me know which GPU devices were utilized and whether multi-GPU training was involved?

Furthermore, I would like to confirm if the command used to achieve the 87.11 mIoU was the one provided in the cvpr_experiments section of your repository:

python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 2 --val_batch_size 2 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --debug

Thank you very much for your time and assistance.
I look forward to your guidance.

@master-Shix
Author

Thank you for your quick reply, but could I ask what this is? Is it something like an exe file?

@nkondapa
Collaborator

nkondapa commented Aug 26, 2024

Hi, it looks like there's a mistake in the provided command. The --debug flag is still set, which overrides the batch size to 1. If you delete it, then the effective batch size goes up to 8. I will update the file in cvpr_experiments. Let me know if this works for you.

@master-Shix
Author

Thank you for your quick reply! Do you mean that I need to use

python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 2 --val_batch_size 2 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json

or

python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json

There doesn't seem to be any other place in the code that updates batch_size from 2 to 8.

@master-Shix
Author

I'm curious about the potential impact on results when using 4 A6000 GPUs with a batch size of 8. Would you mind sharing your thoughts on this setup?
For context, here's the command I used:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --num_gpus 4

I'd appreciate any insights you might have on how this configuration could influence the outcome. Thank you for your time and expertise.

@nkondapa
Collaborator

> Thank you for your quick reply! Do you mean that I need to use
>
> python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 2 --val_batch_size 2 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json
>
> or
>
> python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json
>
> There doesn't seem to be any other place in the code that updates batch_size from 2 to 8.

Just removing the --debug flag would result in an effective batch size of 8 (on a single GPU). With --batch_size 2 and --accum_grad_batches 4, the effective batch size is 2 * 4 = 8.
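For readers following along, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: `effective_batch_size` is a hypothetical helper and not part of the repository, and it assumes that `--batch_size` is the number of samples per step on one GPU and that gradient accumulation and extra GPUs each multiply it.

```python
def effective_batch_size(batch_size: int, accum_grad_batches: int, num_gpus: int = 1) -> int:
    """Samples that contribute to one optimizer step.

    Hypothetical helper (not from the repository). Assumes batch_size is
    per GPU and per forward pass, which may not match how train_tadp.py
    interprets the flag in a multi-GPU run.
    """
    return batch_size * accum_grad_batches * num_gpus


# Command without --debug: --batch_size 2 --accum_grad_batches 4 on one GPU.
print(effective_batch_size(batch_size=2, accum_grad_batches=4))  # 8

# With --debug the batch size is overridden to 1; if accumulation is left
# unchanged (an assumption), each optimizer step sees fewer samples.
print(effective_batch_size(batch_size=1, accum_grad_batches=4))  # 4
```

So the command itself does not need `--batch_size 8`; the value of 8 comes from the product of the two existing flags.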

@nkondapa
Collaborator

> I'm curious about the potential impact on results when using 4 A6000 GPUs with a batch size of 8. Would you mind sharing your thoughts on this setup? For context, here's the command I used:
>
> CUDA_VISIBLE_DEVICES=0,1,2,3 python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 4 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --num_gpus 4
>
> I'd appreciate any insights you might have on how this configuration could influence the outcome. Thank you for your time and expertise.

You should probably set accum_grad_batches to 1 so that gradient accumulation is not used. The setup looks fine. I would expect the larger batch size to improve the results, but of course I don't know for sure. You may want to scale the learning rate with the batch size. Also, I think we last tested this code on just one GPU, so it may have some problems in the multi-GPU setting. It should be possible to make it work though.
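One common way to "scale the lr with the batch size" is the linear scaling heuristic, sketched below. This is an assumption about what is meant here, not the authors' confirmed recipe. `scale_learning_rate` is a hypothetical helper, the base learning rate of 1e-4 is a placeholder rather than the repository's default, and the 4-GPU effective batch of 32 assumes `--batch_size 8` is applied per GPU.

```python
def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: lr grows in proportion to the effective batch size.

    Hypothetical helper (not from the repository).
    """
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


# Placeholder base learning rate; substitute the value the training script actually uses.
base_lr = 1e-4

# Reference run: effective batch 8 (2 per step * 4 accumulation steps, one GPU).
# Multi-GPU run: 8 per GPU * 4 GPUs = 32, if batch_size is per GPU (assumption).
new_lr = scale_learning_rate(base_lr, base_batch=8, new_batch=32)
print(new_lr)
```

Linear scaling is only a starting point; some practitioners prefer square-root scaling or add a warmup phase, so the scaled value is worth validating with a short run.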

@master-Shix
Author

Thank you very much for your prompt and helpful reply. I truly appreciate your guidance and support throughout this process.
I'm pleased to report some progress following your suggestions. After removing the --debug flag, I achieved an mIoU of 86.7. Further adjustments, including setting --batch_size 8 with --accum_grad_batches 1, resulted in a slight improvement to 86.8 mIoU.
While these results are encouraging, I'm still striving to reach the 87.11 mIoU reported in the paper. I've also experimented with multi-GPU training using the following configuration:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_tadp.py --wandb_group pascal_segmentation_clean --exp_name blip_min40 --model TADPSeg --max_epochs 15 --batch_size 8 --val_batch_size 8 --accum_grad_batches 1 --log_every_n_steps 100 --log_freq 100 --text_conditioning blip --use_scaled_encode --blip_caption_path captions/pascal_captions_min=40_max=77.json --num_gpus 4

With this setup and doubling the learning rate, the highest mIoU I've achieved is 86.3.
I'm curious about your thoughts on these results. Do you consider this performance within the expected range? I've noticed that the multi-GPU setup hasn't led to improved results as I had anticipated. Are there any potential issues or optimizations you might suggest?

Additionally, I wonder if you have any checkpoint files from your VOC2012 segmentation dataset training that you could share? This could be immensely helpful for benchmarking and troubleshooting.

Once again, thank you for your time and expertise. Your insights are invaluable, and I'm looking forward to your perspective on these findings.
