Hyperparameter settings - batch size settings are difficult to follow #7

Closed · zyfone opened this issue Apr 9, 2024 · 2 comments

zyfone commented Apr 9, 2024

Thank you for your valuable contributions to the DAOD community, particularly the fair benchmark construction and comparison. I have a suggestion regarding the batch size hyperparameter.

The batch size in the paper (48 samples) is notably large compared to existing methods [#1, #2, #3], which use a batch size of 1 or 2 for both the source and target domains. The larger setting places greater demands on GPU resources and may be difficult for others to follow. The small batch sizes used in previous work make it easier for more researchers to join the DAOD community.

References:
#1 SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation
#2 Strong-Weak Distribution Alignment for Adaptive Object Detection
#3 Detect, Augment, Compose, and Adapt: Four Steps for Unsupervised Domain Adaptation in Object Detection

justinkay (Owner) commented Apr 9, 2024

Hi @zyfone, thank you for your question and interest; we are glad you find it valuable!

Our codebase should be able to accommodate such GPU constraints without any changes:

For the batch size parameter, we interpret things slightly differently than a typical Detectron2 implementation, precisely to address concerns about differing batch sizes across codebases. We use two batch size config values: SOLVER.IMS_PER_BATCH (the usual Detectron2 parameter) and SOLVER.IMS_PER_GPU. In ALDI, SOLVER.IMS_PER_BATCH is the total batch size across all GPUs and all datasets, including both source and target. We use gradient accumulation to reach that total no matter how small SOLVER.IMS_PER_GPU is and no matter how many or how few GPUs you are using.
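
As a rough illustration (a simplified sketch, not the exact code in this repo), the two config values determine how many forward/backward passes get accumulated per optimizer step:

```python
# Simplified sketch (not ALDI's actual implementation) of how the two config
# values relate to the number of gradient-accumulation steps.
ims_per_batch = 48  # SOLVER.IMS_PER_BATCH: total batch across all GPUs and datasets (source + target)
ims_per_gpu = 2     # SOLVER.IMS_PER_GPU: whatever fits in memory on each GPU
num_gpus = 1

# forward/backward passes accumulated before each optimizer.step()
accumulation_steps = ims_per_batch // (ims_per_gpu * num_gpus)
print(accumulation_steps)  # -> 24
```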

So it is entirely possible to use a batch size of 2 on just one GPU and run our code out of the box: gradient accumulation kicks in, and 24 forward/backward passes are performed before optimizer.step() is called. In this way our results can be reproduced regardless of GPU hardware.
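
Here is what that gradient-accumulation pattern looks like in general, using a toy PyTorch model (the names below are hypothetical, not our actual trainer internals):

```python
# Generic gradient-accumulation pattern (toy PyTorch example, not ALDI's
# trainer code): 24 small forward/backward passes, then a single update.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 24  # 48 total images / (2 images per GPU * 1 GPU)

optimizer.zero_grad()
for _ in range(accumulation_steps):
    batch = torch.randn(2, 10)              # a small per-GPU batch (IMS_PER_GPU = 2)
    loss = model(batch).mean()
    (loss / accumulation_steps).backward()  # scale so gradients match one 48-image batch
optimizer.step()                            # one update for the full effective batch of 48
optimizer.zero_grad()
```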

For what it's worth, a batch size of 48 matches other related work such as Adaptive Teacher (AT): https://github.com/facebookresearch/adaptive_teacher, but this is not immediately clear from their config values. If you look at their training code, they actually use gradient accumulation to run 3 forward passes (one for source, one for target, one for feature alignment) before optimizer.step(), so while their config indicates a batch size of 16, the effective batch size is 16 * 3 = 48.

Thanks for your interest and please feel free to follow up if this is not clear!

zyfone (Author) commented Apr 9, 2024

Thanks for your quick reply. (๑¯ω¯๑)

zyfone closed this as completed Apr 9, 2024