Hyperparameter settings - batch size settings are difficult to follow #7

Closed · zyfone opened this issue Apr 9, 2024 · 2 comments

zyfone commented Apr 9, 2024

Thank you for your valuable contributions to the DAOD community, particularly the fair benchmark construction and comparison. I have a suggestion regarding the batch size hyperparameter.

The batch size in the paper (48 samples) is notably large compared to existing methods [#1, #2, #3], which use a batch size of 1 or 2 for both the source and target domains. The larger setting places greater demands on GPU resources and may be difficult for others to follow. The small batch sizes used in previous work make it easier for more researchers to join the DAOD community.

References:
#1 SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation
#2 Strong-Weak Distribution Alignment for Adaptive Object Detection
#3 Detect, Augment, Compose, and Adapt: Four Steps for Unsupervised Domain Adaptation in Object Detection

justinkay (Owner) commented Apr 9, 2024

Hi @zyfone, thank you for your question and interest; we are glad you find it valuable!

Our codebase should be able to accommodate such GPU constraints without any changes:

For the batch size parameter, we interpret things slightly differently than a typical Detectron2 implementation, precisely to address concerns about differing batch sizes across codebases. We use two batch size config values: SOLVER.IMS_PER_BATCH (the usual Detectron2 parameter) and SOLVER.IMS_PER_GPU. In ALDI, SOLVER.IMS_PER_BATCH is the total batch size across all GPUs and all datasets, including both source and target. We use gradient accumulation to reach that total no matter how small SOLVER.IMS_PER_GPU is and no matter how many or how few GPUs you are using.
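
As a rough illustration (a simplified sketch, not the exact code in this repo), the two config values determine how many forward/backward passes get accumulated per optimizer step:

```python
# Simplified sketch (not ALDI's actual implementation) of how the two config
# values relate to the number of gradient-accumulation steps.
ims_per_batch = 48  # SOLVER.IMS_PER_BATCH: total batch across all GPUs and datasets (source + target)
ims_per_gpu = 2     # SOLVER.IMS_PER_GPU: whatever fits in memory on each GPU
num_gpus = 1

# forward/backward passes accumulated before each optimizer.step()
accumulation_steps = ims_per_batch // (ims_per_gpu * num_gpus)
print(accumulation_steps)  # -> 24
```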

So it is entirely possible to use a batch size of 2 on just one GPU and run our code out of the box: gradient accumulation kicks in, and 24 forward/backward passes are performed before optimizer.step() is called. In this way our results can be reproduced regardless of GPU hardware.
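
Here is what that gradient-accumulation pattern looks like in general, using a toy PyTorch model (the names below are hypothetical, not our actual trainer internals):

```python
# Generic gradient-accumulation pattern (toy PyTorch example, not ALDI's
# trainer code): 24 small forward/backward passes, then a single update.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 24  # 48 total images / (2 images per GPU * 1 GPU)

optimizer.zero_grad()
for _ in range(accumulation_steps):
    batch = torch.randn(2, 10)              # a small per-GPU batch (IMS_PER_GPU = 2)
    loss = model(batch).mean()
    (loss / accumulation_steps).backward()  # scale so gradients match one 48-image batch
optimizer.step()                            # one update for the full effective batch of 48
optimizer.zero_grad()
```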

For what it's worth, a batch size of 48 matches other related work such as Adaptive Teacher (AT): https://github.com/facebookresearch/adaptive_teacher, but this is not immediately clear from their config values. If you look at their training code, they actually use gradient accumulation to run 3 forward passes (one for source, one for target, one for feature alignment) before optimizer.step(), so while their config indicates a batch size of 16, the effective batch size is 16 * 3 = 48.

Thanks for your interest and please feel free to follow up if this is not clear!

zyfone (Author) commented Apr 9, 2024

Thanks for your quick reply. (๑¯ω¯๑)

zyfone closed this as completed Apr 9, 2024