This repo contains a PyTorch implementation of MogaNet for semantic segmentation with Semantic FPN and UperNet on ADE20K, based on MMSegmentation. For more details, see Efficient Multi-order Gated Aggregation Network (arXiv 2022).
Please note that we simply follow the hyper-parameters of PVT, Swin, and VAN, which may not be the optimal ones for MogaNet. Feel free to tune the hyper-parameters to get better performance.
Install MMSegmentation from source code, or follow the steps below. This experiment requires MMSegmentation>=0.19.0; we reproduced the results with MMSegmentation v0.29.1 and PyTorch 1.10.
```bash
pip install openmim
mim install mmcv-full
pip install mmsegmentation
```
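To verify the environment, a quick version check (a minimal sketch; the expected versions are the ones reported above):

```python
# Sanity-check the installed versions (sketch; expected values are the
# ones this repo was reproduced with, reported above).
import torch, mmcv, mmseg

print(torch.__version__)  # expect 1.10.x
print(mmcv.__version__)   # mmcv-full build matching your torch/CUDA setup
print(mmseg.__version__)  # expect >= 0.19.0 (we reproduced with 0.29.1)
```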
Note: Since the MogaNet backbone code for detection, segmentation, and pose estimation is written in a single shared file, the same backbone also works with MMDetection and MMPose through @BACKBONES.register_module(). Please install MMDetection or MMPose accordingly for further usage.
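As a rough illustration of the registry mechanism (a minimal sketch against the MMSegmentation 0.x API; `arch='tiny'` is a hypothetical argument, not necessarily the backbone's real signature), a registered backbone can be built from a plain config dict:

```python
# Sketch: how a backbone registered via @BACKBONES.register_module() is
# built from a config dict in MMSegmentation 0.x. MMDetection and MMPose
# expose analogous BACKBONES registries, which is why one file serves all.
from mmseg.models import build_backbone

# 'type' must match the registered class name; 'arch' is a hypothetical
# constructor argument used here for illustration only.
cfg = dict(type='MogaNet', arch='tiny')
backbone = build_backbone(cfg)
```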
Prepare ADE20K according to the guidelines in MMSegmentation. Please use the 2016 version of the ADE20K dataset, which can be downloaded from ADEChallengeData2016 or Baidu Cloud (7ycz).
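After downloading and unzipping, the directory layout MMSegmentation expects (per its dataset preparation guide) looks roughly like:

```
data/ade/ADEChallengeData2016/
├── annotations/
│   ├── training/
│   └── validation/
└── images/
    ├── training/
    └── validation/
```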
Notes: All models are evaluated at a single scale (SS). You can modify the test_pipeline in the config files to evaluate multi-scale (MS) performance; see the sketch below. The trained models can also be downloaded via Baidu Cloud (z8mf) at MogaNet/ADE20K_Segmentation.
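For reference, a multi-scale + flip test_pipeline under the MMSegmentation 0.x API could look like the sketch below; the normalization values are the common ImageNet defaults from the configs, and the ratio list is the usual default rather than a tuned choice:

```python
# Sketch of a multi-scale (MS) + flip test pipeline for MMSegmentation 0.x.
# The base img_scale (2048, 512) matches the ADE20K configs in this repo.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],  # enables MS testing
        flip=True,  # adds horizontal-flip augmentation at test time
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]
```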
The params (M) and FLOPs (G) are measured by get_flops at an input resolution of 2048 × 512:
```bash
python get_flops.py /path/to/config --shape 2048 512
```
Method | Backbone | Pretrain | Params | FLOPs | Iters | mIoU | mAcc | Config | Download |
---|---|---|---|---|---|---|---|---|---|
Semantic FPN | MogaNet-XT | ImageNet-1K | 6.9M | 101.4G | 80K | 40.3 | 52.4 | config | log / model |
Semantic FPN | MogaNet-T | ImageNet-1K | 9.1M | 107.8G | 80K | 43.1 | 55.4 | config | log / model |
Semantic FPN | MogaNet-S | ImageNet-1K | 29.1M | 189.7G | 80K | 47.7 | 59.8 | config | log / model |
Semantic FPN | MogaNet-B | ImageNet-1K | 47.5M | 293.6G | 80K | 49.3 | 61.6 | config | log / model |
Semantic FPN | MogaNet-L | ImageNet-1K | 86.2M | 418.7G | 80K | 50.2 | 63.0 | config | log / model |
Method | Backbone | Pretrain | Params | FLOPs | Iters | mIoU | mAcc | Config | Download |
---|---|---|---|---|---|---|---|---|---|
UperNet | MogaNet-XT | ImageNet-1K | 30.4M | 855.7G | 160K | 42.2 | 55.1 | config | log / model |
UperNet | MogaNet-T | ImageNet-1K | 33.1M | 862.4G | 160K | 43.7 | 57.1 | config | log / model |
UperNet | MogaNet-S | ImageNet-1K | 55.3M | 946.4G | 160K | 49.2 | 61.6 | config | log / model |
UperNet | MogaNet-B | ImageNet-1K | 73.7M | 1050.4G | 160K | 50.1 | 63.4 | config | log / model |
UperNet | MogaNet-L | ImageNet-1K | 113.2M | 1176.1G | 160K | 50.9 | 63.5 | config | log / model |
We train the models on a single node with 8 GPUs by default (batch size 32 for Semantic FPN and 16 for UperNet). Start training with a config as follows:
```bash
PORT=29001 bash dist_train.sh /path/to/config 8
```
To evaluate the trained model on a single node with 8 GPUs, run:
```bash
bash dist_test.sh /path/to/config /path/to/checkpoint 8 --out results.pkl --eval mIoU
```
If you find this repository helpful, please consider citing:
```bibtex
@article{Li2022MogaNet,
  title={Efficient Multi-order Gated Aggregation Network},
  author={Siyuan Li and Zedong Wang and Zicheng Liu and Cheng Tan and Haitao Lin and Di Wu and Zhiyuan Chen and Jiangbin Zheng and Stan Z. Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.03295}
}
```
Our segmentation implementation is mainly based on the following codebases. We sincerely thank the authors for their wonderful works.