This repository is the official implementation of GOOD: Exploring geometric cues for detecting objects in an open world (ICLR 2023).
We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models overfit to the training classes and often fail to detect novel-looking objects. This is because RGB-based models rely primarily on appearance similarity to detect novel objects and are prone to overfitting to shortcut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators.
As the following figure shows, geometric cues generalize much better across categories and effectively narrow the generalization gap between base (known) and novel (unknown) categories. Our method achieves state-of-the-art results on several open-world detection benchmarks, including COCO Person to non-Person, VOC to non-VOC, LVIS COCO to non-COCO, and COCO to UVO.
As shown in the following figure, in Phase I we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. The top-ranked pseudo boxes are then added to the annotation pool for Phase II training, in which a class-agnostic object detector is trained directly on the RGB input using both the base-class and pseudo annotations. At inference time, only the Phase II model is needed.
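The step of adding top-ranked pseudo boxes to the annotation pool can be illustrated with a minimal sketch. It assumes COCO-style dictionaries with `image_id`, `bbox`, and `score` keys and a per-image top-k selection. The function name, the `is_pseudo` flag, and the category id are hypothetical and not taken from this repository, and any additional filtering the actual pipeline may apply is not shown.

```python
from collections import defaultdict


def add_top_k_pseudo_boxes(base_annotations, proposals, k=1):
    """Return base annotations plus the k highest-scoring proposals per image.

    Assumed layout (illustration only): both inputs are lists of dicts with
    COCO-style "image_id" and "bbox" ([x, y, w, h]) keys; each proposal also
    carries a confidence "score".
    """
    # Group proposals by the image they belong to.
    by_image = defaultdict(list)
    for proposal in proposals:
        by_image[proposal["image_id"]].append(proposal)

    # Copy the base list so the caller's annotations are left untouched.
    merged = list(base_annotations)
    for image_id, image_proposals in by_image.items():
        ranked = sorted(image_proposals, key=lambda p: p["score"], reverse=True)
        for proposal in ranked[:k]:
            merged.append({
                "image_id": image_id,
                "bbox": proposal["bbox"],
                "category_id": 1,   # assumed single class-agnostic "object" id
                "is_pseudo": True,  # hypothetical flag marking pseudo labels
            })
    return merged
```

Calling `add_top_k_pseudo_boxes(base, proposals, k=1)` keeps every base annotation and appends at most one pseudo box per image that has proposals.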
You can download pretrained weights here:
Training | Eval | URL | OLN AR_N@100 | GOOD AR_N@100 |
---|---|---|---|---|
Person, COCO | Non-Person, COCO | Pseudo-box/GOOD | 16.5 | 26.2 |
VOC, COCO | Non-VOC, COCO | Pseudo-box/GOOD | 33.2 | 39.3 |
COCO | Non-COCO, LVIS | Pseudo-box/GOOD | 27.4 | 29.0 |
For all GOOD models, we find that the optimal number k of pseudo labels is 1. Due to some modifications of the evaluation code, the numbers differ slightly from those in the paper.
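To make the AR_N@100 columns in the table above concrete, here is a simplified sketch of average recall for a single image. It assumes that AR_N@100 means recall over novel-class ground-truth boxes with a budget of 100 detections, averaged over IoU thresholds from 0.5 to 0.95 as in the COCO convention. It counts a ground-truth box as recalled if any kept detection overlaps it sufficiently, without one-to-one matching, so it is not the evaluation code used in this repository. The function names are hypothetical.

```python
def iou_xyxy(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(gt_boxes, scored_boxes, max_dets=100):
    """Average recall over IoU thresholds 0.50, 0.55, ..., 0.95.

    gt_boxes: list of [x1, y1, x2, y2] boxes (here: novel-class objects).
    scored_boxes: list of (score, [x1, y1, x2, y2]) detections.
    """
    if not gt_boxes:
        return 0.0
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    # Keep only the max_dets highest-scoring detections.
    ranked = sorted(scored_boxes, key=lambda sb: sb[0], reverse=True)
    top = [box for _, box in ranked[:max_dets]]
    # Best overlap achieved for each ground-truth box (simplified matching).
    best = [max((iou_xyxy(gt, box) for box in top), default=0.0) for gt in gt_boxes]
    recalls = [sum(b >= t for b in best) / len(gt_boxes) for t in thresholds]
    return sum(recalls) / len(recalls)
```

Benchmark numbers pool ground-truth boxes over the whole evaluation set rather than averaging per-image values, so this sketch is only meant to clarify what the metric measures.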
This repository is based on mmdetection and OLN.
Use the following commands to create a conda environment with the required dependencies.
conda create -n good python=3.8 -y
conda activate good
conda install pytorch=1.7.0 torchvision -c pytorch
conda install cuda -c nvidia
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install -v -e .
Please refer to the Omnidata repositories for the pretrained models. We provide example code for extracting depth and normals here. To use it, place it in the Omnidata repository.
To train the Phase-I model, run this command:
python tools/train_good.py configs/good/phase1_depth.py
After training, you can run this command to extract pseudo labels and generate a COCO-format annotation file:
python tools/test_extract_proposals.py configs/good/phase1_depth.py path-to-checkpoint/latest.pth --eval bbox --modality depth --out path-to-save-pseudo-box-json
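Before starting Phase II, it can help to sanity-check the generated file. The sketch below assumes the output is a standard COCO-format JSON object with top-level `images` and `annotations` lists, where each annotation has an `image_id`. The function name is hypothetical, and the exact fields written by the extraction script may differ.

```python
import json
from collections import Counter


def summarize_pseudo_boxes(json_path):
    """Count images and boxes in a COCO-format annotation file."""
    with open(json_path) as f:
        coco = json.load(f)
    # Number of annotations attached to each image id.
    per_image = Counter(ann["image_id"] for ann in coco["annotations"])
    return {
        "num_images": len(coco["images"]),
        "num_annotations": len(coco["annotations"]),
        "max_boxes_per_image": max(per_image.values(), default=0),
    }
```

For example, `summarize_pseudo_boxes("path-to-save-pseudo-box-json")` returns a small dictionary of counts that can be compared against the expected number of training images.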
To train the Phase-II model, run this command:
python tools/train_good.py configs/good/phase2_good.py
Note that the Phase-II config file differs from the Phase-I one. You need to specify the filenames of the pseudo boxes in the config file.
To evaluate the model, run:
python tools/test_good.py configs/good/phase2_good.py path-to-checkpoint/latest.pth --eval bbox
@inproceedings{
huang2023good,
title={{GOOD}: Exploring geometric cues for detecting objects in an open world},
author={Haiwen Huang and Andreas Geiger and Dan Zhang},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=W-nZDQyuy8D}
}
This code repository is open-sourced under the MIT license.
For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.