This is the official repository for the paper: GeoFormer: A Multi-Polygon Segmentation Transformer presented at the British Machine Vision Conference 2024 in Glasgow.
GeoFormer is designed to predict the set of vertices that encapsulate buildings in an image, eliminating the need for additional post-processing by directly modeling the polygon vertices.
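In other words, each building is represented directly as an ordered sequence of (x, y) vertices. The sketch below (illustrative only; it is not GeoFormer's actual output API) shows how such a vertex sequence maps to COCO's flattened polygon format and how its area follows from the shoelace formula:

```python
# Illustrative sketch: a building footprint as an ordered vertex sequence.
# Function names and the example polygon are hypothetical, not GeoFormer's API.

def shoelace_area(vertices):
    """Polygon area from an ordered list of (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def to_coco_segmentation(vertices):
    """Flatten (x, y) pairs into COCO's [x1, y1, x2, y2, ...] polygon format."""
    return [coord for vertex in vertices for coord in vertex]

# A hypothetical 10x10 square building footprint:
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(shoelace_area(square))         # 100.0
print(to_coco_segmentation(square))  # [0, 0, 10, 0, 10, 10, 0, 10]
```

Because the model emits the vertices themselves, no mask-to-polygon post-processing step is needed to obtain this representation.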
If you find this work useful, please consider citing our paper:
@article{khomiakov2024geoformer,
title={GeoFormer: A Multi-Polygon Segmentation Transformer},
author={Khomiakov, Maxim and Andersen, Michael Riis and Frellsen, Jes},
journal={arXiv preprint arXiv:2411.16616},
year={2024}
}
Developed with Python 3.8.6
pip install -r requirements.txt
- We rely on Weights & Biases for logging, so make sure to run
wandb login
before running the training or inference scripts.
Download the AiCrowd Mapping Challenge dataset and extract it to ./data/aicrowd_mapping_challenge/<train|val>
. Alternatively, modify the data paths in ./config/dataset/aicrowd.yaml
Then simply run:
python train.py
Adapt the relevant arguments in ./config/inference.yaml
if necessary, then run:
- To generate inference samples:
python inference.py meta.task='produce_inference_samples'
- To compute COCO evaluation metrics:
python inference.py meta.task='compute_metrics'
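For reference, COCO-style evaluators (such as pycocotools' COCOeval) consume detections as a JSON list of per-polygon records in the standard COCO results format. The sketch below builds one such record with the standard library only; the field values are hypothetical, and the exact records emitted by inference.py may differ:

```python
import json

# Sketch of the standard COCO results format consumed by COCO-style
# evaluation. All values are hypothetical; the building category_id is
# dataset-specific.
prediction = {
    "image_id": 1,                                    # id of the evaluated image
    "category_id": 100,                               # building class id (dataset-specific)
    "segmentation": [[0, 0, 10, 0, 10, 10, 0, 10]],   # flattened polygon(s)
    "score": 0.97,                                    # model confidence
}

# A results file is simply a JSON list of such records:
results = [prediction]
serialized = json.dumps(results)
print(serialized)
```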
The trained model checkpoint is available for download here.
We would like to thank the authors of the influential prior works this project builds upon, including HEAT: Holistic Edge Attention Transformer for Structured Reconstruction and PolygonRNN++, as well as the x-transformers and pytorch image models frameworks.