arXiv | IEEE Xplore | Website | Video
This repository is the official implementation of the paper:
A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation
Niclas Vödisch*, Kürsat Petek*, Markus Käppeler*, Abhinav Valada, and Wolfram Burgard.
*Equal contribution.
IEEE Robotics and Automation Letters, vol. 10, issue 1, pp. 216-223, January 2025
If you find our work useful, please consider citing our paper:
@article{voedisch2025pastel,
author={Vödisch, Niclas and Petek, Kürsat and Käppeler, Markus and Valada, Abhinav and Burgard, Wolfram},
journal={IEEE Robotics and Automation Letters},
title={A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation},
year={2025},
volume={10},
number={1},
pages={216-223},
}
Make sure to also check out our previous work on this topic: SPINO.
A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data while achieving accurate predictions. This is essential not only to decrease operating costs but also to speed up deployment time. In this work, we address this challenge for PAnoptic SegmenTation with fEw Labels (PASTEL) by exploiting the groundwork paved by visual foundation models. We leverage descriptive image features from such a model to train two lightweight network heads for semantic segmentation and object boundary detection, using very few annotated training samples. We then merge their predictions via a novel fusion module that yields panoptic maps based on normalized cut. To further enhance the performance, we utilize self-training on unlabeled images selected by a feature-driven similarity scheme. We underline the relevance of our approach by applying PASTEL to important robot perception use cases from autonomous driving and agricultural robotics. In extensive experiments, we demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations.
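The fusion module is based on normalized cut. As background, the following is a minimal NumPy sketch of the classic two-way normalized cut (the Shi and Malik spectral relaxation) on a generic affinity graph. It is not the PASTEL fusion module: how affinities are derived from the semantic and boundary predictions, and how segments are split further into instances, is not shown here, and the toy affinity matrix is purely illustrative.

```python
import numpy as np


def ncut_bipartition(affinity: np.ndarray) -> np.ndarray:
    """Two-way normalized cut (spectral relaxation) of a weighted graph.

    affinity: symmetric (N, N) matrix of non-negative edge weights in which
    every node has a positive degree. Returns a boolean mask of length N
    that marks one side of the cut.
    """
    degree = affinity.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(degree)
    laplacian = np.diag(degree) - affinity
    # Symmetric normalized Laplacian D^{-1/2} (D - W) D^{-1/2}.
    l_sym = inv_sqrt_d[:, None] * laplacian * inv_sqrt_d[None, :]
    # eigh returns eigenvalues in ascending order; index 0 is the trivial solution.
    _, eigvecs = np.linalg.eigh(l_sym)
    # Map back to the generalized problem (D - W) y = lambda * D y via y = D^{-1/2} u.
    fiedler = inv_sqrt_d * eigvecs[:, 1]
    # Split at zero: y is D-orthogonal to the constant vector, so it changes sign.
    return fiedler > 0


if __name__ == "__main__":
    # Toy affinity: two tightly connected groups of three nodes, weakly linked.
    w = np.full((6, 6), 0.01)
    w[:3, :3] = 1.0
    w[3:, 3:] = 1.0
    np.fill_diagonal(w, 0.0)
    print(ncut_bipartition(w))  # one group True, the other False
```

Applying the bipartition recursively to each side yields more than two segments; the sign of the eigenvector is arbitrary, so which group is marked True is not fixed.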
- Create conda environment:
conda create --name pastel python=3.8
- Activate environment:
conda activate pastel
- Install dependencies:
pip install -r requirements.txt
- Install torch, torchvision, and torchaudio (CUDA 11.1 builds):
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
- Install pre-commit githook scripts:
pre-commit install
- Upgrade isort to 5.12.0:
pip install isort==5.12.0
- Update the pre-commit hooks:
pre-commit autoupdate
Linter (pylint) and formatter (yapf, isort) settings can be set in pyproject.toml.
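Once the installation steps above are done, a short standard-library check can confirm that the pinned PyTorch packages ended up in the active environment. This sketch reads installed distribution metadata via importlib.metadata (available from Python 3.8) and does not import torch, so it does not verify that CUDA is usable. The helper names are hypothetical, and ignoring the "+cu111" local label when it is not part of the pin is a simplification of pip's version matching.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions pinned by the pip command in the installation steps.
EXPECTED = {
    "torch": "1.10.1+cu111",
    "torchvision": "0.11.2+cu111",
    "torchaudio": "0.10.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(wanted: str, found: Optional[str]) -> bool:
    """Compare versions, ignoring the local label (e.g. "+cu111") unless it is pinned."""
    if found is None:
        return False
    if "+" in wanted:
        return found == wanted
    return found.split("+", 1)[0] == wanted


def find_mismatches(
    expected: Dict[str, str], found: Dict[str, Optional[str]]
) -> Dict[str, Optional[str]]:
    """Return {package: found_version} for every package that does not match its pin."""
    return {
        pkg: found.get(pkg)
        for pkg, want in expected.items()
        if not matches(want, found.get(pkg))
    }


if __name__ == "__main__":
    found = {pkg: installed_version(pkg) for pkg in EXPECTED}
    mismatches = find_mismatches(EXPECTED, found)
    print("all pins satisfied" if not mismatches else f"mismatches: {mismatches}")
```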
Generating pseudo-labels with PASTEL involves three steps:
- Train the semantic segmentation module.
- Train the boundary estimation module.
- Generate pseudo-labels using the fusion module.
For Cityscapes, an exemplary execution would look like this:
conda activate pastel
python semantic_fine_tuning.py fit --trainer.devices [0] --config configs/cityscapes_semantics.yaml
python boundary_fine_tuning.py fit --trainer.devices [0] --config configs/cityscapes_boundary.yaml
python instance_clustering.py test --trainer.devices [0,1,2,3] --config configs/cityscapes_instance_ncut.yaml
We provide configuration files for each step and every dataset in the configs folder. Please make sure to double-check the paths to the datasets and the pretrained weights.
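The three steps can also be chained from a single script. The sketch below rebuilds the exact Cityscapes commands shown above; the helper names are hypothetical, and the assumption that other datasets follow the same configs/<dataset>_{semantics,boundary,instance_ncut}.yaml naming must be checked against the configs folder. By default it only prints the commands; with dry_run=False it first checks that the config files exist and then runs each step, stopping at the first one that fails.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(
    dataset: str, train_devices: str = "[0]", test_devices: str = "[0,1,2,3]"
) -> List[List[str]]:
    """Assemble the three PASTEL steps for one dataset.

    Assumes the config naming used for Cityscapes; verify it in the configs folder.
    """
    steps = [
        ("semantic_fine_tuning.py", "fit", train_devices, "semantics"),
        ("boundary_fine_tuning.py", "fit", train_devices, "boundary"),
        ("instance_clustering.py", "test", test_devices, "instance_ncut"),
    ]
    return [
        [sys.executable, script, mode, "--trainer.devices", devices,
         "--config", f"configs/{dataset}_{suffix}.yaml"]
        for script, mode, devices, suffix in steps
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute them in order."""
    if not dry_run:
        missing = [cmd[-1] for cmd in commands if not Path(cmd[-1]).is_file()]
        if missing:
            raise FileNotFoundError(f"config files not found: {missing}")
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands("cityscapes"), dry_run=True)
```

Using sys.executable runs each step with the interpreter of the active environment, which is the pastel conda environment after activation.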
We provide the following pre-trained weights:
- Cityscapes:
- PASCAL VOC:
- PhenoBench:
For Cityscapes, download the following files:
- leftImg8bit_sequence_trainvaltest.zip (324GB)
- gtFine_trainvaltest.zip (241MB)
- camera_trainvaltest.zip (2MB)
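The downloaded archives can be unpacked into data/cityscapes with Python's zipfile module. This sketch assumes each archive stores its top-level folder (e.g. gtFine/) at its root, so extracting all three into the same directory gives the layout below; the downloads/ folder and the helper names are hypothetical. Because the image archive is very large, it first compares the total uncompressed size with the free disk space.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Iterable


def uncompressed_size(archive: Path) -> int:
    """Total size in bytes of all members of a zip archive once extracted."""
    with zipfile.ZipFile(archive) as zf:
        return sum(info.file_size for info in zf.infolist())


def extract_archives(archives: Iterable[Path], target: Path) -> None:
    """Extract every archive into target after checking there is enough free space."""
    archives = list(archives)
    target.mkdir(parents=True, exist_ok=True)
    needed = sum(uncompressed_size(a) for a in archives)
    free = shutil.disk_usage(target).free
    if needed > free:
        raise OSError(f"need {needed / 1e9:.1f} GB, only {free / 1e9:.1f} GB free")
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)


if __name__ == "__main__":
    names = [
        "leftImg8bit_sequence_trainvaltest.zip",
        "gtFine_trainvaltest.zip",
        "camera_trainvaltest.zip",
    ]
    downloads = Path("downloads")  # hypothetical folder holding the downloaded zips
    archives = [downloads / name for name in names]
    if all(a.is_file() for a in archives):
        extract_archives(archives, Path("data/cityscapes"))
    else:
        print(f"place the archives in {downloads}/ first")
```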
After extraction, one should obtain the following file structure:
── data/cityscapes
├── camera
│ └── ...
├── gtFine
│ └── ...
└── leftImg8bit_sequence
└── ...
- For PASCAL VOC, we use the 2012 challenge plus the SBD extension.
- Upon execution, the files should be downloaded automatically via torchvision.
Afterward, one should obtain the following file structure:
── data/pascal_voc
├── SBD
│ └── ...
└── VOCdevkit/VOC2012
└── ...
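If you prefer to fetch PASCAL VOC and SBD before the first run, torchvision's VOCSegmentation and SBDataset classes can download them. This sketch assumes the data loaders expect exactly the layout above; the helper names are hypothetical, SBDataset additionally needs scipy, and both downloads depend on external mirrors that are occasionally unavailable. download_voc is meant to be called once, since SBDataset moves the extracted folders into place.

```python
from pathlib import Path


def voc_ready(base: Path) -> bool:
    """True if both the VOC 2012 and the SBD folders exist under base."""
    return (base / "VOCdevkit" / "VOC2012").is_dir() and (base / "SBD").is_dir()


def download_voc(base: Path = Path("data/pascal_voc")) -> None:
    """Fetch VOC 2012 and SBD through torchvision (needs network access and scipy)."""
    # Imported lazily so that voc_ready works without torchvision installed.
    from torchvision.datasets import SBDataset, VOCSegmentation

    # Downloads the VOC 2012 tarball and extracts it to base/VOCdevkit/VOC2012.
    VOCSegmentation(root=str(base), year="2012", image_set="trainval", download=True)
    # Downloads SBD and moves img/, cls/, inst/ and the split files into base/SBD.
    SBDataset(root=str(base / "SBD"), image_set="train", mode="segmentation", download=True)


if __name__ == "__main__":
    base = Path("data/pascal_voc")
    if voc_ready(base):
        print(f"{base} already contains VOC 2012 and SBD")
    else:
        print(f"call download_voc() to fetch the data into {base}")
```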
- For PhenoBench, we use the leaf instance segmentation challenge.
- Please download the dataset from the official website.
After extraction, one should obtain the following file structure:
── data/phenobench
├── test
│ └── images
├── train
│ ├── images
│ ├── leaf_instances
│ ├── leaf_visibility
│ ├── plant_instances
│ ├── plant_visibility
│ └── semantics
└── val
├── images
├── leaf_instances
├── leaf_visibility
├── plant_instances
├── plant_visibility
└── semantics
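Before training, a short script can confirm that all three datasets match the layouts above. This sketch only checks that the listed folders exist, not their contents, and assumes it is run from the repository root with the data/ paths used in this README; the names EXPECTED_LAYOUTS and missing_dirs are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

PHENOBENCH_SUBDIRS = [
    "images", "leaf_instances", "leaf_visibility",
    "plant_instances", "plant_visibility", "semantics",
]

# Expected sub-folders per dataset root, relative to the repository root.
EXPECTED_LAYOUTS: Dict[str, List[str]] = {
    "data/cityscapes": ["camera", "gtFine", "leftImg8bit_sequence"],
    "data/pascal_voc": ["SBD", "VOCdevkit/VOC2012"],
    "data/phenobench": ["test/images"]
    + [f"{split}/{sub}" for split in ("train", "val") for sub in PHENOBENCH_SUBDIRS],
}


def missing_dirs(root: Path, relative: List[str]) -> List[str]:
    """Return the entries of relative that are not directories below root."""
    return [rel for rel in relative if not (root / rel).is_dir()]


if __name__ == "__main__":
    for dataset_root, subdirs in EXPECTED_LAYOUTS.items():
        missing = missing_dirs(Path(dataset_root), subdirs)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{dataset_root}: {status}")
```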
For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.
This work was funded by the German Research Foundation (DFG) Emmy Noether Program grant No. 468878300.