A feature renderer for robust 3D feature point representation in camera relocalization.
Webpage · Report Bug · Request Feature
This is the codebase accompanying the paper FaVoR: Features via Voxel Rendering for Camera Relocalization by Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, and Jonathan Kelly. Visit the project webpage for an overview.
If you use this code, please cite the following publication:
@misc{polizzi2024arXiv,
  title = {FaVoR: Features via Voxel Rendering for Camera Relocalization},
  author = {Vincenzo Polizzi and Marco Cannici and Davide Scaramuzza and Jonathan Kelly},
  year = {2024},
  eprint = {2409.07571},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2409.07571},
}
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
- OS: Ubuntu 22.04
- GPU: RTX 4060 or higher
- Docker (Optional): For containerized environments
- NVIDIA Container Toolkit (if using Docker)
If you choose to run our code with Docker, make sure that Docker and the NVIDIA Container Toolkit are installed; you can then skip the Requirements and Setup section and go directly to the Dataset Download section.
git clone --recursive https://github.com/utiasSTARS/FaVoR.git
cd FaVoR
Note: If you forgot --recursive, initialize the submodules manually:
git submodule update --init --recursive
Create a Conda environment and install the dependencies. Note: if you want to use Docker, you can skip these steps and go directly to the Dataset Download section.
conda create -n favor python=3.10
conda activate favor
conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
Install PyTorch and dependencies:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch-scatter==2.1.1 -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install -r requirements.txt
Build the CUDA extensions:
cd lib/cuda
./build.sh
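To quickly check that the environment is set up correctly, you can verify that the CUDA compiler is on the path and that PyTorch sees the GPU (an optional sanity check, not part of the official setup):
nvcc --version
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"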
We used the 7-Scenes and Cambridge Landmarks datasets for our experiments. You need to download them before running the code. To download all datasets, run:
bash scripts/download_datasets.sh
To download a specific scene, pass its name as an argument (see the scene list and the example below):
bash scripts/download_datasets.sh SCENE_NAME
Note: the script will create a datasets folder and download the datasets, the NetVLAD matches, and the COLMAP ground truth for the 7-Scenes dataset.
Scenes Available:
- 7-Scenes: chess, fire, heads, office, pumpkin, redkitchen, stairs
- Cambridge Landmarks: college, court, hospital, shop, church
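For example, to download only the chess scene from 7-Scenes (assuming the script accepts the scene names listed above):
bash scripts/download_datasets.sh chess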
Ensure Docker and the NVIDIA Container Toolkit are installed, then run the visualizer:
xhost +local:docker
docker run --net=host --rm -v ./logs/:/favor/logs -v ./datasets/:/favor/datasets --privileged --gpus all -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -it viciopoli/favor:latest bash /favor/scripts/visualizer.sh SCENE_NAME
Replace SCENE_NAME with one of the scenes from the dataset list above.
Alternatively, run the visualizer directly from the Conda environment:
conda activate favor
bash scripts/visualize.sh SCENE_NAME
Replace SCENE_NAME with one of the scenes from the dataset list above.
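For instance, to visualize the chess scene from 7-Scenes once it has been downloaded:
bash scripts/visualize.sh chess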
Run the test scripts to reproduce the paper's results:
conda activate favor
bash scripts/test_7scenes.sh NETWORK_NAME
bash scripts/test_cambridge.sh NETWORK_NAME
Replace NETWORK_NAME with one of: alike-l, alike-n, alike-s, alike-t, superpoint.
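For example, to evaluate the ALIKE-L descriptor on 7-Scenes and SuperPoint on Cambridge Landmarks:
bash scripts/test_7scenes.sh alike-l
bash scripts/test_cambridge.sh superpoint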
To print the results:
python results.py --logs_dir ./logs/7Scenes --dataset 7scenes --net_model alike-l
Modify the --logs_dir, --dataset, and --net_model arguments as needed.
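For example, to summarize the SuperPoint runs on Cambridge Landmarks (assuming the logs are stored under ./logs/Cambridge and that the --dataset flag accepts cambridge; adjust the paths and values to match your setup):
python results.py --logs_dir ./logs/Cambridge --dataset cambridge --net_model superpoint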
Pretrained models are available on the Hugging Face model hub.
Note: the test scripts will automatically download the models if needed.
Individual models can also be downloaded using the Hugging Face CLI:
DATASET=7Scenes # or Cambridge
SCENE=chess # or ShopFacade etc.
NETWORK=alike-l # or alike-s, alike-n, alike-t, superpoint
huggingface-cli download viciopoli/FaVoR $DATASET/$SCENE/$NETWORK/model_ckpts/model_last.tar --local-dir-use-symlinks False --local-dir /path/to/your/directory
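For example, to fetch the ALIKE-L checkpoint trained on the chess scene of 7-Scenes into a local models/ folder (the destination directory here is only an example):
huggingface-cli download viciopoli/FaVoR 7Scenes/chess/alike-l/model_ckpts/model_last.tar --local-dir-use-symlinks False --local-dir ./models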
The models are organized as follows:
DATASET_NAME
├── SCENE_NAME
│ ├── NETWORK_NAME
│ │ ├── model_ckpts
│ │ └── results
...
Example (7-Scenes):
7scenes
├── chess_7scenes
│ ├── alike-n
│ │ ├── models
│ │ └── tracks.pkl
Distributed under the Apache 2.0 License. See LICENSE for more information.
Built on DVGO.
Template by Best-README-Template.