FaVoR: Features via Voxel Rendering for Camera Relocalization

A feature renderer for robust 3D feature point representation in camera relocalization.
Webpage · Report Bug · Request Feature


About

(demo animation)

This is the codebase accompanying the paper FaVoR: Features via Voxel Rendering for Camera Relocalization by Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, and Jonathan Kelly. Visit the project webpage for an overview.

If you use this code, please cite the following publication:

@misc{polizzi2024arXiv,
    title = {FaVoR: Features via Voxel Rendering for Camera Relocalization},
    author = {Vincenzo Polizzi and Marco Cannici and Davide Scaramuzza and Jonathan Kelly},
    year = {2024},
    eprint = {2409.07571},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV},
    url = {https://arxiv.org/abs/2409.07571},
}

Abstract

Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
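
As a concrete illustration of the matching step described above, the minimal Python sketch below matches descriptors rendered from the voxel map against descriptors extracted from the query image, then recovers the camera pose with PnP-RANSAC. The function name, array shapes, and use of OpenCV are our assumptions for illustration only and do not reflect this repository's actual API.

import numpy as np
import cv2

def estimate_pose(rendered_desc, landmark_xyz, query_kpts, query_desc, K):
    # rendered_desc: (N, D) descriptors rendered from the voxel map
    # landmark_xyz:  (N, 3) 3D landmark positions
    # query_kpts:    (M, 2) keypoint pixel coordinates in the query image
    # query_desc:    (M, D) descriptors extracted from the query image
    # K:             (3, 3) camera intrinsics

    # Mutual nearest-neighbour matching on L2 descriptor distance.
    d = np.linalg.norm(rendered_desc[:, None, :] - query_desc[None, :, :], axis=-1)
    nn12 = d.argmin(axis=1)  # best query match for each rendered landmark
    nn21 = d.argmin(axis=0)  # best landmark match for each query keypoint
    mutual = np.flatnonzero(nn21[nn12] == np.arange(len(nn12)))

    # PnP with RANSAC on the surviving 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        landmark_xyz[mutual].astype(np.float64),
        query_kpts[nn12[mutual]].astype(np.float64),
        K.astype(np.float64), None)
    return ok, rvec, tvec, inliers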

(back to top)

Getting Started

Prerequisites

If you choose to run our code using Docker, make sure you have Docker and the NVIDIA Container Toolkit installed; you can then skip the Installation section and go directly to the Datasets Download section.

Installation

1. Clone the Repository

git clone --recursive https://github.com/utiasSTARS/FaVoR.git
cd FaVoR

Note: If you forget --recursive, initialize submodules manually:

git submodule update --init --recursive

2. Set Up Environment

Create a Conda environment and install the dependencies. Note: if you want to use Docker, you can skip these steps and go directly to the Datasets Download section.

conda create -n favor python=3.10
conda activate favor
conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit

Install PyTorch and dependencies:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch-scatter==2.1.1 -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install -r requirements.txt

3. Build CUDA Modules

cd lib/cuda
./build.sh
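
After building, a quick sanity check (our suggestion, not part of the repository) can confirm that PyTorch sees the GPU and that torch-scatter imports cleanly:

import torch
import torch_scatter  # import check only

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())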

(back to top)

Datasets Download

We used the 7-Scenes and Cambridge Landmarks datasets for our experiments. You need to download them before running the code. To download all datasets, run:

bash scripts/download_datasets.sh

To download a specific scene:

bash scripts/download_datasets.sh SCENE_NAME

Note: the script will create a datasets folder and download the datasets, the NetVLAD matches, and the COLMAP ground truth for the 7-Scenes dataset.

Scenes Available:

  • 7-Scenes: chess, fire, heads, office, pumpkin, redkitchen, stairs
  • Cambridge Landmarks: college, court, hospital, shop, church

(back to top)

Running FaVoR

Docker (Recommended)

Ensure Docker and NVIDIA Container Toolkit are installed. Run the visualizer:

xhost +local:docker
docker run --net=host --rm -v ./logs/:/favor/logs -v ./datasets/:/favor/datasets --privileged --gpus all -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -it viciopoli/favor:latest bash /favor/scripts/visualizer.sh SCENE_NAME

Replace SCENE_NAME with one from the dataset list above.

Run Locally

conda activate favor
bash scripts/visualize.sh SCENE_NAME

Replace SCENE_NAME with one from the dataset list above.

(back to top)

Reproduce Results

Run test scripts to reproduce results:

conda activate favor
bash scripts/test_7scenes.sh NETWORK_NAME
bash scripts/test_cambridge.sh NETWORK_NAME

Replace NETWORK_NAME with one of: alike-l, alike-n, alike-s, alike-t, superpoint.

To print the results:

python results.py --logs_dir ./logs/7Scenes --dataset 7scenes --net_model alike-l

Modify the --logs_dir, --dataset, and --net_model arguments as needed.

Pretrained Models

Pretrained models are available on the Hugging Face model hub.

Note: the test scripts will automatically download the models if needed.

Individual models can be downloaded using the Hugging Face CLI:

DATASET=7Scenes # or Cambridge
SCENE=chess # or ShopFacade etc.
NETWORK=alike-l # or alike-s, alike-n, alike-t, superpoint
huggingface-cli download viciopoli/FaVoR $DATASET/$SCENE/$NETWORK/model_ckpts/model_last.tar --local-dir-use-symlinks False --local-dir /path/to/your/directory
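
Alternatively, the same checkpoint can be fetched from Python with the huggingface_hub library; the repository path below mirrors the CLI call above, and the local directory is a placeholder:

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="viciopoli/FaVoR",
    filename="7Scenes/chess/alike-l/model_ckpts/model_last.tar",
    local_dir="/path/to/your/directory",
)
print(path)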

Logs Structure

DATASET_NAME
    ├── SCENE_NAME
    │   ├── NETWORK_NAME
    │   │   ├── model_ckpts
    │   │   └── results
    ...

Example (7-Scenes):

7scenes
    ├── chess_7scenes
    │   ├── alike-n
    │   │   ├── models
    │   │   └── tracks.pkl

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

(back to top)

Acknowledgments

Built on DVGO.

Template by Best-README-Template.

(back to top)
