The contents of this repository are as follows
S. No. | Item | Description |
---|---|---|
1 | demo | Contains standalone demo scripts (Quick start, Jupyter Notebook, and Gradio app) to run our AnyLoc-VLAD-DINOv2 method. Also contains guides for APIs. This folder is self-contained (doesn't use anything outside it). |
2 | scripts | Contains all scripts for development. Use the -h option for argument information. |
3 | configs.py | Global configurations for the repository |
4 | utilities | Utility Classes & Functions (includes DINOv2 hooks & VLAD) |
5 | conda-environment.yml | The conda environment (it could fail to install OpenAI CLIP as it includes a git+ URL). We suggest you use the setup_conda.sh script. |
6 | requirements.txt | Requirements file for pip virtual environment. Probably out of date. |
7 | custom_datasets | Custom dataloader implementations for VPR. |
8 | examples | Miscellaneous example scripts |
9 | MixVPR | Minimal MixVPR inference code |
10 | clip_wrapper.py | A wrapper around two CLIP implementations (OpenAI and OpenCLIP). |
11 | models_mae.py | MAE implementation |
12 | dino_extractor.py | DINO (v1) feature extractor |
13 | CONTRIBUTING.md | Note for contributors |
14 | paper_utils | Paper scripts (formatting for figures, etc.) |
Includes the following repositories (currently not submodules) as subfolders.
Directory | Link | Cloned On | Description |
---|---|---|---|
dvgl-benchmark | gmberton/deep-visual-geo-localization-benchmark | 2023-02-12 | For benchmarking |
datasets-vg | gmberton/datasets_vg | 2023-02-13 | For dataset download and formatting |
CosPlace | gmberton/CosPlace | 2023-03-20 | Baseline Comparisons |
Tip: You can explore the HuggingFace Space and the Colab notebooks (no GPU needed).
Clone this repository
git clone https://github.com/AnyLoc/AnyLoc.git
cd AnyLoc
Set up the conda environment
conda create -n anyloc python=3.8
conda activate anyloc
bash ./setup_conda.sh
You can also use an existing conda environment, say vl-vpr, by doing
bash ./setup_conda.sh vl-vpr
Note the following:
- All our public release files can be found here.
- If the conda environment setup is taking time, you could just unzip conda-env.tar.gz (GB) in your ~/anaconda3/envs folder (but compatibility is not guaranteed).
- The ./scripts folder is for validating our results and seeing the main scripts. Most applications are in the ./demo folder. See the list of demos before running anything.
- If you're running something in the ./scripts folder, run it with pwd in this (repository) folder. For example, Python scripts are run as python ./scripts/<script>.py and bash scripts are run as bash ./scripts/<script>.sh. For the demos and other baselines, you should cd into the respective folders.
- The utilities.py file is mainly for ./scripts files. All demos actually use the demo/utilities.py file (which is distilled and minimal). Using the latter should be enough to implement our SOTA method.
Import the utilities
from utilities import DinoV2ExtractFeatures
from utilities import VLAD
The DINOv2 feature extractor can be used as follows
extractor = DinoV2ExtractFeatures("dinov2_vitg14", desc_layer,
desc_facet, device=device)
Get the descriptors using
# Make image patchable (14, 14 patches)
c, h, w = img_pt.shape
h_new, w_new = (h // 14) * 14, (w // 14) * 14
img_pt = tvf.CenterCrop((h_new, w_new))(img_pt)[None, ...]
# Main extraction
ret = extractor(img_pt) # [1, num_patches, desc_dim]
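The center crop above trims the image to the largest size whose sides are multiples of the 14-pixel patch size. As a rough, dependency-free sketch of that arithmetic (the helper name `patchable_size` is hypothetical and not part of the repository, and the patch count assumes one descriptor per non-overlapping 14 x 14 patch):

```python
def patchable_size(h, w, patch=14):
    """Return (h_new, w_new, num_patches) for the largest center crop
    whose sides are multiples of `patch`."""
    h_new, w_new = (h // patch) * patch, (w // patch) * patch
    # One descriptor per non-overlapping patch (assumption)
    num_patches = (h_new // patch) * (w_new // patch)
    return h_new, w_new, num_patches


# Example: a 480 x 640 image
print(patchable_size(480, 640))  # (476, 630, 1530)
```

This can help estimate the size of the `[1, num_patches, desc_dim]` tensor before running the extractor on large images.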
The VLAD aggregator can be loaded with a vocabulary (cluster centers) from a c_centers.pt file.
# Main VLAD object
vlad = VLAD(num_c, desc_dim=None, cache_dir=os.path.dirname(c_centers_file))
vlad.fit(None) # Load the vocabulary (and auto-detect `desc_dim`)
# Cluster centers have shape: [num_c, desc_dim]
# - num_c: number of clusters
# - desc_dim: descriptor dimension
If you have a database of descriptors you want to fit, use
vlad.fit(ein.rearrange(full_db_vlad, "n k d -> (n k) d"))
# n: number of images
# k: number of patches/descriptors per image
# d: descriptor dimension
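The pattern `"n k d -> (n k) d"` merges the image and patch axes, so that all descriptors from all images form one flat list for clustering. A minimal pure-Python sketch of the same reshaping, using nested lists instead of tensors (the function name `flatten_descriptors` is hypothetical):

```python
def flatten_descriptors(per_image):
    """[n][k][d] nested lists -> [(n * k)][d], in image-major order."""
    return [desc for image in per_image for desc in image]


# Two images, three descriptors each, two dimensions per descriptor
db = [
    [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],
]
flat = flatten_descriptors(db)
print(len(flat), len(flat[0]))  # 6 2
```

The descriptor at patch index `j` of image `i` ends up at row `i * k + j`, so image boundaries are lost, which is fine for fitting a vocabulary.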
To get the VLAD representations of multiple images, use
db_vlads: torch.Tensor = vlad.generate_multi(full_db)
# Shape of full_db: [n_db, n_d, d_dim]
# - n_db: number of images in the database
# - n_d: number of descriptors per image
# - d_dim: descriptor dimension
# Shape of db_vlads: [n_db, num_c * d_dim]
# - num_c: number of clusters (centers)
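To make the output shape `[n_db, num_c * d_dim]` concrete, here is a minimal sketch of textbook VLAD aggregation for a single image in pure Python. It is not the repository's `VLAD` class: the hard nearest-center assignment and the normalization order (per-cluster, then global L2) are assumptions for illustration, and the function names are hypothetical.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def vlad_sketch(descs, centers):
    """descs: [n_d][d_dim], centers: [num_c][d_dim] -> list of num_c * d_dim."""
    num_c, d_dim = len(centers), len(centers[0])
    residuals = [[0.0] * d_dim for _ in range(num_c)]
    for desc in descs:
        # Hard-assign the descriptor to its nearest center
        dists = [sum((a - b) ** 2 for a, b in zip(desc, c)) for c in centers]
        k = dists.index(min(dists))
        # Accumulate the residual (descriptor minus center)
        for j in range(d_dim):
            residuals[k][j] += desc[j] - centers[k][j]
    # Normalize each cluster's residual, flatten, then normalize globally
    flat = [v for r in residuals for v in l2_normalize(r)]
    return l2_normalize(flat)


centers = [[0.0, 0.0], [10.0, 10.0]]
descs = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
print(len(vlad_sketch(descs, centers)))  # 4 (= num_c * d_dim)
```

Stacking one such vector per database image gives a matrix with the same layout as `db_vlads`.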
The DINO (v1) feature extractor is present in dino_extractor.py (not a part of demo/utilities.py).
Initialize and use the extractor as follows
# Import it
from dino_extractor import ViTExtractor
...
# Initialize it (layer and facet are given when extracting descriptors)
extractor = ViTExtractor("dino_vits8", stride=4,
device=device)
...
# Use it to extract patch descriptors
img = ein.rearrange(img, "c h w -> 1 c h w").to(device)
img = F.interpolate(img, (224, 298)) # For 4:3 images
desc = extractor.extract_descriptors(img,
layer=11, facet="key") # [1, 1, num_descs, d_dim]
...
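With `stride=4` and the 8-pixel patches of `dino_vits8`, neighboring patches overlap, so `num_descs` is larger than for a non-overlapping grid. The following sketch estimates the count under the assumption that patches are taken as a sliding window without padding; this is an assumption about the extractor's behavior, and `num_strided_patches` is a hypothetical helper, not part of dino_extractor.py.

```python
def num_strided_patches(h, w, patch=8, stride=4):
    """Count sliding-window patches along each axis (no padding assumed)."""
    n_h = 1 + (h - patch) // stride
    n_w = 1 + (w - patch) // stride
    return n_h, n_w, n_h * n_w


# The 224 x 298 input used above for 4:3 images
print(num_strided_patches(224, 298))  # (55, 73, 4015)
```

If the shape of `desc` returned by the extractor differs from this estimate, trust the extractor's output.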
You don't need to read further if you're not experimentally validating the entire results (enjoy the demos instead). The following sections are for the curious minds who want to reproduce the results.
All the runs were done on a machine with the following specifications:
- CPU: Two Intel Xeon Gold 5317 CPUs (12C24T each)
- CPU RAM: 256 GB
- GPUs: Four NVIDIA RTX 3090 GPUs (24 GB, 10496 CUDA cores each)
- Storage: 3.5 TB HDD on /scratch. However, all datasets will take 32+ GB; keep more space free for other requirements (VLAD cluster centers, caching, models, etc.). We noticed that singularity (with SIF, cache, and tmp) used 90+ GB.
- Driver Version (NVIDIA-SMI): 570.47.03
- CUDA (SMI): 11.6
A single GPU is sufficient; however, some experiments (with large datasets) might need all of the CPU RAM (for efficient/fast nearest neighbor search). Ideally, a 16 GB GPU should also work.
Do the following
- Clone the repository and set up the NVIDIA NGC container (run everything inside it)
- Set up the datasets (download, format, and unzip them)
- Run the script you want to test from the scripts folder
Start by cloning/setting up the repository
cd ~/Documents
git clone https://github.com/AnyLoc/AnyLoc.git vl-vpr
Despite using recommended practices of reproducibility (see function seed_everything in utilities.py) in PyTorch, we noticed minor changes across GPU types and CUDA versions. To mitigate this, we recommend using a singularity container.
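For readers unfamiliar with this kind of seeding helper, the sketch below shows what one typically does. It is not the repository's `seed_everything` (see utilities.py for that); the name `seed_everything_sketch` and the default seed are assumptions, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything_sketch(seed=42):
    """Seed the common RNGs; NumPy and PyTorch are optional."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # Ignored if CUDA is unavailable
        # Prefer deterministic cuDNN kernels over auto-tuned ones
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything_sketch(0)
first = random.random()
seed_everything_sketch(0)
print(first == random.random())  # True
```

Even with such seeding, results can differ slightly across GPU types and CUDA versions, which is why a fixed container is recommended.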
Setting up the environment in a singularity container (in a SLURM environment)
TL;DR: Run the following (this system is a different one). This was tested on CMU's Bridges-2 partition of PSC HPC. Don't use this if you want to replicate the tables in the paper (but the numbers come close).
salloc -p GPU-small -t 01:00:00 --ntasks-per-node=5 --gres=gpu:v100-32:1
cd /ocean/containers/ngc/pytorch/
singularity instance start --nv pytorch_22.12-py3.sif vlvpr
singularity run --nv instance://vlvpr
cd /ocean/projects/cis220039p/nkeetha/data/singularity/venv
source vlvpr/bin/activate
cd /ocean/projects/cis220039p/<path to vl-vpr scripts folder>
Main setup: For Singularity on IIITH's Ada HPC (Ubuntu 18.04) - our main setup for validation. Use this if you want to replicate the tables in the paper (hardware should be same as listed before).
The script below assumes that this repository is cloned in ~/Documents/vl-vpr. That is, this README is at ~/Documents/vl-vpr/README.md.
# Load the module and configurations
module load u18/singularity-ce/3.9.6
mkdir -p /scratch/$USER/singularity && cd $_ && mkdir .cache .tmp venvs
export SINGULARITY_CACHEDIR=/scratch/$USER/singularity/.cache
export SINGULARITY_TMPDIR=/scratch/$USER/singularity/.tmp
# Ensure that the next command gives output "1" (or anything other than "0")
cat /proc/sys/kernel/unprivileged_userns_clone
# Setup the container (download the image if not there already) - (15 GB cache + 7.5 GB file)
singularity pull ngc_pytorch_22.12-py3 docker://nvcr.io/nvidia/pytorch:22.12-py3
# Test container through shell
singularity shell --nv ngc_pytorch_22.12-py3
# Start and run the container (mount the symlinked and scratch folders)
singularity instance start --mount "type=bind,source=/scratch/$USER,destination=/scratch/$USER" \
--nv ngc_pytorch_22.12-py3 vl-vpr
singularity run --nv instance://vl-vpr
# Create virtual environment
cd ~/Documents/vl-vpr/
pip install virtualenv
cd venvs
virtualenv --system-site-packages vl-vpr
# Activate virtualenv and install all packages
cd ~/Documents/vl-vpr/
source ./venvs/vl-vpr/bin/activate
bash ./setup_virtualenv_ngc.sh
# Run anything you want (from here, but find the file in scripts)
cd ~/Documents/vl-vpr/
python ./scripts/<task name>.py <args>
# The baseline scripts should be run in their own folders. For example, to run CosPlace, do
cd ~/Documents/vl-vpr/
cd CosPlace
python ./<script>.py
Datasets Note: Some datasets are under review (other works) and will be updated soon. See the Datasets-All folder in our public material (for .tar.gz files).
Set them up in a folder with sufficient space
mkdir -p /scratch/$USER/vl-vpr/datasets && cd $_
Download (and unzip) the datasets from here into this folder. Link this folder (for easy access from this repository)
cd ~/Documents/vl-vpr/
cd ./datasets-vg
ln -s /scratch/$USER/vl-vpr/datasets datasets
After setting up all datasets, the folders should look like this (in the dataset folder). Run the following command to get the tree structure.
tree ./eiffel ./hawkins*/ ./laurel_caverns ./VPAir ./test_40_midref_rot*/ ./Oxford_Robotcar ./gardens ./17places ./baidu_datasets ./st_lucia ./pitts30k --filelimit=20 -h
- The test_40_midref_rot0 is Nardo Air. This is also referred to as Tartan_GNSS_notrotated in our scripts.
- The test_40_midref_rot90 is Nardo Air-R (rotated). This is also referred to as Tartan_GNSS_rotated in our scripts.
- The hawkins_long_corridor is the Hawkins dataset (degraded environment).
- The eiffel dataset is Mid-Atlantic Ridge (underwater dataset).
Output will be something like
./eiffel
├── [4.0K] db_images [65 entries exceeds filelimit, not opening dir]
├── [2.2K] eiffel_gt.npy
└── [4.0K] q_images [101 entries exceeds filelimit, not opening dir]
./hawkins_long_corridor/
├── [4.0K] db_images [127 entries exceeds filelimit, not opening dir]
├── [ 12K] images [314 entries exceeds filelimit, not opening dir]
├── [ 17K] pose_topic_list.npy
└── [4.0K] q_images [118 entries exceeds filelimit, not opening dir]
./laurel_caverns
├── [4.0K] db_images [141 entries exceeds filelimit, not opening dir]
├── [ 20K] images [744 entries exceeds filelimit, not opening dir]
├── [ 41K] pose_topic_list.npy
└── [4.0K] q_images [112 entries exceeds filelimit, not opening dir]
./VPAir
├── [ 677] camera_calibration.yaml
├── [420K] distractors [10000 entries exceeds filelimit, not opening dir]
├── [4.0K] distractors_temp
├── [ 321] License.txt
├── [177K] poses.csv
├── [ 72K] queries [2706 entries exceeds filelimit, not opening dir]
├── [160K] reference_views [2706 entries exceeds filelimit, not opening dir]
├── [ 96K] reference_views_npy [2706 entries exceeds filelimit, not opening dir]
└── [ 82K] vpair_gt.npy
./test_40_midref_rot0/
├── [ 46K] gt_matches.csv
├── [2.8K] network_config_dump.yaml
├── [5.3K] query.csv
├── [4.0K] query_images [71 entries exceeds filelimit, not opening dir]
├── [2.9K] reference.csv
└── [4.0K] reference_images [102 entries exceeds filelimit, not opening dir]
./test_40_midref_rot90/
├── [ 46K] gt_matches.csv
├── [2.8K] network_config_dump.yaml
├── [5.3K] query.csv
├── [4.0K] query_images [71 entries exceeds filelimit, not opening dir]
├── [2.9K] reference.csv
└── [4.0K] reference_images [102 entries exceeds filelimit, not opening dir]
./Oxford_Robotcar
├── [4.0K] __MACOSX
│ └── [4.0K] oxDataPart
├── [4.0K] oxDataPart
│ ├── [4.0K] 1-m [191 entries exceeds filelimit, not opening dir]
│ ├── [ 24K] 1-m.npz
│ ├── [ 13K] 1-m.txt
│ ├── [4.0K] 1-s [191 entries exceeds filelimit, not opening dir]
│ ├── [ 24K] 1-s.npz
│ ├── [4.0K] 1-s-resized [191 entries exceeds filelimit, not opening dir]
│ ├── [ 13K] 1-s.txt
│ ├── [4.0K] 2-s [191 entries exceeds filelimit, not opening dir]
│ ├── [ 24K] 2-s.npz
│ ├── [4.0K] 2-s-resized [191 entries exceeds filelimit, not opening dir]
│ └── [ 13K] 2-s.txt
├── [ 15K] oxdatapart.mat
└── [ 66M] oxdatapart_seg.npz
./gardens
├── [4.0K] day_left [200 entries exceeds filelimit, not opening dir]
├── [4.0K] day_right [200 entries exceeds filelimit, not opening dir]
├── [3.6K] gardens_gt.npy
└── [4.0K] night_right [200 entries exceeds filelimit, not opening dir]
./17places
├── [ 14K] ground_truth_new.npy
├── [ 13K] my_ground_truth_new.npy
├── [ 12K] query [406 entries exceeds filelimit, not opening dir]
├── [ 514] ReadMe.txt
└── [ 12K] ref [406 entries exceeds filelimit, not opening dir]
./baidu_datasets
├── [4.0G] IDL_dataset_cvpr17_3852.zip
├── [387M] mall.pcd
├── [108K] query_gt [2292 entries exceeds filelimit, not opening dir]
├── [ 96K] query_images_undistort [2292 entries exceeds filelimit, not opening dir]
├── [2.7K] readme.txt
├── [ 44K] training_gt [689 entries exceeds filelimit, not opening dir]
└── [ 36K] training_images_undistort [689 entries exceeds filelimit, not opening dir]
./st_lucia
├── [4.0K] images
│   └── [4.0K] test
│       ├── [180K] database [1549 entries exceeds filelimit, not opening dir]
│       └── [184K] queries [1464 entries exceeds filelimit, not opening dir]
└── [695K] map_st_lucia.png
./pitts30k
└── [4.0K] images
    ├── [4.0K] test
    │   ├── [1.2M] database [10000 entries exceeds filelimit, not opening dir]
    │   ├── [5.9M] database.npy
    │   ├── [864K] queries [6816 entries exceeds filelimit, not opening dir]
    │   └── [4.0M] queries.npy
    ├── [4.0K] train
    │   ├── [1.3M] database [10000 entries exceeds filelimit, not opening dir]
    │   ├── [5.9M] database.npy
    │   ├── [948K] queries [7416 entries exceeds filelimit, not opening dir]
    │   └── [4.4M] queries.npy
    └── [4.0K] val
        ├── [1.3M] database [10000 entries exceeds filelimit, not opening dir]
        ├── [5.8M] database.npy
        ├── [980K] queries [7608 entries exceeds filelimit, not opening dir]
        └── [4.4M] queries.npy
These directories are put under the ./datasets-vg/datasets folder (you can store them in scratch and symlink it there). For example, the 17places dataset can be found under the ./datasets-vg/datasets/17places folder.
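Before launching any script, it can help to confirm that every dataset folder is in place. The following is a small sketch, not part of the repository: the folder names are taken from the `tree` output above, while the function name `missing_datasets` and the default path (relative to the repository folder) are assumptions.

```python
from pathlib import Path

# Folder names as they appear in the tree output above
EXPECTED_DATASETS = [
    "eiffel", "hawkins_long_corridor", "laurel_caverns", "VPAir",
    "test_40_midref_rot0", "test_40_midref_rot90", "Oxford_Robotcar",
    "gardens", "17places", "baidu_datasets", "st_lucia", "pitts30k",
]


def missing_datasets(root, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent under `root`."""
    root = Path(root).expanduser()
    # is_dir() follows symlinks, so a symlinked scratch folder is accepted
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed location: run from the repository folder
    missing = missing_datasets("./datasets-vg/datasets")
    print("Missing:", missing if missing else "none")
```

Datasets that are not yet publicly released will be reported as missing; that is expected.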
Original dataset webpages:
- Oxford RobotCar
- St. Lucia (also see other datasets in VPR-Bench)
- Pitts 30k (could also find it here)
- Gardens Point
- 17places (zipped download link)
- Baidu Mall (zipped on dropbox from authors)
- VPAir
- Mid-Atlantic Ridge
Note: We're in the process of releasing Nardo-Air (Tartan Air), Laurel Caverns, and Hawkins (part of SubT-MRS). Please stay tuned!
Some datasets can be found at other places
We thank the authors of the following repositories for their open source code and data:
- VPR Datasets
- gmberton/datasets_vg: Downloading and formatting
- Baselines
- CLIP
- openai/CLIP: Official CLIP implementation
- mlfoundations/open_clip: Open source implementation of CLIP with more checkpoints
- DINO
- SAM
- MAE
- DINOv2
Thanks for using our work. You can cite it as:
@article{AnyLoc,
author = {Nikhil Keetha and Avneesh Mishra and Jay Karhade and Krishna Murthy Jatavallabhula and Sebastian Scherer and Madhava Krishna and Sourav Garg},
title = {AnyLoc: Towards Universal Visual Place Recognition},
url = {https://arxiv.org/abs/2308.00688},
journal = {arXiv},
year = {2023},
}
Developers: