DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
Zachary Teed and Jia Deng
@article{teed2021droid,
title={{DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras}},
author={Teed, Zachary and Deng, Jia},
journal={arXiv preprint arXiv:2108.10869},
year={2021}
}
Initial Code Release: This repo currently provides a single GPU implementation of our monocular SLAM system. It also contains demos, training, and evaluation scripts. Stereo, RGB-D, and multi-GPU code will be added on September 7.
To run the code you will need ...
- Inference: Running the demos will require a GPU with at least 11G of memory.
- Training: Training requires a GPU with at least 24G of memory. We train on 4 x RTX-3090 GPUs.
- Clone the repo using the --recursive flag
git clone --recursive https://github.com/princeton-vl/DROID-SLAM.git
- Create a new anaconda environment using the provided .yaml file. Use environment_novis.yaml instead if you do not want to use the visualization.
conda env create -f environment.yml
pip install evo --upgrade --no-binary evo
pip install gdown
- Compile the extensions (takes about 10 minutes)
python setup.py install
- Download the model from google drive: droid.pth
- Download some sample videos using the provided script.
./tools/download_sample_data.sh
Run the demo on any of the samples (all demos can be run on a GPU with 11G of memory). While running, press the "s" key to increase the filtering threshold (= more points) and "a" to decrease the filtering threshold (= fewer points).
python demo.py --imagedir=data/abandonedfactory --calib=calib/tartan.txt --stride=2
python demo.py --imagedir=data/sfm_bench/rgb --calib=calib/eth.txt
python demo.py --imagedir=data/Barn --calib=calib/barn.txt --stride=1 --backend_nms=4
python demo.py --imagedir=data/mav0/cam0/data --calib=calib/euroc.txt --t0=150
python demo.py --imagedir=data/rgbd_dataset_freiburg3_cabinet/rgb --calib=calib/tum3.txt
Running on your own data: All you need is a calibration file. Calibration files are in the form
fx fy cx cy [k1 k2 p1 p2 [ k3 [ k4 k5 k6 ]]]
with parameters in brackets optional.
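As a rough illustration of the format above, the sketch below splits one calibration line into the four intrinsics and the optional distortion coefficients. The function name parse_calib_line is hypothetical and the numbers in the example are made up; this is not the loader used by demo.py, only a way to sanity-check a file before running it. It assumes the nested brackets mean a line holds 4, 8, 9, or 12 values.

```python
from typing import List, Tuple


def parse_calib_line(line: str) -> Tuple[List[float], List[float]]:
    """Split one calibration line into (intrinsics, distortion).

    Hypothetical helper for checking a calibration file by hand;
    it is not part of the DROID-SLAM code base.
    """
    values = [float(tok) for tok in line.split()]
    # Assumed valid lengths, read off the bracket nesting:
    # 4 (fx fy cx cy), 8 (+ k1 k2 p1 p2), 9 (+ k3), 12 (+ k4 k5 k6).
    if len(values) not in (4, 8, 9, 12):
        raise ValueError(f"expected 4, 8, 9, or 12 values, got {len(values)}")
    intrinsics = values[:4]  # fx, fy, cx, cy
    distortion = values[4:]  # empty list when no distortion terms are given
    return intrinsics, distortion


if __name__ == "__main__":
    # Made-up values for illustration only, not a real camera.
    intr, dist = parse_calib_line("500.0 500.0 320.0 240.0 0.1 -0.05 0.001 0.002")
    print("intrinsics:", intr)
    print("distortion:", dist)
```

A line that stops partway through an optional group (for example, five values) is rejected rather than silently padded.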
We provide evaluation scripts for TartanAir, EuRoC, and TUM. EuRoC and TUM can be run on a 1080Ti. The TartanAir validation script will require 24G of memory.
Download the EuRoC sequences (ASL format) and put them in datasets/EuRoC
./tools/evaluate_euroc.sh
Download the fr1 sequences from TUM-RGBD and put them in datasets/TUM-RGBD
./tools/evaluate_tum.sh
Download the TartanAir dataset using the script thirdparty/tartanair_tools/download_training.py and put it in datasets/TartanAir
./tools/validate_tartanair.sh
First download the TartanAir dataset. The download script can be found in thirdparty/tartanair_tools/download_training.py. You will only need the rgb and depth data.
python download_training.py --rgb --depth
You can then run the training script. We use 4 x RTX-3090 GPUs for training, which takes approximately 1 week. If you use a different number of GPUs, adjust the learning rate accordingly.
Note: On the first training run, covisibility is computed between all pairs of frames. This can take several hours, but the results are cached so that future training runs will start immediately.
python train.py --datapath=<path to tartanair> --gpus=4 --lr=0.00025
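The README does not say how the learning rate should be adjusted for a different GPU count. One common heuristic is to scale it linearly with the number of GPUs, taking the command above (4 GPUs, lr=0.00025) as the baseline. The sketch below assumes that linear rule; the helper scaled_lr is hypothetical, and the resulting value may still need tuning.

```python
# Baseline taken from the training command above.
BASE_GPUS = 4
BASE_LR = 0.00025


def scaled_lr(num_gpus: int) -> float:
    """Return a learning rate scaled linearly with the GPU count.

    Assumption: linear scaling is a common heuristic, not a rule
    stated by the DROID-SLAM authors.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return BASE_LR * num_gpus / BASE_GPUS


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"--gpus={n} --lr={scaled_lr(n):.6g}")
```

The printed values can be passed to the --gpus and --lr flags shown in the training command.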
Data from TartanAir was used to train our model. We additionally use evaluation tools from evo and tartanair_tools.