- 4 NVIDIA V100 GPUs were used for training and 1 for evaluation.
- Automatic mixed precision (AMP) was enabled in training but disabled in evaluation.
- Test-time augmentations were not used.
- The inference resolution was 480p, as in DeAOT.
- Inference was fully online: all modules were run frame by frame.
Stages:
- `PRE`: the pre-training stage with static images, the same as in DeAOT.
- `PRE_YTB_DAV`: the main-training stage with YouTube-VOS and DAVIS.

Model | Param | PRE | PRE_YTB_DAV (LVOS eval checkpoints) | PRE_YTB_DAV (LTV eval checkpoints) | PRE_YTB_DAV (DAVIS eval checkpoints) |
---|---|---|---|---|---|
MAVOS | 34M | gdrive | gdrive | gdrive | gdrive |
R50-MAVOS | 41M | gdrive | gdrive | gdrive | gdrive |
SwinB-MAVOS | 91M | gdrive | gdrive | gdrive | gdrive |

To run inference with our pre-trained models, set `--model` and `--ckpt_path` to your downloaded checkpoint's model type and file path when running `eval.py`.
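
As a rough illustration, the invocation described above can be assembled from Python as follows. This is only a sketch: the helper `build_eval_command` is hypothetical, the model type and checkpoint path are placeholders, and any other arguments that `eval.py` may require are not shown.

```python
import shlex
import sys


def build_eval_command(model, ckpt_path, script="eval.py"):
    """Return the argv list for running the evaluation script on one checkpoint.

    `build_eval_command` is a hypothetical helper; only the `--model` and
    `--ckpt_path` flags are taken from the instructions above.
    """
    return [sys.executable, script, "--model", model, "--ckpt_path", ckpt_path]


# Placeholder values: replace them with the model type and the file path
# of the checkpoint you downloaded from the table above.
cmd = build_eval_command("<model_type>", "/path/to/checkpoint.pth")

# Print the command in a copy-pasteable, shell-quoted form.
print(shlex.join(cmd))
```

To actually launch the evaluation, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root, or copy the printed command into a shell.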