In the paper, we pretrain with Slowfast R50 + CLIP-B/32 video features (row 3) and then fine-tune on each benchmark separately. We also release rows 1, 2, and 4 for practical use.
Video Enc. | Text Enc. | Pretraining | Fine-tuning | Checkpoints |
---|---|---|---|---|
CLIP-B/32 | CLIP-B/32 | 4M | - | Google Drive |
CLIP-B/32 | CLIP-B/32 | 4M | QVHL + Charades + NLQ + TACoS + ActivityNet + DiDeMo | Google Drive |
Slowfast R50 + CLIP-B/32 | CLIP-B/32 | 4M | - | Google Drive |
Slowfast R50 + CLIP-B/32 | CLIP-B/32 | 4M | QVHL + Charades + NLQ + TACoS + ActivityNet + DiDeMo | Google Drive |
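The "Slowfast R50 + CLIP-B/32" video encoder in rows 3 and 4 combines per-clip features from both backbones. The sketch below is a minimal illustration. It assumes that the two feature streams are already extracted at the same clip rate, that each stream is L2-normalized, and that the streams are then concatenated along the channel axis. The 2304-d SlowFast and 512-d CLIP sizes and the helper name `fuse_clip_features` are our assumptions and are not taken from this repository.

```python
import numpy as np

# Assumed per-clip feature sizes (hypothetical, check your extracted features).
SLOWFAST_DIM = 2304  # SlowFast R50
CLIP_DIM = 512       # CLIP ViT-B/32 image embedding


def fuse_clip_features(slowfast: np.ndarray, clip: np.ndarray) -> np.ndarray:
    """Concatenate per-clip SlowFast and CLIP features (hypothetical helper).

    Both inputs have shape (num_clips, dim). If the streams differ in length,
    truncate to the shorter one so that rows stay aligned in time.
    """
    n = min(len(slowfast), len(clip))
    sf, cl = slowfast[:n], clip[:n]
    # L2-normalize each stream separately so neither dominates by scale (assumption).
    sf = sf / np.maximum(np.linalg.norm(sf, axis=1, keepdims=True), 1e-12)
    cl = cl / np.maximum(np.linalg.norm(cl, axis=1, keepdims=True), 1e-12)
    return np.concatenate([sf, cl], axis=1)
```

For example, 75 clips of SlowFast features and 75 clips of CLIP features produce a `(75, 2816)` array under these assumed sizes.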
All downstream checkpoints below are trained on Slowfast R50 + CLIP-B/32 features. MR and HD denote moment retrieval and highlight detection.
Please follow the instructions here to submit test-set results to CodaLab.
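QVHighlights predictions are submitted as JSON Lines, with one object per query. The sketch below is minimal. It assumes the field names used by the Moment-DETR evaluation code:

- `qid`, `query`, and `vid` identify the query and video.
- `pred_relevant_windows` holds `[start_sec, end_sec, score]` triples.
- `pred_saliency_scores` holds one score per 2-second clip.

The helper name and the input dictionary keys (`windows`, `saliency`) are hypothetical. Please confirm the exact format against the linked instructions before uploading.

```python
import json


def to_submission_lines(preds):
    """Turn per-query predictions into JSON Lines strings (hypothetical helper).

    Each item in `preds` is a dict with keys qid, query, vid,
    windows ([[start_sec, end_sec, score], ...]) and saliency ([score per clip]).
    """
    lines = []
    for p in preds:
        record = {
            "qid": p["qid"],
            "query": p["query"],
            "vid": p["vid"],
            # Moment retrieval: windows sorted by confidence, best first.
            "pred_relevant_windows": sorted(p["windows"], key=lambda w: w[2], reverse=True),
            # Highlight detection: one saliency score per clip (2 s clips assumed).
            "pred_saliency_scores": p["saliency"],
        }
        lines.append(json.dumps(record))
    return lines
```

To write the file, join the returned strings with newlines and save them under the file name the submission instructions require.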
Datasets | (MR test) mAP avg | (HD test) HIT@1 | (MR val) mAP avg | (HD val) HIT@1 | Checkpoints + Configs + Prediction + Tensorboard Log |
---|---|---|---|---|---|
QVHL | 35.47 | 60.96 | 36.13 | 61.81 | Google Drive |
QVHL (w/ PT) | 43.63 | 66.28 | 45.44 | 68.77 | Google Drive |
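HIT@1 (HD) checks whether each query's single highest-scored clip is a true highlight. The sketch below is minimal and assumes one binary relevance label per clip. The official QVHighlights evaluation derives such labels from several annotators' saliency ratings, and this simplification skips that step. The function name is hypothetical.

```python
import numpy as np


def hit_at_1(pred_scores, gt_labels):
    """Percentage of queries whose top-scored clip is labelled relevant.

    pred_scores: one 1-D sequence of predicted saliency scores per query.
    gt_labels:   one 1-D sequence of binary ground-truth labels per query.
    """
    hits = [gt[int(np.argmax(pred))] for pred, gt in zip(pred_scores, gt_labels)]
    return 100.0 * float(np.mean(hits))
```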
Datasets | R1 @ 0.3 | mIoU | Checkpoints + Configs + Prediction + Tensorboard Log |
---|---|---|---|
NLQ (w/ PT) | 11.74 | 7.88 | Google Drive |
Charades (w/ PT) | 72.63 | 52.17 | Google Drive |
TACoS (w/ PT) | 56.11 | 38.63 | Google Drive |
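R1@0.3 is the percentage of queries whose top-1 predicted moment overlaps the ground truth with temporal IoU ≥ 0.3. mIoU is the average of that top-1 IoU over all queries. The sketch below is minimal. It assumes one ground-truth `[start, end]` interval per query, given in seconds. The function names are hypothetical and are not this repository's evaluation code.

```python
def temporal_iou(pred, gt):
    """IoU of two [start, end] intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # For disjoint intervals inter is 0, so using the hull as the union is safe.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall1_and_miou(top1_preds, gts, thresh=0.3):
    """Return (R1@thresh, mIoU) in percent for top-1 predictions."""
    ious = [temporal_iou(p, g) for p, g in zip(top1_preds, gts)]
    r1 = 100.0 * sum(iou >= thresh for iou in ious) / len(ious)
    miou = 100.0 * sum(ious) / len(ious)
    return r1, miou
```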
Datasets | Domain | mAP | Checkpoints + Configs + Prediction |
---|---|---|---|
YouTube (w/ PT) | dog | 74.25 | Google Drive |
YouTube (w/ PT) | gymnastics | 78.89 | Google Drive |
YouTube (w/ PT) | parkour | 74.39 | Google Drive |
YouTube (w/ PT) | skating | 84.87 | Google Drive |
YouTube (w/ PT) | skiing | 75.13 | Google Drive |
YouTube (w/ PT) | surfing | 83.85 | Google Drive |
Datasets | Domain | mAP | Checkpoints + Configs + Prediction + Tensorboard Log |
---|---|---|---|
TVSum (w/ PT) | BK | 91.78 | Google Drive |
TVSum (w/ PT) | BT | 90.47 | Google Drive |
TVSum (w/ PT) | DS | 77.57 | Google Drive |
TVSum (w/ PT) | FM | 74.33 | Google Drive |
TVSum (w/ PT) | GA | 89.78 | Google Drive |
TVSum (w/ PT) | MS | 83.83 | Google Drive |
TVSum (w/ PT) | PK | 82.22 | Google Drive |
TVSum (w/ PT) | PR | 85.81 | Google Drive |
TVSum (w/ PT) | VT | 92.04 | Google Drive |
TVSum (w/ PT) | VU | 77.81 | Google Drive |
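The YouTube Highlights and TVSum tables above report mAP over clip-level highlight scores within each domain. The sketch below is minimal. It assumes binary per-clip labels and computes non-interpolated average precision per video, then averages over videos. The official protocols may differ in details, such as how graded annotations are binarized. The function names are hypothetical.

```python
import numpy as np


def average_precision(scores, labels):
    """AP of a clip ranking: mean precision at the rank of each relevant clip."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels, dtype=int)[order]
    if ranked.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(ranked) / np.arange(1, len(ranked) + 1)
    return float(np.mean(precision_at_k[ranked == 1]))


def highlight_map(per_video_scores, per_video_labels):
    """Mean AP (in percent) over the videos of one domain."""
    aps = [average_precision(s, l) for s, l in zip(per_video_scores, per_video_labels)]
    return 100.0 * float(np.mean(aps))
```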
Datasets | F1 score | Checkpoints + Configs + Prediction + Tensorboard Log |
---|---|---|
V1 (w/ PT) | 49.85 | Google Drive |
V2 (w/ PT) | 56.97 | Same as V1 |
V3 (w/ PT) | 59.35 | Same as V1 |
V4 (w/ PT) | 40.62 | Same as V1 |
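For query-focused video summarization on videos V1 to V4, the F1 score compares the selected summary shots with a user-annotated reference summary. The sketch below is minimal and assumes exact shot-index matching. As we understand it, the official QFVS protocol instead matches shots by the similarity of their semantic concepts, so this only illustrates the precision/recall arithmetic. The function name is hypothetical.

```python
def summary_f1(pred_shots, ref_shots):
    """F1 (in percent) between predicted and reference summary shot indices."""
    pred, ref = set(pred_shots), set(ref_shots)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 100.0 * 2 * precision * recall / (precision + recall)
```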