@inproceedings{gu2018ava,
title={Ava: A video dataset of spatio-temporally localized atomic visual actions},
author={Gu, Chunhui and Sun, Chen and Ross, David A and Vondrick, Carl and Pantofaru, Caroline and Li, Yeqing and Vijayanarasimhan, Sudheendra and Toderici, George and Ricco, Susanna and Sukthankar, Rahul and others},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={6047--6056},
year={2018}
}
@article{duan2020omni,
title={Omni-sourced Webly-supervised Learning for Video Recognition},
author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua},
journal={arXiv preprint arXiv:2003.13042},
year={2020}
}
@inproceedings{feichtenhofer2019slowfast,
title={Slowfast networks for video recognition},
author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={6202--6211},
year={2019}
}
Model | Modality | Pretrained | Backbone | Input | gpus | Resolution | mAP | log | json | ckpt |
---|---|---|---|---|---|---|---|---|---|---|
slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb | RGB | Kinetics-400 | ResNet50 | 4x16 | 8 | short-side 256 | 20.1 | log | json | ckpt |
slowonly_omnisource_pretrained_r50_4x16x1_20e_ava_rgb | RGB | OmniSource | ResNet50 | 4x16 | 8 | short-side 256 | 21.8 | log | json | ckpt |
slowonly_nl_kinetics_pretrained_r50_4x16x1_10e_ava_rgb | RGB | Kinetics-400 | ResNet50 | 4x16 | 8 | short-side 256 | 21.75 | log | json | ckpt |
slowonly_nl_kinetics_pretrained_r50_8x8x1_10e_ava_rgb | RGB | Kinetics-400 | ResNet50 | 8x8 | 8x2 | short-side 256 | 23.79 | log | json | ckpt |
slowonly_kinetics_pretrained_r101_8x8x1_20e_ava_rgb | RGB | Kinetics-400 | ResNet101 | 8x8 | 8x2 | short-side 256 | 24.6 | log | json | ckpt |
slowonly_omnisource_pretrained_r101_8x8x1_20e_ava_rgb | RGB | OmniSource | ResNet101 | 8x8 | 8x2 | short-side 256 | 25.9 | log | json | ckpt |
slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 24.4 | log | json | ckpt |
slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 25.4 | log | json | ckpt |
slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb | RGB | Kinetics-400 | ResNet50 | 32x2 | 8x2 | short-side 256 | 25.5 | log | json | ckpt |
Model | Modality | Pretrained | Backbone | Input | gpus | mAP | log | json | ckpt |
---|---|---|---|---|---|---|---|---|---|
slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.1 | log | json | ckpt |
slowfast_temporal_max_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.4 | log | json | ckpt |
slowfast_temporal_max_focal_alpha3_gamma1_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb | RGB | Kinetics-400 | ResNet50 | 32x2 | 8 | 26.8 | log | json | ckpt |
- Notes:
- The gpus column indicates the number of GPUs used to obtain the checkpoint. According to the Linear Scaling Rule, set the learning rate proportional to the total batch size if you use a different number of GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU.
- Context indicates that both the RoI feature and the global pooled feature are used for classification, which generally improves mAP by around 1%.
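The Linear Scaling Rule from the note above can be sketched as a small helper. This is a minimal illustration rather than part of the codebase: the function name `scaled_lr` and the two constants are hypothetical, and the reference point (lr=0.01 at 4 GPUs x 2 videos/GPU) is taken from the note.

```python
# Reference point taken from the note: lr=0.01 for 4 GPUs x 2 videos/GPU.
# The names below are illustrative assumptions, not library identifiers.
REFERENCE_LR = 0.01
REFERENCE_BATCH_SIZE = 4 * 2


def scaled_lr(num_gpus: int, videos_per_gpu: int) -> float:
    """Scale the learning rate linearly with the total batch size."""
    if num_gpus <= 0 or videos_per_gpu <= 0:
        raise ValueError("num_gpus and videos_per_gpu must be positive")
    total_batch_size = num_gpus * videos_per_gpu
    return REFERENCE_LR * total_batch_size / REFERENCE_BATCH_SIZE


if __name__ == "__main__":
    # The two settings quoted in the note.
    print(scaled_lr(4, 2))
    print(scaled_lr(16, 4))
```

The resulting value would then be written into the optimizer settings of the config you train with.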
For more details on data preparation, you can refer to AVA in Data Preparation.
You can use the following command to train a model.
python tools/train.py ${CONFIG_FILE} [optional arguments]
Example: train SlowOnly model on AVA with periodic validation.
python tools/train.py configs/detection/ava/slowonly_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py --validate
For more details and information on optional arguments, refer to the Training setting part in getting_started.
You can train custom classes from AVA. AVA suffers from class imbalance. There are more than 100,000 samples for classes like `stand`/`listen to (a person)`/`talk to (e.g., self, a person, a group)`/`watch (a person)`, whereas half of all classes have fewer than 500 samples. In most cases, training only on custom classes with fewer samples leads to better results.
Three steps to train custom classes:
- Step 1: Select custom classes from the original classes, named `custom_classes`. Class `0` should not be selected, since it is reserved for further usage (to identify whether a proposal is positive or negative, not implemented yet) and will be added automatically.
- Step 2: Set `num_classes`. To be compatible with the current code, please make sure `num_classes == len(custom_classes) + 1`.
  - The new class `0` corresponds to the original class `0`. The new class `i` (i > 0) corresponds to the original class `custom_classes[i-1]`.
  - There are three `num_classes` in the AVA config: `model -> roi_head -> bbox_head -> num_classes`, `data -> train -> num_classes` and `data -> val -> num_classes`.
  - If `num_classes <= 5`, the input arg `topk` of `BBoxHeadAVA` should be modified. The default value of `topk` is `(3, 5)`, and all elements of `topk` must be smaller than `num_classes`.
- Step 3: Make sure all custom classes are in `label_file`. It is worth mentioning that there are two label files, `ava_action_list_v2.1_for_activitynet_2018.pbtxt` (contains 60 classes, 20 classes are missing) and `ava_action_list_v2.1.pbtxt` (contains all 80 classes).
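The bookkeeping in Steps 1 and 2 can be sketched in plain Python. This is only an illustration under stated assumptions: `build_custom_class_settings` and `config_overrides` are hypothetical names, dropping the invalid `topk` entries is one possible way to satisfy the constraint, and the class ids are the example list used on this page. The nested keys mirror the three `num_classes` locations named in Step 2.

```python
def build_custom_class_settings(custom_classes, topk=(3, 5)):
    """Derive num_classes, the new-to-original class mapping, and a valid topk.

    Hypothetical helper for illustration; it is not part of the library.
    """
    if 0 in custom_classes:
        raise ValueError("class 0 is reserved and is added automatically")
    # Step 2: one extra slot for the reserved class 0.
    num_classes = len(custom_classes) + 1
    # New class 0 -> original class 0; new class i -> custom_classes[i - 1].
    new_to_original = {0: 0}
    for new_id, original_id in enumerate(custom_classes, start=1):
        new_to_original[new_id] = original_id
    # Every element of topk must be smaller than num_classes.
    # Assumption: invalid entries are simply dropped.
    valid_topk = tuple(k for k in topk if k < num_classes)
    return num_classes, new_to_original, valid_topk


custom_classes = [3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72]
num_classes, new_to_original, topk = build_custom_class_settings(custom_classes)

# The three places where num_classes appears in the AVA config (see Step 2).
config_overrides = dict(
    model=dict(roi_head=dict(bbox_head=dict(num_classes=num_classes, topk=topk))),
    data=dict(
        train=dict(num_classes=num_classes),
        val=dict(num_classes=num_classes),
    ),
)
```

With 16 custom classes the default `topk` stays valid. With very few classes the filtered tuple can become empty, in which case a suitable `topk` has to be chosen by hand.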
Take `slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb` as an example, training custom classes with AP in the range `(0.1, 0.3)`, i.e., `[3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72]`. Note that the AP mentioned here is calculated with the original ckpt, which is trained on all 80 classes. The results are listed as follows.
training classes | mAP(custom classes) | config | log | json | ckpt |
---|---|---|---|---|---|
All 80 classes | 0.1948 | slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb | log | json | ckpt |
custom classes | 0.3311 | slowonly_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes | log | json | ckpt |
All 80 classes | 0.1864 | slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb | log | json | ckpt |
custom classes | 0.3785 | slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_custom_classes | log | json | ckpt |
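Picking custom classes by AP range, as done for the table above, can be sketched as follows. The helper name `select_classes_by_ap` and the per-class AP values are placeholders made up for illustration; they are not measured results. In practice the values would come from evaluating the checkpoint trained on all 80 classes.

```python
def select_classes_by_ap(per_class_ap, low=0.1, high=0.3):
    """Return sorted class ids whose AP lies strictly inside (low, high).

    Class 0 is skipped because it is reserved and added automatically.
    """
    return sorted(
        class_id
        for class_id, ap in per_class_ap.items()
        if class_id != 0 and low < ap < high
    )


# Placeholder AP values, NOT real evaluation results.
example_ap = {1: 0.45, 2: 0.21, 3: 0.15, 4: 0.72, 5: 0.05, 6: 0.28}
custom_classes = select_classes_by_ap(example_ap)
```

The returned list can then be used as `custom_classes` in Step 1.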
You can use the following command to test a model.
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
Example: test SlowOnly model on AVA and dump the result to a csv file.
python tools/test.py configs/detection/ava/slowonly_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py checkpoints/SOME_CHECKPOINT.pth --eval mAP --out results.csv
For more details and information on optional arguments, refer to the Test a dataset part in getting_started.