TSM

Introduction

@inproceedings{lin2019tsm,
  title={TSM: Temporal Shift Module for Efficient Video Understanding},
  author={Lin, Ji and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2019}
}

@article{NonLocal2018,
  title={Non-local Neural Networks},
  author={Wang, Xiaolong and Girshick, Ross and Gupta, Abhinav and He, Kaiming},
  journal={CVPR},
  year={2018}
}

Model Zoo

Kinetics-400

Config	Resolution	GPUs	Backbone	Pretrain	top1 acc	top5 acc	reference top1 acc	reference top5 acc	inference time (video/s)	GPU memory (M)	ckpt	log	json
tsm_r50_1x1x8_50e_kinetics400_rgb	340x256	8	ResNet50	ImageNet	70.24	89.56	70.36	89.49	74.0 (8x1 frames)	7079	ckpt	log	json
tsm_r50_1x1x8_50e_kinetics400_rgb	short-side 256	8	ResNet50	ImageNet	70.59	89.52	x	x	x	7079	ckpt	log	json
tsm_r50_1x1x8_50e_kinetics400_rgb	short-side 320	8	ResNet50	ImageNet	70.73	89.81	x	x	x	7079	ckpt	log	json
tsm_r50_1x1x8_100e_kinetics400_rgb	short-side 320	8	ResNet50	ImageNet	71.90	90.03	x	x	x	7079	ckpt	log	json
tsm_r50_gpu_normalize_1x1x8_50e_kinetics400_rgb	short-side 256	8	ResNet50	ImageNet	70.48	89.40	x	x	x	7076	ckpt	log	json
tsm_r50_video_1x1x8_50e_kinetics400_rgb	short-side 256	8	ResNet50	ImageNet	70.25	89.66	70.36	89.49	74.0 (8x1 frames)	7077	ckpt	log	json
tsm_r50_dense_1x1x8_50e_kinetics400_rgb	short-side 320	8	ResNet50	ImageNet	73.46	90.84	x	x	x	7079	ckpt	log	json
tsm_r50_dense_1x1x8_100e_kinetics400_rgb	short-side 320	8	ResNet50	ImageNet	74.55	91.74	x	x	x	7079	ckpt	log	json
tsm_r50_1x1x16_50e_kinetics400_rgb	340x256	8	ResNet50	ImageNet	72.09	90.37	70.67	89.98	47.0 (16x1 frames)	10404	ckpt	log	json
tsm_r50_1x1x16_50e_kinetics400_rgb	short-side 256	8x4	ResNet50	ImageNet	71.89	90.73	x	x	x	10398	ckpt	log	json
tsm_r50_1x1x16_100e_kinetics400_rgb	short-side 320	8	ResNet50	ImageNet	72.80	90.75	x	x	x	10398	ckpt	log	json
tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb	short-side 320	8x4	ResNet50	ImageNet	72.03	90.25	71.81	90.36	x	8931	ckpt	log	json
tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb	short-side 320	8x4	ResNet50	ImageNet	70.70	89.90	x	x	x	10125	ckpt	log	json
tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb	short-side 320	8x4	ResNet50	ImageNet	71.60	90.34	x	x	x	8358	ckpt	log	json
tsm_mobilenetv2_dense_1x1x8_100e_kinetics400_rgb	short-side 320	8	MobileNetV2	ImageNet	68.46	88.64	x	x	x	3385	ckpt	log	json
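
The config names in these tables encode part of the training setup. Assuming MMAction2's naming convention, where a token such as `1x1x8` means `{clip_len}x{frame_interval}x{num_clips}` and a token such as `50e` is the number of training epochs, a small parser could look like this (the `parse_config_name` helper is hypothetical, written only for illustration):

```python
import re

# Assumed MMAction2 convention: "{clip_len}x{frame_interval}x{num_clips}"
# for frame sampling and "{N}e" for the number of training epochs.
_SAMPLING = re.compile(r"^(\d+)x(\d+)x(\d+)$")
_SCHEDULE = re.compile(r"^(\d+)e$")


def parse_config_name(name: str) -> dict:
    """Extract frame-sampling and schedule settings from a config name.

    Fields that are not present in the name stay None.
    """
    stem = name[:-3] if name.endswith(".py") else name
    info = {"clip_len": None, "frame_interval": None,
            "num_clips": None, "epochs": None}
    for token in stem.split("_"):
        m = _SAMPLING.match(token)
        if m:
            clip_len, interval, clips = map(int, m.groups())
            info.update(clip_len=clip_len, frame_interval=interval,
                        num_clips=clips)
            continue
        m = _SCHEDULE.match(token)
        if m:
            info["epochs"] = int(m.group(1))
    return info


print(parse_config_name("tsm_r50_1x1x16_50e_kinetics400_rgb"))
# {'clip_len': 1, 'frame_interval': 1, 'num_clips': 16, 'epochs': 50}
```

Under this reading, `tsm_r50_1x1x16_50e_kinetics400_rgb` samples 16 clips of one frame each and trains for 50 epochs.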

Diving48

Config	GPUs	Backbone	Pretrain	top1 acc	top5 acc	GPU memory (M)	ckpt	log	json
tsm_r50_video_1x1x8_50e_diving48_rgb	8	ResNet50	ImageNet	75.99	97.16	7070	ckpt	log	json
tsm_r50_video_1x1x16_50e_diving48_rgb	8	ResNet50	ImageNet	81.62	97.66	7070	ckpt	log	json

Something-Something V1

Config	Resolution	GPUs	Backbone	Pretrain	top1 acc (efficient/accurate)	top5 acc (efficient/accurate)	reference top1 acc (efficient/accurate)	reference top5 acc (efficient/accurate)	GPU memory (M)	ckpt	log	json
tsm_r50_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	45.58 / 47.70	75.02 / 76.12	45.50 / 47.33	74.34 / 76.60	7077	ckpt	log	json
tsm_r50_flip_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	47.10 / 48.51	76.02 / 77.56	45.50 / 47.33	74.34 / 76.60	7077	ckpt	log	json
tsm_r50_randaugment_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	47.16 / 48.90	76.07 / 77.92	45.50 / 47.33	74.34 / 76.60	7077	ckpt	log	json
tsm_r50_ptv_randaugment_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	47.65 / 48.66	76.67 / 77.41	45.50 / 47.33	74.34 / 76.60	7077	ckpt	log	json
tsm_r50_ptv_augmix_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	46.26 / 47.68	75.92 / 76.49	45.50 / 47.33	74.34 / 76.60	7077	ckpt	log	json
tsm_r50_flip_randaugment_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	47.85 / 50.31	76.78 / 78.18	45.50 / 47.33	74.34 / 76.60	7077	ckpt	log	json
tsm_r50_1x1x16_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	47.62 / 49.28	76.63 / 77.82	47.05 / 48.61	76.40 / 77.96	10390	ckpt	log	json
tsm_r101_1x1x8_50e_sthv1_rgb	height 100	8	ResNet101	ImageNet	45.72 / 48.43	74.67 / 76.72	46.64 / 48.13	75.40 / 77.31	9800	ckpt	log	json

Something-Something V2

Config	Resolution	GPUs	Backbone	Pretrain	top1 acc (efficient/accurate)	top5 acc (efficient/accurate)	reference top1 acc (efficient/accurate)	reference top5 acc (efficient/accurate)	GPU memory (M)	ckpt	log	json
tsm_r50_1x1x8_50e_sthv2_rgb	height 240	8	ResNet50	ImageNet	57.86 / 61.12	84.67 / 86.26	57.98 / 60.69	84.57 / 86.28	7069	ckpt	log	json
tsm_r50_1x1x8_50e_sthv2_rgb	height 256	8	ResNet50	ImageNet	60.79 / 63.84	86.60 / 88.30	xx / 61.2	xx / xx	7069	ckpt	log	json
tsm_r50_1x1x16_50e_sthv2_rgb	height 240	8	ResNet50	ImageNet	59.93 / 62.04	86.10 / 87.35	58.90 / 60.98	85.29 / 86.60	10400	ckpt	log	json
tsm_r50_1x1x16_50e_sthv2_rgb	height 256	8	ResNet50	ImageNet	61.06 / 63.19	86.66 / 87.93	xx / 63.1	xx / xx	10400	ckpt	log	json
tsm_r101_1x1x8_50e_sthv2_rgb	height 240	8	ResNet101	ImageNet	58.59 / 61.51	85.07 / 86.90	58.89 / 61.36	85.14 / 87.00	9784	ckpt	log	json

MixUp & CutMix on Something-Something V1

Config	Resolution	GPUs	Backbone	Pretrain	top1 acc (efficient/accurate)	top5 acc (efficient/accurate)	top1 acc change (efficient/accurate)	top5 acc change (efficient/accurate)	ckpt	log	json
tsm_r50_mixup_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	46.35 / 48.49	75.07 / 76.88	+0.77 / +0.79	+0.05 / +0.76	ckpt	log	json
tsm_r50_cutmix_1x1x8_50e_sthv1_rgb	height 100	8	ResNet50	ImageNet	45.92 / 47.46	75.23 / 76.71	+0.34 / -0.24	+0.21 / +0.59	ckpt	log	json

Jester

Config	Resolution	GPUs	Backbone	Pretrain	top1 acc (efficient/accurate)	ckpt	log	json
tsm_r50_1x1x8_50e_jester_rgb	height 100	8	ResNet50	ImageNet	96.5 / 97.2	ckpt	log	json

HMDB51

Config	GPUs	Backbone	Pretrain	top1 acc	top5 acc	GPU memory (M)	ckpt	log	json
tsm_k400_pretrained_r50_1x1x8_25e_hmdb51_rgb	8	ResNet50	Kinetics400	72.68	92.03	10388	ckpt	log	json
tsm_k400_pretrained_r50_1x1x16_25e_hmdb51_rgb	8	ResNet50	Kinetics400	74.77	93.86	10388	ckpt	log	json

UCF101

Config	GPUs	Backbone	Pretrain	top1 acc	top5 acc	GPU memory (M)	ckpt	log	json
tsm_k400_pretrained_r50_1x1x8_25e_ucf101_rgb	8	ResNet50	Kinetics400	94.50	99.58	10389	ckpt	log	json
tsm_k400_pretrained_r50_1x1x16_25e_ucf101_rgb	8	ResNet50	Kinetics400	94.58	99.37	10389	ckpt	log	json

Notes:

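Here, GPUs refers to the number of GPUs used to obtain the checkpoint. By default, the configs provided by MMAction2 assume training on 8 GPUs. Following the linear scaling rule, if you use a different number of GPUs or a different number of videos per GPU, scale the learning rate proportionally to the total batch size. For example, lr=0.01 corresponds to 4 GPUs x 2 video/gpu, and lr=0.08 corresponds to 16 GPUs x 4 video/gpu.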
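
As a worked example of the linear scaling rule, the sketch below (hypothetical helper `scale_lr`) scales the learning rate by the total batch size, taking the reference point from the note above: lr=0.01 for 4 GPUs x 2 video/gpu, i.e. 8 videos per iteration.

```python
def scale_lr(num_gpus: int, videos_per_gpu: int,
             base_lr: float = 0.01, base_batch_size: int = 8) -> float:
    """Linear scaling rule: lr grows proportionally with the total batch size.

    Defaults encode the reference point lr=0.01 for 4 GPUs x 2 videos/GPU.
    """
    total_batch_size = num_gpus * videos_per_gpu
    return base_lr * total_batch_size / base_batch_size


print(scale_lr(4, 2))   # 0.01
print(scale_lr(16, 4))  # 0.08
```
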
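The inference time is obtained with the benchmark script, using the frame-sampling strategy of the test setting, and counts only model inference time, excluding IO and preprocessing time. For each config, MMAction2 uses 1 GPU with a batch size (videos per GPU) of 1 to measure inference time.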
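
The measurement described above can be sketched as follows. This is not MMAction2's benchmark script: `videos_per_second` and its arguments are hypothetical, the inputs are assumed to be decoded and preprocessed before timing starts so that IO and preprocessing are excluded, and a real GPU model would additionally need a device synchronization (for example `torch.cuda.synchronize()`) before each clock read, which this sketch omits.

```python
import time
from typing import Callable, Sequence


def videos_per_second(infer: Callable[[object], object],
                      inputs: Sequence[object],
                      warmup: int = 5) -> float:
    """Measure model-only throughput with a batch size of 1.

    `inputs` are assumed to be already preprocessed, so only the calls
    to `infer` fall inside the timed region.
    """
    # Warm-up calls are run but not timed (lazy init, caches, etc.).
    for x in inputs[:warmup]:
        infer(x)
    timed = inputs[warmup:]
    if not timed:
        raise ValueError("need more inputs than warm-up iterations")
    start = time.perf_counter()
    for x in timed:  # one video per call, i.e. batch size 1
        infer(x)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed
```
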
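The reference results are obtained by training with the same model config in the original codebase. The corresponding checkpoints can be downloaded here.
For Something-Something, there are two test schemes: efficient (center crop x 1 clip) and accurate (three crop x 2 clip), both taken from the original codebase. MMAction2 uses the efficient scheme as the default in its configs; you can switch to the accurate scheme as follows: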

...
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,   # set `num_clips=8` when using 8 segments
        twice_sample=True,    # set `twice_sample=True` for Twice Sample in the accurate scheme
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    # dict(type='CenterCrop', crop_size=224),  # use this for the efficient scheme
    dict(type='ThreeCrop', crop_size=256),  # for the accurate scheme
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
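
To make the difference between the two schemes concrete, here is a simplified pure-Python sketch of segment-based test sampling with one frame per segment. It assumes that twice sampling appends a second clip taken at the start of each segment, next to the default clip taken at each segment's centre; this illustrates the idea behind `twice_sample` rather than reproducing MMAction2's `SampleFrames`. The helpers `segment_frame_indices` and `num_views` are hypothetical.

```python
def segment_frame_indices(num_frames: int, num_segments: int,
                          twice_sample: bool = False) -> list:
    """Pick one frame per segment (clip_len=1, frame_interval=1).

    Default: the centre frame of each of `num_segments` equal segments.
    With twice sampling, a second clip made of each segment's first frame
    is appended, so the video contributes 2 clips instead of 1.
    """
    interval = num_frames / num_segments
    starts = [i * interval for i in range(num_segments)]
    centres = [int(s + interval / 2) for s in starts]
    if twice_sample:
        return centres + [int(s) for s in starts]
    return centres


def num_views(num_crops: int, num_clips: int) -> int:
    # Each crop of each clip is one view scored by the model.
    return num_crops * num_clips


print(segment_frame_indices(80, 8))
# [5, 15, 25, 35, 45, 55, 65, 75]
print(segment_frame_indices(80, 8, twice_sample=True))
# [5, 15, 25, 35, 45, 55, 65, 75, 0, 10, 20, 30, 40, 50, 60, 70]
print(num_views(1, 1), num_views(3, 2))  # 1 6
```

The efficient scheme thus scores 1 crop x 1 clip = 1 view per video, while the accurate scheme scores 3 crops x 2 clips = 6 views, whose scores are typically averaged into a single prediction.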

When MixUp and CutMix augmentation are used, the hyperparameter alpha is set to 0.2.
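
For reference, MixUp draws a mixing coefficient lam ~ Beta(alpha, alpha) and blends two samples and their one-hot labels with it; CutMix draws the same kind of coefficient but uses it to size a patch pasted from one sample into the other. A minimal pure-Python sketch of MixUp with alpha=0.2 on flattened inputs (the `mixup` helper is hypothetical, not MMAction2's implementation):

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two flattened samples and their one-hot labels.

    lam ~ Beta(alpha, alpha); a small alpha such as 0.2 keeps lam close
    to 0 or 1 most of the time, so one sample usually dominates.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                  rng=random.Random(0))
print(lam, y)
```
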
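The Kinetics400 validation set we use contains 19796 videos, which users can download from the validation set videos link. We also provide the corresponding data list (each line: video ID, number of frames, class index) and the label map (class index to class name).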
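
A sketch of reading these two files, assuming whitespace-separated fields in the data list and one class name per line in the label map, with the line order giving the class index. Both file layouts and the helper names `parse_data_list` and `parse_label_map` are assumptions for illustration; check them against the files you actually download.

```python
from typing import Dict, List, Tuple


def parse_data_list(lines: List[str]) -> List[Tuple[str, int, int]]:
    """Parse 'video_id num_frames class_index' lines (assumed layout)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # rsplit keeps any spaces inside the video ID intact.
        video_id, num_frames, label = line.rsplit(maxsplit=2)
        records.append((video_id, int(num_frames), int(label)))
    return records


def parse_label_map(lines: List[str]) -> Dict[int, str]:
    """Map class index to class name, one name per non-empty line."""
    names = [line.strip() for line in lines if line.strip()]
    return dict(enumerate(names))


# Hypothetical example lines.
print(parse_data_list(["video_0001 300 5", "video_0002 250 0"]))
print(parse_label_map(["class_a\n", "class_b\n"]))
```
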

For details on data preparation, refer to the Kinetics400, Something-Something V1 and Something-Something V2 sections of the data preparation docs.

How to Train

You can use the following command to train a model.

python tools/train.py ${CONFIG_FILE} [optional arguments]

Example: train a TSM model on Kinetics-400 in a deterministic way, with periodic validation.

python tools/train.py configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py \
    --work-dir work_dirs/tsm_r50_1x1x8_50e_kinetics400_rgb \
    --validate --seed 0 --deterministic
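
If you launch training from a Python script, the same command can be assembled programmatically. The helper `train_command` below is hypothetical; it uses only the flags shown in the example above and, as a design choice of this sketch, derives `--work-dir` from the config file name so the two stay consistent.

```python
import shlex
from pathlib import Path


def train_command(config: str, seed: int = 0) -> list:
    """Build a tools/train.py invocation with periodic validation and
    deterministic training, naming the work dir after the config."""
    work_dir = Path("work_dirs") / Path(config).stem
    return ["python", "tools/train.py", config,
            "--work-dir", str(work_dir),
            "--validate", "--seed", str(seed), "--deterministic"]


cmd = train_command("configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.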

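For more training details, see the Training setting section of the getting started tutorial.

How to Test

You can use the following command to test a model.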

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]

Example: test a TSM model on Kinetics-400 and dump the results to a json file.

python tools/test.py configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py \
    checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
    --out result.json
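
The two metrics requested via `--eval` can be illustrated with a small pure-Python sketch. It follows the usual definitions (top-k accuracy counts a hit when the true label is among the k highest scores; mean class accuracy computes top-1 accuracy per class and averages over the classes present) and is not MMAction2's own implementation.

```python
from collections import defaultdict
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def mean_class_accuracy(scores: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Per-class top-1 accuracy, averaged over the classes present."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        total[label] += 1
        correct[label] += pred == label
    return sum(correct[c] / total[c] for c in total) / len(total)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5],
          [0.6, 0.3, 0.1], [0.9, 0.05, 0.05]]
labels = [1, 0, 1, 0, 0]
print(top_k_accuracy(scores, labels, 1))     # 0.8
print(top_k_accuracy(scores, labels, 2))     # 1.0
print(mean_class_accuracy(scores, labels))  # 0.75
```

Mean class accuracy (0.75) is lower than top-1 accuracy (0.8) here because the under-represented class 1 is predicted less reliably, which is the kind of imbalance this metric exposes.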

For more testing details, see the Test a dataset section of the getting started tutorial.
