A sample implementation of C3D written in Chainer.
This code is inspired by c3d-keras and c3d-pytorch. The original paper is here.
Requirements:
- Chainer 3.2 (it may work on other versions)
This code is forked from chainer/examples/cifar@97315f. Please see here for the license.
Copyright (c) 2017 ikeyasu.
Prerequisites:
- opencv3 with FFmpeg
- youtube-dl (if needed)
$ pip install youtube-dl
$ conda config --add channels conda-forge
$ conda install opencv
Download the model, labels, and mean image.
- model: the converted Chainer model
- labels: the Sports-1M dataset's labels
- mean image: c3d-keras's file
$ pushd caffe_model
$ wget https://github.com/ikeyasu/c3d-chainer/releases/download/201712124/conv3d_deepnetA_sport1m_iter_1900000_chainer.model
$ wget https://raw.githubusercontent.com/gtoderici/sports-1m-dataset/master/labels.txt
$ wget --content-disposition https://github.com/axon-research/c3d-keras/blob/master/data/train01_16_128_171_mean.npy.bz2?raw=true
$ bunzip2 train01_16_128_171_mean.npy.bz2
$ popd
NOTE: If you want to convert the Caffe model to Chainer by yourself, please read caffe_model/README.md.
Then, you can run a prediction. A rough sketch of the preprocessing involved follows the example output below.
$ youtube-dl -f mp4 https://www.youtube.com/watch?v=dM06AMFLsrc -o dM06AMFLsrc.mp4
$ python predict.py -a c3d --model caffe_model/conv3d_deepnetA_sport1m_iter_1900000_chainer.model --mean caffe_model/train01_16_128_171_mean.npy --video dM06AMFLsrc.mp4 --labels caffe_model/labels.txt
Loaded 487 labels.
Loaded caffe_model/conv3d_deepnetA_sport1m_iter_1900000_chainer.model.
/home/ikeyasu/anaconda3/envs/chainer3/lib/python3.6/site-packages/chainer/utils/experimental.py:104: FutureWarning: chainer.functions.pooling.MaxPoolingND is experimental. The interface can change in the future.
FutureWarning)
Position of maximum probability: 367
Maximum probability: 10.43649
Corresponding label: basketball
Top 5 probabilities and labels:
10.43649 basketball
8.61597 volleyball
8.46861 streetball
7.04241 roller derby
6.63952 freestyle wrestling
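For reference, the preprocessing behind such a prediction roughly looks like the sketch below. It is only an illustration: the mean array is assumed to have shape (3, 16, 128, 171) as in c3d-keras, and the actual steps in predict.py may differ.

import cv2
import numpy as np

def load_clip(video_path, mean_path, n_frames=16):
    """Read frames from a video and build one C3D input clip."""
    mean = np.load(mean_path)  # assumed shape (3, 16, 128, 171), float
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        # Resize each frame to the mean image's spatial size (H=128, W=171).
        frames.append(cv2.resize(frame, (171, 128)).astype(np.float32))
    cap.release()
    clip = np.stack(frames).transpose(3, 0, 1, 2)  # (3, T, 128, 171)
    clip -= mean[:, :clip.shape[1]]                # subtract the per-frame mean
    clip = clip[:, :, 8:120, 30:142]               # center-crop to 112x112
    return clip[np.newaxis]                        # (1, 3, T, 112, 112)

The resulting array can then be fed to the loaded Chainer model to obtain per-label scores.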
You can refer to tools/README.md for dataset generation.
We use UCF11, a small dataset for human action recognition.
You need to download the dataset and convert it to JPEG images. Please also see tools/README.md for details.
Requirements:
- FFmpeg
- GNU Parallel
Download UCF11.
$ wget http://crcv.ucf.edu/data/UCF11_updated_mpg.rar
$ mkdir videos
$ pushd videos
$ unrar e UCF11_updated_mpg.rar
$ popd
Convert the videos to images.
$ ls videos/*.mpg | parallel --no-notice -j8 ./tools/video_to_image.sh {}
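tools/video_to_image.sh is left to the repository, but conceptually it just dumps every frame of each video as a JPEG. A minimal OpenCV sketch of that idea (the output naming is illustrative and need not match the script):

import os
import cv2

def video_to_images(video_path, out_dir):
    """Write every frame of a video as a numbered JPEG."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, "%06d.jpg" % index), frame)
        index += 1
    cap.release()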
Make folders for resized and cropped images.
$ find ./videos/* -name "*.jpg" | parallel --no-notice -j 200 'echo `dirname {}`' | uniq > dirs
$ cat dirs | sed 's/videos/ucf11_160x120\/images/' | xargs mkdir -p
$ cat dirs | sed 's/videos/ucf11_112px\/images/' | xargs mkdir -p
$ rm dirs
Resize to 160x120 (half).
$ find ./videos/* -name "*.jpg" | parallel -j20 'convert -resize 160x120! {} `echo {} | sed "s/videos/ucf11_160x120\/images/"`'
Resize and crop to 112x112 (to compute the mean image).
$ find ./videos/* -name "*.jpg" | parallel -j20 'python tools/resize.py -c 112 -i {} -o `echo {} | sed "s/videos/ucf11_112px\/images/"`'
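tools/resize.py is called with -c 112 above; one common way to implement such a resize-and-crop is to scale the shorter side to the target size and then take a center crop. The sketch below shows that idea and is only an assumption about what the script does:

from PIL import Image

def resize_and_crop(in_path, out_path, crop=112):
    """Scale the shorter side to `crop` pixels, then center-crop a square."""
    img = Image.open(in_path)
    w, h = img.size
    scale = crop / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - crop) // 2, (h - crop) // 2
    img.crop((left, top, left + crop, top + crop)).save(out_path)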
Now, you have two datasets.
- ./ucf11_160x120/images: for training
- ./ucf11_112px/images: for computing the mean image
Compute the mean image for ucf11_112px (a conceptual sketch follows the commands below).
$ pushd ucf11_112px
$ find . | grep .jpg$ > list
$ python ../tools/compute_mean.py --root . list
$ popd
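compute_mean.py comes from the Chainer examples and, conceptually, just averages all listed images pixel-wise. A minimal sketch of that idea (the output file name and channel ordering of the real script may differ):

import numpy as np
from PIL import Image

def compute_mean(list_file, root=".", out="mean.npy"):
    """Average all listed images pixel-wise and save the result."""
    paths = [line.strip() for line in open(list_file) if line.strip()]
    total = None
    for path in paths:
        img = np.asarray(Image.open(f"{root}/{path}"), dtype=np.float64)
        total = img if total is None else total + img
    np.save(out, total / len(paths))  # (H, W, C) here; the real script may use (C, H, W)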
Some videos have very few frames and cannot be used for training/testing. The following commands count the number of image files per video and move any video with fewer than 30 images into an ignored folder.
$ pushd ucf11_160x120/images/
$ ls | parallel -j50 'echo `ls -1 {} | wc -l` {}' | sort -n > ../counts
$ mkdir ../ignored
$ awk '$1 < 30 {print $2}' ../counts | xargs -I '{}' mv {} ../ignored
$ popd
Split off a test dataset by choosing 300 videos at random.
$ pushd ucf11_160x120/images
$ mkdir ../tests
$ ls | shuf | head -n 300 | xargs -I '{}' mv {} ../tests/
$ popd
Now, you have:
- ./ucf11_160x120/images: for training
- ./ucf11_160x120/tests: for validation
- ./ucf11_160x120/ignored: videos with too few frames (not used)
- ./ucf11_112px/images: for computing the mean image
- ./ucf11_112px/mean.npy: the mean image array
$ python train.py -g 0 --arch c3d --batchsize 30 --train-data ucf11_160x120/images/ --test-data ucf11_160x120/tests/ --optimizer sgd --mean ucf11_112px/mean.npy --frames 9
Please refer to python train.py --help for details.
The original paper uses 16 frames, but this sample uses 9 frames because of my GPU's memory limitation.
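To illustrate what --frames 9 means, one way to build a training sample from a video's image folder is to pick a random contiguous window of that length. This is only an illustration, not necessarily how the repository's dataset class is implemented:

import glob
import random

import numpy as np
from PIL import Image

def sample_clip(image_dir, n_frames=9):
    """Pick a random contiguous window of n_frames JPEGs and stack them."""
    paths = sorted(glob.glob(f"{image_dir}/*.jpg"))
    start = random.randint(0, len(paths) - n_frames)
    frames = [np.asarray(Image.open(p), dtype=np.float32)
              for p in paths[start:start + n_frames]]
    return np.stack(frames).transpose(3, 0, 1, 2)  # (C, T, H, W)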
Accuracy:
Loss:
By image directory.
$ python predict.py -a c3d --model mlp.model --mean ucf11_112px/mean.npy --image-dir ucf11_160x120/tests/v_jumping_22_03/
By dataset.
$ python predict.py -a c3d --model mlp.model --mean ucf11_112px/mean.npy -i ucf11_160x120/tests