Skip to content

CVPR2023 (highlight) - A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

License

Notifications You must be signed in to change notification settings

hzwer/CVPR2023-DMVFN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

This project is the implementation of our Paper: A Dynamic Multi-Scale Voxel Flow Network for Video Prediction, which is accepted by CVPR2023 (highlight✨, 10% of accepted papers). We proposed a SOTA model for Video Prediction.

Poster | 研究历程 | 中文论文 | rebuttal (3WA->1WA2SA) | Demo

Comparison image

supplement and correction of published paper: 1. In Figure 2(a), $I_{t-1}$ and $I_t$ should be swapped; 2. "Our model is trained on four 2080Ti GPUs for 300 epochs, which takes about 35 hours." This sentence refers to the Cityscapes training set; 3. In Table 1, GFLOPs computing for KITTI is on $256\times832$ and on $1024\times512$ for Cityscapes using fvcore.

Usage

Installation

git clone https://github.com/megvii-research/CVPR2023-DMVFN.git
cd CVPR2023-DMVFN
pip3 install -r requirements.txt
  • Download the pretrained models from Google Drive. (百度网盘 password:33ly), and move the pretrained parameters to CVPR2023-DMVFN/pretrained_models/*
pip install gdown
mkdir pretrained_models && cd pretrained_models
gdown --id 1jILbS8Gm4E5Xx4tDCPZh_7rId0eo8r9W
gdown --id 1WrV30prRiS4hWOQBnVPUxdaTlp9XxmVK
gdown --id 14_xQ3Yl3mO89hr28hbcQW3h63lLrcYY0
cd ..

Data Preparation

In this section, we will download all parts including training and testing sets. If you only need the test set, please jump to Directly download test splits.

The final data folder CVPR2023-DMVFN/data/ should be organized like this:

data
├── Cityscapes
│   ├── train (Citysapes images in 512x1024)
│   │   └── 000000
│   │   └── ...
│   │   └── 002974
│   ├── test (Citysapes images in 512x1024)
│   │   └── 000000
│   │   └── ...
│   │   └── 000499
├── KITTI
│   ├── train (Kitti images in 256x832)
│   │   └── 000000
│   │   └── ...
│   │   └── 013499
│   ├── test (Kitti images in 256x832)
│   │   └── 000000
│   │   └── ...
│   │   └── 001336
├── UCF101
│   ├── v_ApplyEyeMakeup_g08_c01
│   └── ...
└── vimeo_interp_test
    └── target
        └── 00001
        └── ...

Cityscapes

  • Download the Cityscapes dataset leftImg8bit_sequence_trainvaltest.zip from here.

  • Unzip leftImg8bit_sequence_trainvaltest.zip.

unzip leftImg8bit_sequence_trainvaltest.zip
  • Run ./utils/prepare_city.py
python3 ./utils/prepare_city.py

KITTI

Videos for training split:
2011_09_26_drive_0001_sync  2011_09_26_drive_0018_sync  2011_09_26_drive_0104_sync 2011_09_26_drive_0002_sync  2011_09_26_drive_0048_sync
2011_09_26_drive_0106_sync  2011_09_26_drive_0005_sync  2011_09_26_drive_0051_sync 2011_09_26_drive_0113_sync  2011_09_26_drive_0009_sync  
2011_09_26_drive_0056_sync  2011_09_26_drive_0117_sync  2011_09_26_drive_0011_sync 2011_09_26_drive_0057_sync  2011_09_28_drive_0001_sync
2011_09_26_drive_0013_sync  2011_09_26_drive_0059_sync  2011_09_28_drive_0002_sync 2011_09_26_drive_0014_sync  2011_09_26_drive_0091_sync 
2011_09_29_drive_0026_sync  2011_09_26_drive_0017_sync  2011_09_26_drive_0095_sync 2011_09_29_drive_0071_sync

Videos for testing split:
2011_09_26_drive_0060_sync  2011_09_26_drive_0084_sync  2011_09_26_drive_0093_sync  2011_09_26_drive_0096_sync
  • Unzip all files, then reorganize the dataset as follows:
mkdir train_or
unzip 2011_09_26_drive_0001_sync
mv 2011_09_26_drive_0001_sync/image02/data/ train_or/2011_09_26_drive_0001_sync

Do this command for all files. We use image02 and image03 for training.

  • Run ./utils/prepare_kitti.py.
python3 ./utils/prepare_kitti.py
  • Do the same for the testing split.

UCF101

We extract RGB frames from each video in UCF101 dataset and save as .jpg image.

Download the preprocessed data directly from feichtenhofer/twostreamfusion for convenience.

wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.003

cat ucf101_jpegs_256.zip* > ucf101_jpegs_256.zip
unzip ucf101_jpegs_256.zip

Vimeo90K

  • Download Vimeo90K dataset directly from here.
  • Unzip the dataset.

Run

😆Training

For Cityscapes Dataset:

python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=4321 ./scripts/train.py \
--train_dataset CityTrainDataset \
--val_datasets CityValDataset \
--batch_size 8 \
--num_gpu 8

For KITTI Dataset:

python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=4321 ./scripts/train.py \
--train_dataset KittiTrainDataset \
--val_datasets KittiValDataset \
--batch_size 8 \
--num_gpu 8

For DAVIS and Vimeo Dataset:

python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=4321 ./scripts/train.py \
--train_dataset UCF101TrainDataset \
--val_datasets DavisValDataset VimeoValDataset \
--batch_size 8 \
--num_gpu 8

🤔️Testing

Directly download test splits of different datasets

Download Cityscapes_test directly from Google Drive.(百度网盘 password: wk7k)

Download KITTI_test directly from Google Drive.(百度网盘 password: e7da)

Download DAVIS_test directly from Google Drive.(百度网盘 password: mczk)

Download Vimeo_test directly from Google Drive.(百度网盘 password: 0mjo)

Run the following command to generate test results of DMVFN model. The --val_datasets can be CityValDataset, KittiValDataset, DavisValDataset, and VimeoValDataset. --save_image can be disabled.

python3 ./scripts/test.py \
--val_datasets CityValDataset [optional: KittiValDataset, DavisValDataset, VimeoValDataset] \
--load_path path_of_pretrained_weights \
--save_image 

Image results

We provide the image results of DMVFN on various datasets (Cityscapes, KITTI, DAVIS and Vimeo) in 百度网盘 (password: k7eb).

We also provide the results of DMVFN (without routing) in 百度网盘 (password: 8zo9).

Test the image results

Run the following command to directly test the image results.

python3 ./scripts/test_ssim_lpips.py

😋Single test

We provide a simple code to predict a t+1 image with t-1 and t images. Please run the following command:

python3 ./scripts/single_test.py \
--image_0_path ./images/sample_img_0.png \
--image_1_path ./images/sample_img_1.png \
--load_path path_of_pretrained_weights \
--output_dir pred.png

Recommend

We sincerely recommend some related papers:

ECCV22 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

CVPR22 - Optimizing Video Prediction via Video Frame Interpolation

Citation

If you think this project is helpful, please feel free to leave a star or cite our paper:

@inproceedings{hu2023dmvfn,
  title={A Dynamic Multi-Scale Voxel Flow Network for Video Prediction},
  author={Hu, Xiaotao and Huang, Zhewei and Huang, Ailin and Xu, Jun and Zhou, Shuchang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}