This readme contains instructions for downloading, extracting and cleaning the dataset used in our Pose-DRL paper. It also provides the steps to compute the pose predictions, deep features and instance features required for our Pose-DRL agent. All pose predictions are pre-computed to speed up RL training.
The main requirements are Matlab 2015a and a CUDA-enabled GPU for computing the MubyNet features and feature maps (see details below). If you do not want to compute these features yourself (e.g. if you have obtained them from elsewhere), then most versions of Matlab should work, since training the Pose-DRL view selection policy does not in itself require a particular Matlab version. You also need Python 3.6 for downloading and pre-processing the Panoptic data.
Pose-DRL assumes that the dataset and the associated cache of pose predictions follow a specific folder structure.
There are two important folders:
- The data folder that contains the raw images and Panoptic annotations (used for evaluating the 3d error).
- The cache folder that contains the MubyNet predictions, deep features and instance features used for matching.
Internally, these two folders mirror each other: both contain a second level of folders that separates the scenes into train, validation and test splits.
Example:
```
data/
  train/
    scene1/
  val/
    scene2/
  test/
    scene3/
cache/
  train/
    scene1/
  val/
    scene2/
  test/
    scene3/
```
You may place the `data/` and `cache/` folders on different drives.
- Create your data folder (e.g. `data/`) somewhere. Do not create any subfolders.
- Create your cache folder (e.g. `cache/`) somewhere. Then create `cache/test`, `cache/train` and `cache/val` (see the sketch after this list).
- Go into the `load_config.m` file and set the `dataset_path` and `dataset_cache` flags to the corresponding paths (they should point to `data/` and `cache/`, respectively).
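A minimal shell sketch of this setup, assuming both folders are created under the current directory (place them wherever suits your drives):

```bash
# Top-level data folder; the scene subfolders are added later by downloadAll.sh
mkdir -p data

# Cache folder with its split subfolders
mkdir -p cache/test cache/train cache/val
```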
- Download the scenes and extract the annotations and images:
  ```
  ./downloadAll.sh <data-folder-path>
  ```
- Clean up the scenes to remove bad camera frames and to ensure that each scene has a fixed number of persons:
  ```
  python3 clean_up_panoptic.py --check-same-nbr-frames --check-green-frame --hd --same-people --split-scene --min-con 100 --delete --min-nbr-people 1 <data-folder-path>
  ```
- Split the data into the official train, validation and test splits by running (a combined sketch of all three steps follows below):
  ```
  bash split_data.sh <data-folder-path>
  ```
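A hedged end-to-end sketch of these three steps, assuming the data folder created above is `data/` in the current directory (adapt the path to your setup):

```bash
DATA_DIR=data   # path to your data folder (assumed to be the one created above)

# 1. Download the scenes and extract annotations and images
./downloadAll.sh "$DATA_DIR"

# 2. Remove bad camera frames and keep only scenes with a fixed number of persons
python3 clean_up_panoptic.py --check-same-nbr-frames --check-green-frame --hd \
    --same-people --split-scene --min-con 100 --delete --min-nbr-people 1 "$DATA_DIR"

# 3. Split the scenes into the official train/val/test folders
bash split_data.sh "$DATA_DIR"
```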
- Download the pre-computed train detections, val detections and test detections (zip files).
- Unzip the associated zip-files.
- Move all the `detections.h5` files to the top level of the respective scene folders of the Panoptic data you downloaded in step "1. Download and clean Panoptic data" above (a quick check is sketched below).

Note: You may of course use any detector you want and compute the detections yourself. Just make sure to save each scene's detections into an h5 file whose format matches the given pre-computed ones.
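A small sanity check, assuming the folder layout shown earlier (`data/<split>/<scene>/`; adjust the path to wherever your data folder lives):

```bash
# Verify that every scene folder has a detections.h5 at its top level
for split in train val test; do
  for scene in data/"$split"/*/; do
    [ -f "${scene}detections.h5" ] || echo "Missing detections.h5 in ${scene}"
  done
done
```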
- Set `CONFIG.predictor_disabled = 0;` and `CONFIG.predictor_precache = 0;` in `load_config.m`.
- Download the weights to `models/mubynet/`.
- For each split (train, val, test), compute the 3d pose predictions and deep features (or run all splits in batch from the shell, as sketched after these commands):
  ```
  run_generate_mubynet_cache('train')
  run_generate_mubynet_cache('val')
  run_generate_mubynet_cache('test')
  ```
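If you prefer to launch these from the command line, here is a minimal sketch; it assumes MATLAB is on your PATH and that you start it from the repository root so the script is on the Matlab path.

```bash
# Run the MubyNet cache generation for all splits in non-interactive mode
for split in train val test; do
  matlab -nodisplay -nosplash -r "run_generate_mubynet_cache('$split'); exit"
done
```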
Next we generate the instance features used for matching people in the scene by appearance. The model is first trained for 40k iterations on the training split, and then fine-tuned for 2k iterations on each individual scene.
The base of the instance features comes from a VGG-19 model.
- Download the VGG-19 weights (sha1: 7e1441c412647bebdf7ae9750c0c9aba131a1601).
- Either run the 40k base training on the train split or download the pre-trained weights (the checksums can be verified as sketched after this list):
  - Train the weights from scratch using the Matlab script:
    ```
    run_train_instance_detector('train')
    ```
  - Or download the pre-trained 40k-iteration weights (sha1: 6727771807b0984f2f3bbed2cf4e0a2af80b396f).
- Generate the fine-tuned weights for each split:
  ```
  run_generate_finetuned_instance_cache('<path-40k-instance-weights>', 'train', '<path-to-vgg19-weights>', 2000)
  run_generate_finetuned_instance_cache('<path-40k-instance-weights>', 'val', '<path-to-vgg19-weights>', 2000)
  run_generate_finetuned_instance_cache('<path-40k-instance-weights>', 'test', '<path-to-vgg19-weights>', 2000)
  ```
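If you want to check the downloaded weight files against the SHA-1 sums listed above, a small sketch follows; the angle-bracket paths are the same placeholders used in the fine-tuning commands and should be replaced with your actual file locations.

```bash
# Substitute the placeholders with the real paths to the downloaded weight files
sha1sum <path-to-vgg19-weights> <path-40k-instance-weights>
# Expected output:
#   7e1441c412647bebdf7ae9750c0c9aba131a1601  <path-to-vgg19-weights>
#   6727771807b0984f2f3bbed2cf4e0a2af80b396f  <path-40k-instance-weights>
```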
Below follows in-depth information about the dataset and the scripts.
- `downloadAll.sh` downloads, extracts and verifies all scenes.
- `downloadScene.sh` downloads a scene with all its videos and annotations.
- `extractScene.sh` extracts images from the videos, removes the videos and extracts the annotations. The video frames are subsampled at a provided frequency, and the annotations are then pruned to match the remaining frames. Finally, any Coco19 annotations are converted to MPII for all scenes, if they exist.
- `subsample.sh` removes all but every n:th file in a directory.
- `vgaImgsExtractor.sh` / `hdImgsExtractor.sh` extract image frames from the videos and then call subsample on the resulting frames.
- `verifyScene.sh` checks the content of the dataset.
- `clean_up_dataset.py` removes bad frames and frames missing annotations.
- `discard_annotations.py` removes annotations to match the subsampled frames.
- `coco2mpii.py` converts the Coco19 annotations to the MPII 15-joint format.
- Panoptic dataset scripts adapted from https://github.com/CMU-Perceptual-Computing-Lab/panoptic-toolbox
- We received MubyNet code directly from the authors. Thanks!