Morghulis is an attempt to create a common API for face datasets. There are many face datasets available, each with its own conventions and annotation format, but in the end they all consist of a set of images with annotated faces.
To make things worse, the existing object detection libraries (Detectron, the TensorFlow Object Detection API, and Darknet's YOLO, to name a few) all use different formats for train/eval/test: Detectron uses the COCO JSON format, TensorFlow uses TFRecords, and so on.
Once Morghulis loads a dataset, it can easily be exported to any of these formats.
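As a rough illustration of how much these formats differ, here is the same face box expressed in each of them (the values are made up for a 640x400 image):

# The same annotated face in three formats (illustrative values only).

# COCO JSON: boxes are [x, y, width, height] in absolute pixels.
coco_annotation = {
    "image_id": 1,
    "category_id": 1,  # "face"
    "bbox": [110.0, 64.0, 42.0, 57.0],
    "area": 42.0 * 57.0,
    "iscrowd": 0,
}

# Darknet/YOLO: one text line per box, "class x_center y_center width height",
# with all coordinates normalized by the image dimensions.
darknet_line = "0 0.2047 0.2313 0.0656 0.1425"

# TFRecords store each image and its boxes as a serialized tf.train.Example
# proto, so they are not human-readable at all.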
Currently the following datasets are supported:
- WIDER FACE - 32,203 images and 393,703 faces.
- FDDB - 2,845 images and 5,171 faces.
- AFW - 205 images and 473 faces.
- PASCAL Faces - 850 images and 1,335 faces.
- MAFA - 30,811 images and 35,806 masked faces.
- Caltech Faces - 450 frontal face images of 27 or so unique people.
- UFDD - 6,425 images and 10,897 faces.
- IJB-C - TODO.
# Download wider face
docker run --rm -it \
-v ${PWD}/datasets:/datasets \
housebw/morghulis \
./download_dataset.py --dataset widerface --output_dir /datasets/widerface
# Download FDDB (this and the next example assume a data volume container
# named "ds" that exposes a /ds volume; adjust the volume flags to your setup)
docker run --rm -it \
--volumes-from ds \
housebw/morghulis \
./download_dataset.py --dataset fddb --output_dir /ds/fddb/
# Generate TF records for fddb
docker run --rm -it \
--volumes-from ds \
housebw/morghulis \
./export.py --dataset=fddb --format=tensorflow --data_dir=/ds/fddb/ --output_dir=/ds/fddb/tensorflow/
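To sanity-check the generated records, here is a minimal sketch using TensorFlow's tf.data API (the output file names and layout are assumptions; check what the export actually wrote):

import glob
import tensorflow as tf

# Count the serialized examples across all record files the export produced
# (the glob patterns are assumptions about the output layout).
files = glob.glob('datasets/fddb/tensorflow/*.record') + \
        glob.glob('datasets/fddb/tensorflow/*.tfrecord')
count = sum(1 for _ in tf.data.TFRecordDataset(files))
print('%d examples across %d files' % (count, len(files)))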
# Generate COCO json files for widerface
docker run --rm -it \
-v ${PWD}/datasets:/ds \
housebw/morghulis \
./export.py --dataset=widerface --format=coco --data_dir=/ds/widerface/ --output_dir=/ds/widerface/coco/
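The exported COCO file is plain JSON, so it can be inspected with nothing but the standard library (the exact file name morghulis writes is an assumption here):

import json

# A COCO annotation file has top-level "images", "annotations" and
# "categories" lists.
with open('datasets/widerface/coco/wider_face_train.json') as f:  # assumed name
    coco = json.load(f)
print('%d images, %d face boxes' % (len(coco['images']), len(coco['annotations'])))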
# Generate Darknet training files for widerface
docker run --rm -it \
-v ${PWD}/datasets:/datasets \
housebw/morghulis \
./export.py --dataset=widerface --format=darknet --data_dir=/datasets/widerface/ --output_dir=/datasets/widerface/darknet/
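A Darknet export typically means one .txt label file per image, each line holding "class x_center y_center width height" with coordinates normalized to [0, 1]. A quick sanity check along those lines (the directory layout is an assumption):

import glob

# Check that every 5-field label line has normalized coordinates.
for path in glob.glob('datasets/widerface/darknet/**/*.txt', recursive=True):
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 5:  # skip list files such as train.txt
                continue
            _cls, cx, cy, w, h = fields
            assert all(0.0 <= float(v) <= 1.0 for v in (cx, cy, w, h)), path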
Install with pip:
pip install git+https://github.com/the-house-of-black-and-white/morghulis.git
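A quick smoke test that the install worked (assuming the distribution installs a top-level morghulis package):

# Should print the package location rather than raise an ImportError.
import morghulis
print(morghulis.__file__)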
Use the dataset object (e.g. Wider or FDDB) to download and export to different formats:
# Import paths are best-effort guesses; check the package source for the
# exact module layout.
from morghulis.widerface import Wider  # or: from morghulis.fddb import FDDB

data_dir = '/datasets/WIDER'
darknet_output_dir = '/datasets/WIDER/darknet'
tf_output_dir = '/datasets/WIDER/tensorflow'
coco_output_dir = '/datasets/WIDER/coco'
ds = Wider(data_dir)  # or FDDB(data_dir)

# downloads the train and validation sets plus annotations
ds.download()

# generate Darknet (YOLO) training files
ds.export(darknet_output_dir, target_format='darknet')

# generate TensorFlow TFRecords
ds.export(tf_output_dir, target_format='tensorflow')

# generate a COCO JSON file (useful for Detectron)
ds.export(coco_output_dir, target_format='coco')
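Since the COCO export targets consumers like Detectron, one end-to-end check is to load the result with pycocotools; a sketch, assuming a file name under coco_output_dir:

from pycocotools.coco import COCO

# Load the exported annotation file and run a couple of basic queries.
coco = COCO('/datasets/WIDER/coco/wider_face_train.json')  # assumed file name
img_ids = coco.getImgIds()
ann_ids = coco.getAnnIds(imgIds=img_ids[:1])
print('%d images, %d boxes on the first image' % (len(img_ids), len(ann_ids)))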