
Morghulis

Morghulis is an attempt to create a common API for face datasets. Many face datasets are available, each with its own conventions and annotation format, but in the end they all consist of a set of images with their respective annotated faces.

To make things worse, the existing object detection libraries (Detectron, the TensorFlow Object Detection API, and Darknet's YOLO, to name a few) all use different formats for training, evaluation, and testing. Detectron uses the COCO JSON format, TensorFlow uses TFRecords, and so on.

Once Morghulis loads a dataset, it can easily be exported to these different formats.
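The idea can be sketched as a single dataset interface plus pluggable exporters. This is only an illustrative sketch of the pattern, not Morghulis's actual internals; the class and function names here are hypothetical:

```python
# Hypothetical sketch: every dataset exposes the same surface
# (download, iterate, export), and exporters translate the
# in-memory annotations into a target format.

class FaceDataset:
    """Minimal common interface shared by all datasets (illustrative)."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def faces(self):
        """Yield (image_path, [face boxes]) pairs; each dataset
        implements this against its own annotation format."""
        raise NotImplementedError

    def export(self, output_dir, target_format):
        # Look up the exporter for the requested format and run it.
        EXPORTERS[target_format](self, output_dir)


def to_darknet(dataset, output_dir):
    # A real exporter would write one label file per image here.
    pass


# Registry mapping format names to exporter functions.
EXPORTERS = {'darknet': to_darknet}
```

With this shape, adding support for a new dataset means implementing one class, and adding a new training framework means implementing one exporter.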

Currently the following datasets are supported:

  • WIDER FACE - 32,203 images and 393,703 faces.
  • FDDB - 2,845 images and 5,171 faces.
  • AFW - 205 images and 473 faces.
  • PASCAL faces - 850 images and 1,335 faces.
  • MAFA - 30,811 images and 35,806 masked faces.
  • Caltech faces - 450 frontal face images of about 27 unique people.
  • UFDD - 6,425 images and 10,897 faces.
  • IJB-C - TODO.

Usage

Using Docker

# Download wider face
docker run --rm -it \
    -v ${PWD}/datasets:/datasets \
    housebw/morghulis \
    ./download_dataset.py --dataset widerface --output_dir /datasets/widerface

# Download fddb (assumes a data container named "ds" that mounts /ds)
docker run --rm -it \
    --volumes-from ds \
    housebw/morghulis \
    ./download_dataset.py --dataset fddb --output_dir /ds/fddb/

# Generate TF records for fddb
docker run --rm -it \
    --volumes-from ds \
    housebw/morghulis \
    ./export.py --dataset=fddb --format=tensorflow --data_dir=/ds/fddb/ --output_dir=/ds/fddb/tensorflow/
# Generate COCO json files for widerface
docker run --rm -it \
    -v ${PWD}/datasets:/ds \
    housebw/morghulis \
    ./export.py --dataset=widerface --format=coco --data_dir=/ds/widerface/ --output_dir=/ds/widerface/coco/

# Generate Darknet training files for widerface
docker run --rm -it \
    -v ${PWD}/datasets:/datasets \
    housebw/morghulis \
    ./export.py --dataset=widerface --format=darknet --data_dir=/datasets/widerface/ --output_dir=/datasets/widerface/darknet/

Programmatically (as a library)

Install with pip:

pip install git+https://github.com/the-house-of-black-and-white/morghulis.git

Use a dataset object (e.g. Wider or FDDB) to download data and export it to different formats:

data_dir = '/datasets/WIDER'

# Wider and FDDB are provided by the morghulis package
ds = Wider(data_dir)  # or FDDB(data_dir)

# downloads the train and validation sets plus annotations
ds.download()

# generates Darknet (YOLO) training files
ds.export(darknet_output_dir, target_format='darknet')

# generates TensorFlow TFRecords
ds.export(tf_output_dir, target_format='tensorflow')

# generates a COCO JSON file (useful for Detectron)
ds.export(coco_output_dir, target_format='coco')
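As a consumer of the Darknet export, a label file conventionally holds one object per line as `<class> <cx> <cy> <w> <h>`, with all coordinates normalized to [0, 1]. Assuming that standard YOLO convention (the helper name below is hypothetical, not part of Morghulis), a line can be converted back to pixel space like this:

```python
def darknet_to_pixels(line, img_w, img_h):
    """Convert one Darknet label line '<class> <cx> <cy> <w> <h>'
    (center-based, normalized to [0, 1]) into a pixel-space box
    (x_min, y_min, width, height)."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    return int(cls), (x_min, y_min, w * img_w, h * img_h)
```

For example, the line `"0 0.5 0.5 0.5 0.5"` on a 100x100 image describes a 50x50 box centered in the image.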