FaceDetection

Introduction

The goal of FaceDetection is to provide efficient and high-speed face detection solutions, including cutting-edge and classic models.

Benchmark and Model Zoo

The architectures supported by PaddleDetection are shown in the table below. Please refer to Algorithm Description for details of each algorithm.

|           | Original | Lite [1] | NAS [2] |
|-----------|----------|----------|---------|
| BlazeFace | ✓        | ✓        | ✓       |
| FaceBoxes | ✓        | ✓        | x       |

[1] The Lite edition reduces the number of network layers and channels.
[2] The NAS edition uses a Neural Architecture Search algorithm to optimize the network structure.

Model Zoo

mAP in WIDER FACE

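| Architecture | Type     | Size | Img/gpu | Lr schd | Easy Set | Medium Set | Hard Set | Download | Configs |
|--------------|----------|------|---------|---------|----------|------------|----------|----------|---------|
| BlazeFace    | Original | 640  | 8       | 320k    | 0.915    | 0.892      | 0.797    | model    | config  |
| BlazeFace    | Lite     | 640  | 8       | 320k    | 0.909    | 0.885      | 0.781    | model    | config  |
| BlazeFace    | NAS      | 640  | 8       | 320k    | 0.837    | 0.807      | 0.658    | model    | config  |
| BlazeFace    | NAS_V2   | 640  | 8       | 320k    | 0.870    | 0.837      | 0.685    | model    | config  |
| FaceBoxes    | Original | 640  | 8       | 320k    | 0.878    | 0.851      | 0.576    | model    | config  |
| FaceBoxes    | Lite     | 640  | 8       | 320k    | 0.901    | 0.875      | 0.760    | model    | config  |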

NOTES:

  • The mAP on the Easy/Medium/Hard sets is obtained by multi-scale evaluation with tools/face_eval.py. For details, refer to Evaluation.
  • BlazeFace-Lite training and testing use the blazeface.yml config file with lite_edition: true.

mAP in FDDB

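| Architecture | Type     | Size | DiscROC | ContROC |
|--------------|----------|------|---------|---------|
| BlazeFace    | Original | 640  | 0.992   | 0.762   |
| BlazeFace    | Lite     | 640  | 0.990   | 0.756   |
| BlazeFace    | NAS      | 640  | 0.981   | 0.741   |
| FaceBoxes    | Original | 640  | 0.987   | 0.736   |
| FaceBoxes    | Lite     | 640  | 0.988   | 0.751   |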

NOTES:

  • These results are obtained by multi-scale evaluation on the FDDB dataset. For details, refer to Evaluation.

Infer Time and Model Size comparison

| Architecture | Type     | Size | P4 (trt32) (ms) | CPU (ms) | Qualcomm SnapDragon 855 (armv8) (ms) | Model size (MB) |
|--------------|----------|------|-----------------|----------|--------------------------------------|-----------------|
| BlazeFace    | Original | 128  | 1.387           | 23.461   | 6.036                                | 0.777           |
| BlazeFace    | Lite     | 128  | 1.323           | 12.802   | 6.193                                | 0.68            |
| BlazeFace    | NAS      | 128  | 1.03            | 6.714    | 2.7152                               | 0.234           |
| FaceBoxes    | Original | 128  | 3.144           | 14.972   | 19.2196                              | 3.6             |
| FaceBoxes    | Lite     | 128  | 2.295           | 11.276   | 8.5278                               | 2               |
| BlazeFace    | Original | 320  | 3.01            | 132.408  | 70.6916                              | 0.777           |
| BlazeFace    | Lite     | 320  | 2.535           | 69.964   | 69.9438                              | 0.68            |
| BlazeFace    | NAS      | 320  | 2.392           | 36.962   | 39.8086                              | 0.234           |
| FaceBoxes    | Original | 320  | 7.556           | 84.531   | 52.1022                              | 3.6             |
| FaceBoxes    | Lite     | 320  | 18.605          | 78.862   | 59.8996                              | 2               |
| BlazeFace    | Original | 640  | 8.885           | 519.364  | 149.896                              | 0.777           |
| BlazeFace    | Lite     | 640  | 6.988           | 284.13   | 149.902                              | 0.68            |
| BlazeFace    | NAS      | 640  | 7.448           | 142.91   | 69.8266                              | 0.234           |
| FaceBoxes    | Original | 640  | 78.201          | 394.043  | 169.877                              | 3.6             |
| FaceBoxes    | Lite     | 640  | 59.47           | 313.683  | 139.918                              | 2               |

NOTES:

  • CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz.
  • The P4 (trt32) and CPU tests are based on PaddlePaddle version 1.6.1.
  • ARM test environment:
    • Qualcomm SnapDragon 855(armv8);
    • Single thread;
    • Paddle-Lite version 2.0.0.

Quick Start

Data Pipeline

We use the WIDER FACE dataset to train and test the models. The official website gives a detailed introduction to the data.

  • WIDER FACE data source:
    Loads a wider_face type dataset with a directory structure like this:

    dataset/wider_face/
    ├── wider_face_split
    │   ├── wider_face_train_bbx_gt.txt
    │   ├── wider_face_val_bbx_gt.txt
    ├── WIDER_train
    │   ├── images
    │   │   ├── 0--Parade
    │   │   │   ├── 0_Parade_marchingband_1_100.jpg
    │   │   │   ├── 0_Parade_marchingband_1_381.jpg
    │   │   │   │   ...
    │   │   ├── 10--People_Marching
    │   │   │   ...
    ├── WIDER_val
    │   ├── images
    │   │   ├── 0--Parade
    │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
    │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
    │   │   │   │   ...
    │   │   ├── 10--People_Marching
    │   │   │   ...
    
  • Download dataset manually:
    To download the WIDER FACE dataset, run the following command:

    cd dataset/wider_face && ./download.sh

  • Download dataset automatically:
    If a training session is started but the dataset is not set up properly (e.g., not found in dataset/wider_face), PaddleDetection can automatically download it from the WIDER FACE website. The decompressed dataset is cached in ~/.cache/paddle/dataset/ and is discovered automatically in subsequent runs.
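To make the role of the wider_face_split files in the directory tree above more concrete, here is a minimal sketch of a parser for the ground-truth text format. It assumes the layout used by the official WIDER FACE annotation files (an image path line, a face-count line, then one row per face whose first four fields are x, y, width, height). The function name `parse_wider_annotations` and the sample rows are hypothetical and only for illustration; this is not PaddleDetection's own loader.

```python
from typing import Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def parse_wider_annotations(lines: Iterable[str]) -> Dict[str, List[Box]]:
    """Parse WIDER FACE style ground-truth text into {image_path: [boxes]}."""
    it = iter(line.strip() for line in lines)
    records: Dict[str, List[Box]] = {}
    for image_path in it:
        if not image_path:
            continue  # tolerate blank lines between records
        num_faces = int(next(it))
        boxes: List[Box] = []
        # Assumption: a record with 0 faces is still followed by one
        # placeholder row, so at least one row is always consumed.
        for _ in range(max(num_faces, 1)):
            fields = next(it).split()
            x, y, w, h = (int(value) for value in fields[:4])
            if num_faces > 0 and w > 0 and h > 0:
                boxes.append((x, y, w, h))
        records[image_path] = boxes
    return records


# Hypothetical sample rows, written in the same layout as the real files.
sample = """0--Parade/0_Parade_marchingband_1_100.jpg
2
78 221 7 8 2 0 0 0 0 0
78 238 14 17 2 0 0 0 0 0
0--Parade/0_Parade_hypothetical_empty.jpg
0
0 0 0 0 0 0 0 0 0 0
""".splitlines()

annotations = parse_wider_annotations(sample)
for path, boxes in annotations.items():
    print(path, boxes)
```

In practice the same function could be fed an open file handle for wider_face_train_bbx_gt.txt, since it only needs an iterable of lines.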

Data Augmentation

  • Data-anchor-sampling: Randomly rescales the image so that a randomly selected face lands on a different scale, which greatly increases the scale variation of the faces seen during training. Concretely, compute $v=\sqrt{width * height}$ from the width and height of the randomly selected face, and determine which interval of [16,32,64,128] contains v. For example, if v=45, then 32<v<64, and one value from [16,32,64] is chosen with uniform probability. If 64 is chosen, the target face size is then sampled from the interval [64 / 2, min(v * 2, 64 * 2)].

  • Other methods: Including RandomDistort, ExpandImage, RandomInterpImage, RandomFlipImage, etc. Please refer to READER.md for details.
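The data-anchor-sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PaddleDetection implementation: the function name `sample_face_scale` is hypothetical, the scale list is the one quoted in the text, and faces larger than the last scale are simply clamped to the full list.

```python
import math
import random

ANCHOR_SCALES = [16, 32, 64, 128]  # scale list quoted in the text


def sample_face_scale(width, height, rng=random):
    """Return (chosen_scale, target_face_size, resize_ratio) for one face."""
    v = math.sqrt(width * height)
    # Index of the upper end of the interval that contains v
    # (assumption: clamp to the last scale for very large faces).
    upper = len(ANCHOR_SCALES) - 1
    for i, scale in enumerate(ANCHOR_SCALES):
        if v <= scale:
            upper = i
            break
    # Pick one candidate scale uniformly, e.g. from [16, 32, 64] when v = 45.
    chosen = rng.choice(ANCHOR_SCALES[: upper + 1])
    # Sample the target face size from [chosen / 2, min(v * 2, chosen * 2)].
    low = chosen / 2
    high = min(v * 2, chosen * 2)
    target = rng.uniform(low, high)
    # The whole image would then be resized by this ratio.
    return chosen, target, target / v


rng = random.Random(0)
chosen, target, ratio = sample_face_scale(45, 45, rng)
print(chosen, round(target, 2), round(ratio, 3))
```

For a 45x45 face, the chosen scale is always one of 16, 32 or 64, and when 64 is chosen the target size falls in [32, 90], matching the example in the text.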

Training and Inference

For training and inference, please refer to GETTING_STARTED.md.
NOTES:

  • BlazeFace and FaceBoxes are trained on 4 GPUs with batch_size=8 per GPU (a total batch size of 32) for 320000 iterations. (If your GPU count is not 4, please adjust the training parameters according to the table of calculation rules.)
  • Evaluation during training is currently not supported.
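As a rough illustration of adjusting the schedule when the GPU count differs from 4, the sketch below applies the common linear scaling rule: the learning rate grows with the total batch size and the iteration count shrinks by the same factor. This rule is an assumption here, and the table of calculation rules remains the authoritative source. The function name `rescale_schedule` and the base learning rate used in the example are hypothetical.

```python
def rescale_schedule(base_lr, gpus, batch_per_gpu,
                     base_total_batch=32, base_iters=320000):
    """Scale learning rate and iterations relative to the 4 GPU x 8 setup."""
    total_batch = gpus * batch_per_gpu
    factor = total_batch / base_total_batch
    return {
        "total_batch": total_batch,
        # Assumed linear scaling: learning rate follows the total batch size.
        "learning_rate": base_lr * factor,
        # Fewer samples per step means proportionally more steps.
        "max_iters": round(base_iters / factor),
    }


# base_lr=0.001 is a placeholder value; read the real one from your config.
print(rescale_schedule(base_lr=0.001, gpus=2, batch_per_gpu=8))
```

With 2 GPUs and 8 images per GPU, this sketch halves the learning rate and doubles the iteration count relative to the reference setup.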

Evaluation

Currently we support evaluation on the WIDER FACE dataset and the FDDB dataset. First run tools/face_eval.py to generate the evaluation result file, and then use MATLAB (WIDER FACE) or OpenCV (FDDB) to calculate the specific evaluation metrics.
The optional arguments for running tools/face_eval.py are as follows:

  • -f or --output_eval: Evaluation file directory, default is output/pred.
  • -e or --eval_mode: Evaluation mode, either widerface or fddb, default is widerface.
  • --multi_scale: Adding this flag to the command selects multi-scale evaluation. The default is False, which selects single-scale evaluation.
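To show how the three options above behave together, here is a hypothetical argparse declaration that mirrors the documented names and defaults. It is only a sketch for illustration: the real tools/face_eval.py may declare its arguments differently and also accepts other options (such as -c and -o) that are not modeled here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented evaluation options."""
    parser = argparse.ArgumentParser(description="face_eval-style options (sketch)")
    parser.add_argument("-f", "--output_eval", default="output/pred",
                        help="directory for the evaluation result files")
    parser.add_argument("-e", "--eval_mode", default="widerface",
                        choices=["widerface", "fddb"],
                        help="which benchmark format to produce")
    # A boolean switch: absent means single-scale, present means multi-scale.
    parser.add_argument("--multi_scale", action="store_true",
                        help="enable multi-scale evaluation")
    return parser


args = build_parser().parse_args(["--eval_mode=fddb", "--multi_scale"])
print(args.output_eval, args.eval_mode, args.multi_scale)
```

The point of the sketch is that --multi_scale takes no value: its presence alone switches evaluation from single-scale to multi-scale.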

Evaluate on the WIDER FACE

  • Evaluate and generate results files:
export CUDA_VISIBLE_DEVICES=0
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
       -o weights=output/blazeface/model_final \
       --eval_mode=widerface

After the evaluation is completed, the test result in txt format will be generated in output/pred.

  • Download the official evaluation script to evaluate the AP metrics:
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
  • Modify the result path and the name of the curve to be drawn in eval_tools/wider_eval.m:
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
  • wider_eval.m is the main execution program of the evaluation module. The run command is as follows:
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"

Evaluate on the FDDB

We provide an evaluation process for the FDDB dataset (currently Linux only). Please refer to the FDDB official website for other details.

./dataset/fddb/download.sh
  • 3) Compile the FDDB evaluation code: Go to the dataset/fddb/evaluation directory and modify the contents of the Makefile as follows:
evaluate: $(OBJS)
    $(CC) $(OBJS) -o $@ $(LIBS)

Modify the content in common.hpp to the following form:

#define __IMAGE_FORMAT__ ".jpg"
//#define __IMAGE_FORMAT__ ".ppm"
#define __CVLOADIMAGE_WORKING__

Use grep -r "CV_RGB" to find the code segments containing CV_RGB, change CV_RGB to Scalar, add using namespace cv; to the affected .cpp files, then compile:

make clean && make
  • 4) Start evaluation:
    Modify the dataset_dir and anno_path fields in the config file:
EvalReader:
  ...
  dataset:
    dataset_dir: dataset/fddb
    anno_path: FDDB-folds/fddb_annotFile.txt
    ...

Evaluate and generate results files:

python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
       -o weights=output/blazeface/model_final \
       --eval_mode=fddb

After the evaluation is completed, the test result in txt format will be generated in output/pred/pred_fddb_res.txt.
Generate ContROC and DiscROC data:

cd dataset/fddb/evaluation
./evaluate -a ./FDDB-folds/fddb_annotFile.txt \
           -f 0 -i ./ -l ./FDDB-folds/filePath.txt -z .jpg \
           -d {RESULT_FILE} \
           -r {OUTPUT_DIR}

NOTES:
(1) RESULT_FILE is the FDDB prediction result file output by tools/face_eval.py;
(2) OUTPUT_DIR is the prefix of the FDDB evaluation output files; two files are generated, {OUTPUT_DIR}ContROC.txt and {OUTPUT_DIR}DiscROC.txt;
(3) Run ./evaluate --help for an explanation of the arguments.

Algorithm Description

BlazeFace

Introduction:
BlazeFace is a face detection model published by Google Research. It is lightweight yet performs well, and it is tailored for mobile GPU inference. It runs at 200-1000+ FPS on flagship devices.

Particularity:

  • Anchor scheme stops at 8×8 (input 128x128), with 6 anchors per pixel at that resolution.
  • 5 single, and 6 double BlazeBlocks: 5×5 depthwise convs, same accuracy with fewer layers.
  • Replace the non-maximum suppression algorithm with a blending strategy that estimates the regression parameters of a bounding box as a weighted mean between the overlapping predictions.
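The blending strategy in the last point can be sketched as a score-weighted average over overlapping detections. This is a simplified illustration under stated assumptions, not the exact procedure from the paper or from PaddleDetection: the function names, the IoU threshold of 0.3, and the choice of keeping the top score for each blended box are all hypothetical, and scores are assumed to be positive.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def blend_detections(dets: Sequence[Detection],
                     iou_threshold: float = 0.3) -> List[Detection]:
    """Merge overlapping boxes into score-weighted means instead of dropping them."""
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    blended: List[Detection] = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        # Group the best box with everything that overlaps it enough.
        group = [top] + [d for d in rest if iou(top, d) > iou_threshold]
        remaining = [d for d in rest if iou(top, d) <= iou_threshold]
        total = sum(d[4] for d in group)
        # Each coordinate is the score-weighted mean over the group.
        coords = tuple(sum(d[i] * d[4] for d in group) / total for i in range(4))
        blended.append(coords + (top[4],))
    return blended


boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.1), (50, 50, 60, 60, 0.8)]
for box in blend_detections(boxes):
    print(box)
```

Compared with plain non-maximum suppression, which would keep only the highest-scoring box of each group, the weighted mean lets lower-scoring overlapping predictions nudge the final coordinates.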

Edition information:

  • Original: A reproduction of the original paper.
  • Lite: Replaces 5x5 conv with 3x3 conv, with fewer network layers and conv channels.
  • NAS: Uses a Neural Architecture Search algorithm to optimize the network structure, with fewer network layers and conv channels than Lite.
  • NAS_V2: This architecture was searched from blazeface-NAS with SANAS in PaddleSlim. Its average precision is 3% higher than blazeface-NAS, while its latency on the SnapDragon 855 chip is only 5% higher.

FaceBoxes

Introduction:
FaceBoxes, introduced in the paper "FaceBoxes: A CPU Real-time Face Detector with High Accuracy", is a face detector proposed by Shifeng Zhang et al. that performs well in both speed and accuracy. The paper was published at IJCB 2017.

Particularity:

  • Anchor scheme stops at 20x20, 10x10 and 5x5 for a network input size of 640x640, with 3, 1 and 1 anchors per pixel at each resolution. The corresponding densities are 1, 2, 4 (20x20), 4 (10x10) and 4 (5x5).
  • 2 convs with CReLU, 2 poolings, 3 inceptions and 2 convs with ReLU.
  • Use density prior box to improve detection accuracy.
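The idea behind a density prior box can be sketched as tiling density x density copies of one anchor evenly inside a single feature-map cell, so that small anchors cover the image as densely as large ones. The sketch below is an illustration under assumptions, not the PaddleDetection operator: the function names are hypothetical, and the stride of 32 simply follows from a 640x640 input mapped to a 20x20 feature map.

```python
from typing import List, Tuple


def densified_centers(col: int, row: int, stride: int,
                      density: int) -> List[Tuple[float, float]]:
    """Evenly spaced anchor centers inside the cell at (col, row)."""
    centers = []
    for j in range(density):
        for i in range(density):
            cx = (col + (i + 0.5) / density) * stride
            cy = (row + (j + 0.5) / density) * stride
            centers.append((cx, cy))
    return centers


def dense_prior_boxes(col: int, row: int, stride: int, size: float,
                      density: int) -> List[Tuple[float, float, float, float]]:
    """Square prior boxes (x1, y1, x2, y2) of side `size` at each center."""
    half = size / 2
    return [(cx - half, cy - half, cx + half, cy + half)
            for cx, cy in densified_centers(col, row, stride, density)]


stride = 640 // 20  # 640x640 input, 20x20 feature map
print(densified_centers(0, 0, stride, density=1))
print(densified_centers(0, 0, stride, density=2))
print(len(dense_prior_boxes(0, 0, stride, size=32, density=4)))
```

With density 1 the cell gets a single centered anchor, while density 2 and 4 place 4 and 16 shifted copies in the same cell.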

Edition information:

  • Original: A reproduction of the original paper.
  • Lite: 2 convs with CReLU, 1 pooling, 2 convs with ReLU, 3 inceptions and 2 convs with ReLU. Anchor scheme stops at 80x80 and 40x40, with 3 and 1 anchors per pixel at each resolution. The corresponding densities are 1, 2, 4 (80x80) and 4 (40x40). It uses fewer conv channels than Original.

Contributing

Contributions are very welcome, and we would really appreciate your feedback!