A neural network architecture for real-time, simultaneous face detection, landmark localization, pose estimation and gender recognition.
This work was inspired by the following two publications:
- HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Annotated Facial Landmarks in the Wild (AFLW)
As in Faster R-CNN, a Region Proposal Network (RPN) generates candidate proposals for face regions. These proposals are backprojected onto earlier layers of the network, as in HyperFace. The feature maps from these layers are then merged, and all the tasks (detection, landmark localization, pose estimation, gender recognition) are performed on the merged feature vector.
The above images show the heat-maps of the final RPN layer for different anchor sizes.
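To make the notion of anchor sizes concrete, here is a minimal sketch of how square anchors of several sizes could be laid out over an RPN feature map. The function name `make_anchors`, the stride of 16 and the sizes 32, 64 and 128 are illustrative assumptions, not values taken from this project.

```python
from itertools import product


def make_anchors(feat_h, feat_w, stride, sizes):
    """Return square anchors (x1, y1, x2, y2) centred on every feature-map cell.

    All parameter values used below are assumptions for illustration only.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell, expressed in input-image pixels.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for size in sizes:
            half = size / 2.0
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors


# A tiny 2x3 feature map with three hypothetical anchor sizes.
anchors = make_anchors(feat_h=2, feat_w=3, stride=16, sizes=(32, 64, 128))
print(len(anchors))  # one anchor per cell per size
print(anchors[0])
```

Each anchor size yields its own score map over the feature-map cells, which is why one heat-map per anchor size can be drawn.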
The RPN proposals are backprojected onto intermediate layers of VGGNet. The resulting features are visualized here.
- For single face
- For multiple faces
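Backprojection can be sketched as rescaling a proposal from image coordinates to the coordinates of each intermediate feature map. The layer names, the strides and the helper `backproject` below are hypothetical; the actual layers used by this project are not listed here.

```python
import math

# Hypothetical cumulative strides (input pixels per feature-map cell).
LAYER_STRIDES = {"early": 2, "middle": 4, "late": 16}


def backproject(box, stride, feat_h, feat_w):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cell coordinates.

    The box is rounded outwards so the region fully covers the proposal,
    then clipped to the feature-map bounds.
    """
    x1, y1, x2, y2 = box
    fx1 = max(0, math.floor(x1 / stride))
    fy1 = max(0, math.floor(y1 / stride))
    fx2 = min(feat_w, math.ceil(x2 / stride))
    fy2 = min(feat_h, math.ceil(y2 / stride))
    return fx1, fy1, fx2, fy2


# One illustrative proposal on an assumed 224x224 input image.
proposal = (40, 24, 136, 120)
image_h, image_w = 224, 224

regions = {}
for name, stride in LAYER_STRIDES.items():
    regions[name] = backproject(proposal, stride, image_h // stride, image_w // stride)

for name, region in regions.items():
    print(name, region)
```

The regions cropped this way differ in spatial size from layer to layer, so they would typically be pooled to a common size before being merged into a single feature vector.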
Some examples of annotated images from the AFLW dataset.
- Red bounding boxes show female gender and blue boxes show male.
- Yellow bounding boxes are the ones proposed by the RPN. The improved bounding boxes shown in red/blue are generated after applying bounding-box regression.
We have achieved a frame rate of 4 FPS on an NVIDIA GeForce GTX TITAN X GPU.
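
The refinement from a yellow RPN proposal to a red/blue box can be illustrated with the box parameterization described in the Faster R-CNN paper, where the network predicts offsets `(tx, ty, tw, th)` relative to the proposal. This is a minimal sketch under that assumption; the function name `apply_deltas` is hypothetical and the exact parameterization used by this project may differ.

```python
import math


def apply_deltas(box, deltas):
    """Refine a proposal (x1, y1, x2, y2) with regression offsets (tx, ty, tw, th).

    Assumes the Faster R-CNN parameterization: centre shifts are scaled by the
    proposal's width and height, and size changes are predicted in log space.
    """
    x1, y1, x2, y2 = box
    tx, ty, tw, th = deltas

    w = x2 - x1
    h = y2 - y1
    cx = x1 + 0.5 * w
    cy = y1 + 0.5 * h

    new_cx = cx + tx * w
    new_cy = cy + ty * h
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)

    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


proposal = (10, 20, 50, 100)
# Zero offsets leave the proposal unchanged.
unchanged = apply_deltas(proposal, (0.0, 0.0, 0.0, 0.0))
# tx = 0.25 shifts the box right by a quarter of its width.
shifted = apply_deltas(proposal, (0.25, 0.0, 0.0, 0.0))
print(unchanged, shifted)
```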