Dissertation Defense - University of Houston

Dissertation Defense

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Pengfei Dou

will defend his dissertation

Single/Multi-View 3D Face Reconstruction for Pose-Robust Face Recognition


Abstract

One of the most difficult challenges in automated face recognition is matching facial images acquired from different views. Pose variation causes not only misalignment between facial images but also inconsistency in facial appearance, and successful solutions to these two issues would greatly broaden the applicability of face recognition. To resolve the misalignment between images, three-dimensional information, a strong prior that is invariant to viewpoint, has been demonstrated to be beneficial: three-dimensional data has been widely employed as an intermediate medium for pose normalization or synthesis to compensate for the misalignment caused by pose variation. In these works, one crucial step is acquiring a personalized 3D face, which, ideally, can be captured with a 3D camera. However, the high cost and limited effective sensing range of 3D cameras have constrained their practical deployment. To resolve the inconsistency in facial appearance, discriminative feature learning has shown promise in previous work: given a large set of labeled training data, deep neural networks can be trained effectively to extract discriminative facial features. However, it is costly to collect and manually annotate a large training database that covers the full range of pose variation. This dissertation studies these two issues extensively and contributes solutions based on single/multi-view 3D facial shape reconstruction for facial pose normalization and pose-robust facial signature generation for pose-invariant face matching.

The first contribution is a framework for reconstructing the 3D face from a single facial image. The proposed framework, two-fold coupled structure learning (UH-2FCSL), consists of a regression module, based on subspace learning and partial least-squares regression, that recovers a sparse 3D facial shape from 2D facial landmarks, and a 3D super-resolution module that reconstructs the dense 3D facial shape from the estimated sparse shape. Extensive experiments on multiple public databases have demonstrated that UH-2FCSL outperforms state-of-the-art algorithms in reconstruction accuracy and in robustness to facial pose and illumination variation.

The second contribution is an end-to-end deep neural network framework for 3D face reconstruction (UH-E2FAR) from a single facial image with large pose variation. By employing a deep convolutional neural network (DCNN) for robust feature extraction, a shallow convolutional neural network for multi-scale feature fusion, and fully-connected layers for estimating the parameters of a 3D facial shape subspace model, the proposed framework significantly improves reconstruction accuracy over the state of the art. Compared to UH-2FCSL, UH-E2FAR does not require facial landmarks and is therefore more robust to large facial poses, which often degrade landmark detection accuracy.

The third contribution is an algorithm for extracting and matching pose-robust facial signatures (PRFS) for pose-invariant face recognition. By combining discriminative feature learning and part-based face representation, the proposed algorithm enhances the extracted facial features with estimated self-occlusion encodings and creates facial signatures that are pose-aware. During facial signature matching, the self-occlusion encodings explicitly weight the similarity score computation. Illustrative sketches of these three components follow.
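First, a minimal sketch of the landmark-to-sparse-shape regression step in UH-2FCSL, using scikit-learn's PLSRegression. The data, array shapes, landmark count, and number of latent components are hypothetical, and the subspace-learning stage and the 3D super-resolution module are omitted; this illustrates the technique, not the dissertation's implementation.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Hypothetical training data: N faces, each with L landmarks observed as
# flattened 2D coordinates (2L values) and sparse 3D coordinates (3L values).
N, L = 1000, 68
rng = np.random.default_rng(0)
X_train = rng.standard_normal((N, 2 * L))  # 2D landmark coordinates per image
Y_train = rng.standard_normal((N, 3 * L))  # corresponding sparse 3D shapes

# Partial least-squares regression learns a low-dimensional latent mapping
# between the 2D landmark space and the sparse 3D shape space.
pls = PLSRegression(n_components=40)
pls.fit(X_train, Y_train)

# At test time, detected 2D landmarks are regressed to a sparse 3D shape;
# a separate super-resolution module would then densify this estimate.
x_test = rng.standard_normal((1, 2 * L))
sparse_shape = pls.predict(x_test).reshape(L, 3)
```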
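Next, an illustrative PyTorch sketch in the spirit of UH-E2FAR: a deep backbone extracts features and fully-connected layers regress the 3D shape subspace parameters. The backbone choice (ResNet-18), layer widths, and parameter count are assumptions, and the shallow multi-scale fusion subnetwork is omitted; the published architecture differs.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ShapeParamRegressor(nn.Module):
    def __init__(self, num_params=99):
        super().__init__()
        backbone = models.resnet18(weights=None)  # stand-in DCNN feature extractor
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(                # FC layers -> subspace parameters
            nn.Flatten(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, num_params),
        )

    def forward(self, img):
        return self.head(self.features(img))

model = ShapeParamRegressor()
params = model(torch.randn(1, 3, 224, 224))       # one face crop -> shape parameters
# The dense 3D shape would then be recovered as mean_shape + basis @ params.
```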
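Finally, a hedged sketch of occlusion-aware matching in the spirit of PRFS: each face is represented by per-part features plus per-part visibility estimates, and parts judged self-occluded contribute less to the similarity score. The cosine similarity and joint-visibility weighting shown here are simple assumptions, not necessarily the PRFS formulation.

```python
import numpy as np

def prfs_similarity(feats_a, vis_a, feats_b, vis_b):
    """feats_*: (P, D) per-part features; vis_*: (P,) visibilities in [0, 1]."""
    # Cosine similarity per facial part.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    part_sims = np.sum(a * b, axis=1)
    # Down-weight parts that are occluded in either image of the pair.
    w = vis_a * vis_b
    return float(np.sum(w * part_sims) / (np.sum(w) + 1e-8))
```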
The fourth contribution is a framework, UH-DRFAR, for 3D facial shape reconstruction from a set of multi-view facial images of a subject. The proposed framework extends UH-E2FAR by integrating a recurrent neural network with two stacked long short-term memory (LSTM) layers to aggregate and fuse the contextual identity signal across the image set; the aggregated identity signal from the last LSTM layer is used to estimate the 3D facial shape subspace model parameters for reconstructing the 3D face (a sketch of this fusion step follows the abstract). Compared to single-view 3D face reconstruction, the proposed method improves reconstruction accuracy on facial images with large pose. Compared to photometric stereo, it is more robust to facial pose and illumination variation and is not sensitive to the number of available images.

The fifth contribution is an evaluation of how single/multi-view 3D face reconstruction impacts the performance of 3D-aided face recognition. In this study, the UR2D face recognition pipeline is extended to template-based face recognition by integrating a feature fusion module. Extensive experiments on a challenging benchmark evaluate face recognition performance with respect to both single-view and multi-view face reconstruction.
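Below is an illustrative sketch of the multi-view fusion idea in UH-DRFAR: per-view features pass through two stacked LSTM layers, and the last layer's final hidden state drives the shape-parameter regression. The feature, hidden, and parameter dimensions are assumptions, and the per-view feature extractor is omitted.

```python
import torch
import torch.nn as nn

class MultiViewShapeRegressor(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_params=99):
        super().__init__()
        # Two stacked LSTM layers aggregate the identity signal across views.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_params)

    def forward(self, view_feats):       # view_feats: (batch, num_views, feat_dim)
        _, (h_n, _) = self.lstm(view_feats)
        return self.fc(h_n[-1])          # final hidden state of the last LSTM layer

model = MultiViewShapeRegressor()
params = model(torch.randn(1, 5, 512))  # e.g., five views of the same subject
```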


Date: Monday, September 25, 2017
Time: 11:00 AM
Place: HBS 314
Advisor: Dr. Ioannis A. Kakadiaris

Faculty, students, and the general public are invited.