Biometrics Tools

Expression Modeling

Facial expressions are defined as facial changes in response to internal emotional states, intentions, or social communication. Facial expression analysis refers to systems that attempt to analyze and identify facial motions and facial feature changes from visual information in an automated manner. This area can benefit from accomplishments in face detection, tracking, and recognition, all areas in which our team has considerable experience. Facial expression analysis involves both the measurement of facial motion and expression recognition. As laid out by Tian et al. [1], the general approach to automatic facial expression analysis (AFEA) consists of three major steps: (1) face acquisition, (2) facial data extraction and representation, and (3) facial expression recognition. Recent research in automatic facial expression analysis tends to follow certain guidelines [1], among them: (1) building robust systems that handle head motion, occlusion, illumination changes, and low-intensity expressions; (2) using more facial features to recognize more expressions; (3) recognizing facial action units and their combinations instead of emotion-specific expressions; (4) recognizing spontaneous expressions; and (5) developing fully automatic and real-time systems.
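To make these three steps concrete, the sketch below wires them together in Python; OpenCV's stock frontal-face Haar cascade stands in for face acquisition, and the two stubs are hypothetical placeholders for the extraction and recognition stages rather than any particular published method.

```python
# Minimal skeleton of the three-step AFEA pipeline; assumes opencv-python.
# extract_features() and classify_expression() are hypothetical stubs.
import cv2

def extract_features(face_patch):
    # Step 2 stub: a real system would compute geometric and/or
    # appearance-based features (landmarks, Gabor responses, LBP, ...).
    return cv2.resize(face_patch, (64, 64)).flatten()

def classify_expression(features):
    # Step 3 stub: a real system would apply a trained classifier here.
    return "neutral"

def analyze_frame(gray_frame):
    # Step 1: face acquisition with OpenCV's bundled Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    return [classify_expression(extract_features(gray_frame[y:y+h, x:x+w]))
            for (x, y, w, h) in faces]
```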

In the CVIP Lab, we have developed tools for automated facial landmark detection, in addition to the extraction of different geometric and appearance-based facial features; this encourages us to employ the CMU-S2 framework in our expression analysis work. We will relax the assumption of having a frontal, expressionless face in the first frame by improving the robustness of our face detector. We will employ a 3D model-based method for head-pose estimation in order to handle out-of-plane head rotation. Active appearance modeling tools developed in the CVIP Lab can be used to automatically detect facial landmarks that guide the pose estimation process. Illumination variation can be handled using our spherical harmonics-based appearance modeling. Kalman filters and their variants, complemented by particle filters, will be implemented for face tracking, so that the temporal information in the given sequence is exploited. Our implementations of Gabor wavelets and Local Binary Patterns can be used to extract appearance-based features for expression recognition. Bayesian decision theory will be employed to recognize different facial expressions within a hidden Markov model framework that encodes temporal information. Our proposed linear combination of Gaussians can be used to train the probabilistic models for the different facial action units.
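As an illustration of the appearance-based features mentioned above, the following sketch computes spatially binned uniform Local Binary Pattern histograms with scikit-image; the grid size and the LBP parameters (P, R) are illustrative choices, not our tuned settings.

```python
# Uniform LBP histogram descriptor for an 8-bit grayscale face patch;
# assumes scikit-image and numpy.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram_features(face_gray, grid=(4, 4), P=8, R=1):
    # Uniform LBP yields integer codes in [0, P + 1].
    lbp = local_binary_pattern(face_gray, P, R, method="uniform")
    h, w = lbp.shape
    feats = []
    # Per-cell histograms are concatenated so the descriptor keeps
    # the spatial layout of the face.
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = lbp[i * h // grid[0]:(i + 1) * h // grid[0],
                       j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(cell, bins=P + 2, range=(0, P + 2),
                                   density=True)
            feats.append(hist)
    return np.concatenate(feats)
```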

References

[1] Y.L. Tian, T. Kanade, and J.F. Cohn, “Facial Expression Analysis”, Handbook of Face Recognition, S.Z. Li and A.K. Jain, eds., pp. 247-276, Springer, 2005.

[2] J. Cohn, T. Kanade, T. Moriyama, Z. Ambadar, J. Xiao, J. Gao, and H. Imamura, “A Comparative Study of Alternative FACS Coding Algorithms”, Technical Report CMU-RI-TR-02-06, Robotics Institute, Carnegie Mellon University, Pittsburgh, November 2001.

[3] T. Moriyama, T. Kanade, J. Cohn, J. Xiao, Z. Ambadar, J. Gao, and H. Imamura, “Automatic Recognition of Eye Blinking in Spontaneously Occurring Behavior”, Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002), vol. 4, pp. 78–81, 2002.

[4] J. Xiao, T. Kanade, and J. Cohn, “Robust Full-Motion Recovery of Head by Dynamic Templates and Re-Registration Techniques”, Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 163–169, 2002.

Image-Based 3D Facial Reconstruction

Shape-from-Shading (SFS) provides a means for shape recovery from a single input image. In principle, SFS is an ill-posed problem. It has been shown in the literature that constraining the shape-from-shading algorithm to a specific class of objects can improve the accuracy of the recovered shape; statistical models of 3D shapes have commonly been used to constrain the facial SFS problem. One of the main challenges confronting SFS algorithms is dealing with arbitrary illumination. Basri and Jacobs [1] proved that images of a convex Lambertian object taken under arbitrary illumination conditions can be accurately approximated by a low-dimensional linear subspace based on spherical harmonics. Since then, spherical harmonics have been incorporated into SFS frameworks to tackle the problem of illumination.
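For concreteness, the sketch below constructs the nine harmonic basis images of Basri and Jacobs [1] from a per-pixel albedo map and unit surface normals, both assumed given; any image of the surface under arbitrary distant lighting is then approximated by a linear combination of these nine images.

```python
# Nine harmonic images for a convex Lambertian surface, following [1].
# albedo: (H, W) array; normals: (H, W, 3) array of unit surface normals.
import numpy as np

def harmonic_images(albedo, normals):
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    # Lambertian attenuation of the first three harmonic orders.
    a0, a1, a2 = np.pi, 2.0 * np.pi / 3.0, np.pi / 4.0
    c0 = np.sqrt(1.0 / (4.0 * np.pi))
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = np.sqrt(5.0 / (4.0 * np.pi))
    c3 = np.sqrt(5.0 / (12.0 * np.pi))
    Y = [c0 * np.ones_like(z),                        # Y_00
         c1 * z, c1 * x, c1 * y,                      # first order
         0.5 * c2 * (3.0 * z**2 - 1.0),               # Y_20
         3.0 * c3 * x * z, 3.0 * c3 * y * z,          # Y_21
         1.5 * c3 * (x**2 - y**2), 3.0 * c3 * x * y]  # Y_22
    atten = [a0] + [a1] * 3 + [a2] * 5
    return np.stack([albedo * a * basis for a, basis in zip(atten, Y)], axis=-1)

# Lighting coefficients for an observed image then follow from least squares:
#   B = harmonic_images(albedo, normals).reshape(-1, 9)
#   coeffs, *_ = np.linalg.lstsq(B, image.ravel(), rcond=None)
```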

In the CVIP Lab, Ahmed and Farag [2] extended Castelan’s coupled statistical model [3] by combining shape, appearance/albedo, and spherical harmonics in order to parameterize facial surfaces under arbitrary illumination. We [publication #4] further extended the work of Ahmed and Farag [2] to include 2D shape information in the model. In subsequent work, we [publication #2] decoupled the coupled models of [4] and [2] to obtain separate models for shape and albedo, where the classic brightness constraint in shape-from-shading is approximated using a spherical harmonics projection. Finally, we [publication #1] cast such models in a regression framework using the Partial Least Squares (PLS) method, which requires only a few matrix operations for shape reconstruction and thus provides a computationally efficient alternative to the iterative methods used in [publication #2].
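A minimal sketch of the PLS regression step, using scikit-learn's PLSRegression on stand-in random data; the matrix sizes and the number of latent components are illustrative assumptions, not our experimental setup. Once the model is fit, shape recovery is a single linear prediction, which is what makes this route faster than iterative fitting.

```python
# PLS regression from vectorized face images to vectorized depth maps;
# assumes scikit-learn. All data here is random stand-in data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4096))   # 100 vectorized training images
Y_train = rng.normal(size=(100, 4096))   # their vectorized depth maps

pls = PLSRegression(n_components=20)     # latent dimension is a model choice
pls.fit(X_train, Y_train)

x_new = rng.normal(size=(1, 4096))       # an unseen vectorized input image
shape_estimate = pls.predict(x_new)      # a few matrix operations, no iteration
```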

Figure 1 Visualization of spherical harmonic projection (SHP) images. The leftmost column shows the input images under arbitrary illumination; the second through fourth columns show the corresponding SHP images. Notice that the SHP images encode the arbitrary illumination of the input face while retaining the identity of the subject from the USF database.


Figure 2 Error histograms: (a) height error histogram for the PLS approach, (b) height error histogram for the iterative approach, (c) surface orientation error histogram for the PLS approach, and (d) surface orientation error histogram for the iterative approach. Note the close similarity between the PLS and iterative approaches.


Figure 3 Reconstruction results of both the PLS and iterative approaches for three out-of-training samples. The first row shows the input images; the second and fourth rows show the ground-truth albedo (texture) and shape; the third and fifth rows show the recovered albedo (texture) and shape, respectively. The last row shows the shape error image, i.e., the difference between the reconstructed and ground-truth shapes.

Figure 4 Reconstruction results of the PLS approach on samples from the Yale database. The input images are in the first row; the second and third rows show the recovered albedo (texture) and shape, respectively.

References

[1] R. Basri and D. W. Jacobs, “Lambertian Reflectance and Linear Subspaces”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):218–233, 2003.

[2] A. H. Ahmed and A. A. Farag, “A New Statistical Model Combining Shape and Spherical Harmonics Illumination for Face Reconstruction”, Proceedings of the International Symposium on Visual Computing (ISVC), pp. 531–541, 2007.

[3] M. Castelan, W. Smith, and E. Hancock, “A Coupled Statistical Model for Face Shape Recovery from Brightness Images”, IEEE Transactions on Image Processing, 16:1139–1151, 2007.

[4] M. Castelan and J. V. Horebeek, “3D Face Shape Approximation from Intensities Using Partial Least Squares”, Proceedings of the Computer Vision and Pattern Recognition Workshops, pp. 1–8, 2008.

Publications

[1] H. Rara, S. Elhabian, T. Starr, and A. Farag, “3D Face Recovery from Intensities of General and Unknown Lighting Using Partial Least Squares”, IEEE International Conference on Image Processing (ICIP), Hong Kong, Sept. 26–29, 2010.

[2] H. Rara, S. Elhabian, T. Starr, and A. Farag, “Model-Based Shape Recovery from Single Images of General and Unknown Lighting”, IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, Nov. 7–10, 2009.

[3] H. Rara, S. Elhabian, T. Starr, and A. Farag, “Face Reconstruction and Recognition Using a Statistical Model Combining Shape and Spherical Harmonics”, IEEE SOUTHEASTCON, March 2009.

[4] H. Rara, S. Elhabian, T. Starr, and A. Farag, “A Statistical Model Combining Shape and Spherical Harmonics for Face Reconstruction and Recognition”, Cairo International Biomedical Engineering Conference (CIBEC 2008), pp. 1–4, Dec. 18–20, 2008.


Multichannel Face Detection

The multichannel face detection framework is based on the approach of Viola and Jones (“Robust Real-Time Face Detection”, 2004). This is a feature-based approach: the features are simple rectangular features, called Haar features, examples of which are shown in Fig. 1. Each rectangular feature has dark and light regions; the feature is detected if subtracting the average dark-region pixel value from the average light-region pixel value yields a value greater than a threshold. The threshold is selected from training data such that it is low enough to pass nearly all face examples in the training set. Each feature is scaled and shifted across all possible positions within the detection window. A machine-learning method called AdaBoost is used to select which Haar features to use and to set their threshold levels. This approach sometimes generates many false hits (“negative faces”); an example is shown in Fig. 2. To reduce the number of these false faces, we propose adding another detection stage: a skin-based face detector, as sketched below. Skin color has proven to be a useful and robust cue for face detection, localization, and tracking. After the faces are detected, we detect their features, i.e., the eyes and the mouth; the face is divided into four equal parts to establish a geometrical constraint on the feature locations.
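The sketch below illustrates this two-stage idea with OpenCV: a Haar-cascade pass followed by a YCrCb skin-color check over each candidate window. The skin bounds and the acceptance fraction are common rule-of-thumb values, not our trained thresholds.

```python
# Two-stage face detection: Viola-Jones candidates pruned by a skin test.
# Assumes opencv-python; skin bounds are rule-of-thumb YCrCb values.
import cv2

def detect_faces(bgr_image, min_skin_fraction=0.4):
    # Stage 1: Haar-cascade (Viola-Jones) candidate detection.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Binary skin mask: pixels whose Cr/Cb fall in a typical skin range.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))

    faces = []
    for (x, y, w, h) in candidates:
        # Stage 2: keep a candidate only if enough of its window is skin.
        if skin[y:y+h, x:x+w].mean() / 255.0 >= min_skin_fraction:
            faces.append((x, y, w, h))
    return faces
```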

Evaluation on the benchmark Extended Yale B database:

•  Illumination: average detection rate on sets 1 and 2 exceeds 93% for all poses. This average degrades to 67% on sets 3 and 4.

•  Pose: average detection rate for poses 1–6 exceeds 86% across all illumination sets. This average degrades to 70.5% for poses 7–9. The toughest pose for our detector is pose 8.

Fig. 1 Examples of Haar-based features


Fig. 2 Example illustrating cases where the Haar-based detector gives false faces and how the skin-based detector removes these false candidates.


Fig. 3 Examples of face and facial feature detection