Face and Facial Expression Recognition for Identity Authentication
University of Michigan
Time: Jan. 2017 - Apr. 2017
Team members: Ping Yu, Haozhu Wang, Bowen Liu, Boyan Sun
Recorded a 15 s video of each user for training and preprocessed the data in MATLAB
Built the SVM and CNN models; the SVM was implemented without any pre-built functions
Unlocked the device from a 3 s video showing the specific person with the correct facial expression
Achieved an identity recognition accuracy of 99.72% and a facial expression recognition accuracy of 99.02%
Abstract: We develop an identity authentication system based on cascaded face and facial expression recognition. The system accepts video input for user-friendly training and responds in real time. For the recognition tasks, we implement several machine learning methods, including SVM and CNN, and compare their performance. Our cascaded system achieves a higher security level than using face or facial expression recognition alone.
1. Background
People have long used human face recognition for security purposes [1]. Many technology companies, such as Apple [2], PayPal, and Samsung [3], have also explored face recognition for unlocking devices. A variety of machine learning methods have been applied successfully to face recognition, such as Eigenfaces [4], Fisher's linear discriminant [5], convolutional neural networks [6], support vector machines [7], and LDA [8]. However, the reliability of security systems based on face recognition is undermined by the potential similarity of facial features among different people [9]. Recently, facial expression recognition has become an active research topic, and various machine learning methods have been applied to it, such as multilayer perceptrons [10], support vector machines [11], and feedforward neural networks [12]. In this paper, we propose a method combining face and facial expression recognition (FFER) for identity authentication. By introducing facial expression recognition as a second check, we achieve a higher security level than using face or facial expression recognition alone. To make the system more user-friendly, we develop a video shooting program that captures the user's training images. We believe our method is a practical step towards a reliable face authentication system.
2. Introduction
We focus on developing an identity authentication system with appropriate machine learning methods. Its importance comes from two aspects. First, as electronic devices are used ever more widely, security and privacy become increasingly important. Second, applying machine learning and computer vision techniques to this real-world task helps us understand the corresponding algorithms better.
Our project is divided into two parts: “Face Recognition” and “Facial Expression Recognition”. We build different models for both parts, and the output of the “Face Recognition” part serves as the input to the “Facial Expression Recognition” part. Several computer vision and machine learning methods are used for feature extraction and classification, and the results obtained with the different methods are compared.
3. Pipeline
3.1 Preprocessing
To train a good classifier for the recognition tasks, at least hundreds of pictures are needed for each user. However, it is impractical to ask users to take so many pictures. Instead, we use a video shooting program to record a 15-second video of each user, and hundreds of face images are cropped from the video frames.
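The report does not state the camera frame rate or how frames are chosen from the clip. Assuming 30 fps, a 15-second video has 450 frames, and 250 images can be taken at evenly spaced frame indices. The function name and the even-spacing rule are illustrative assumptions. The face cropping step itself, which needs a face detector, is not shown.

```python
def sample_frame_indices(duration_s: float, fps: float, n_images: int) -> list:
    """Indices of n_images frames spread evenly over a clip of duration_s seconds."""
    total = int(duration_s * fps)  # number of frames in the clip
    if n_images > total:
        raise ValueError("the clip has fewer frames than requested images")
    step = total / n_images
    # Take the first frame of each of n_images equal-length segments.
    return [int(i * step) for i in range(n_images)]

idx = sample_frame_indices(duration_s=15, fps=30, n_images=250)
print(len(idx), idx[:4], idx[-1])  # 250 [0, 1, 3, 5] 448
```

Spacing the samples across the whole clip tends to capture more variation in pose and expression than taking 250 consecutive frames.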
3.2 Model Training
We build the training models for face recognition and facial expression recognition separately. For the SVM model, features must first be extracted from the images. For the CNN model, the training images can be fed directly to the network.
For both the face recognition part and the facial expression part, we build models with two methods and compare the results in the discussion. The facial expression model then double-checks the result of the face recognition model, which improves security.
3.3 Unlocking the Device
Our authentication system uses a short video as the password: if the person in the video is the correct user showing the required facial expression, the system sends an unlock signal.
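The report does not say how the per-frame decisions over the short clip are combined. The sketch below assumes a simple rule: unlock when at least a fraction `min_fraction` of the frames pass both stages. `min_fraction` is a hypothetical parameter, set to 0.9 here. `identify` and `is_smiling` are hypothetical stand-ins for the trained face and expression classifiers.

```python
from typing import Any, Callable, Sequence

def should_unlock(frames: Sequence[Any], identify: Callable[[Any], str],
                  is_smiling: Callable[[Any], bool], enrolled_user: str,
                  min_fraction: float = 0.9) -> bool:
    """Cascaded check over the frames of a short video: a frame passes only if the
    face classifier returns the enrolled user AND the expression classifier accepts it."""
    if not frames:
        return False
    passed = 0
    for frame in frames:
        # `and` short-circuits, so the expression stage runs only on frames the
        # identity stage has already accepted (the cascade).
        if identify(frame) == enrolled_user and is_smiling(frame):
            passed += 1
    return passed / len(frames) >= min_fraction

# Toy usage: dicts stand in for video frames, lambdas for the trained classifiers.
frames = [{"who": "user_a", "smile": True}] * 9 + [{"who": "user_b", "smile": True}]
print(should_unlock(frames, lambda f: f["who"], lambda f: f["smile"], "user_a"))
```

Requiring most frames to agree, rather than a single frame, makes the decision less sensitive to one misclassified frame.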
4. Support Vector Machine (SVM)
We implemented the Support Vector Machine (SVM) algorithm ourselves; the theoretical details are summarized in the appendix. To classify different faces and facial expressions, we use two schemes. In the first scheme, a single classifier handles both face and facial expression, so the dataset is divided into m × n classes (for example, n different people, each with m different facial expressions). In the second scheme, we train two different classifiers on the same dataset: one for face recognition (multi-class classification: which person it is) and one for facial expression recognition (two-class classification: smiling or not).
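The two labelling schemes, and an SVM trained from scratch, can be sketched in pure Python. The report's own solver is described in its appendix and is not reproduced here. As a stand-in, `train_linear_svm` uses Pegasos-style stochastic sub-gradient descent on the hinge loss. Multi-class prediction uses one-vs-rest, which is one common way to extend a binary SVM to the identity task; the report does not confirm this choice. All function names, `lam` and `epochs` are hypothetical.

```python
import random

def combined_label(person: int, expression: int, n_expressions: int) -> int:
    """Scheme 1: one class per (person, expression) pair, n * m classes in total."""
    return person * n_expressions + expression

def split_label(label: int, n_expressions: int) -> tuple:
    """Inverse of combined_label: recover (person, expression)."""
    return divmod(label, n_expressions)

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Binary linear SVM (labels +1 / -1) trained with Pegasos-style stochastic
    sub-gradient steps on the L2-regularised hinge loss. A constant 1 is appended
    to every sample, so the last weight acts as a (regularised) bias."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regulariser gradient
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def score(w, x):
    """Decision value w . [x, 1]; its sign is the predicted binary label."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def predict_one_vs_rest(models, x):
    """Multi-class prediction: the class whose one-vs-rest SVM scores x highest."""
    return max(range(len(models)), key=lambda k: score(models[k], x))

# Toy usage for the two-class expression stage: +1 = smile, -1 = not smiling.
X = [[2, 2], [3, 3], [-2, -2], [-3, -3]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([1 if score(w, x) > 0 else -1 for x in X])
```

For the identity task under scheme 2, one binary SVM would be trained per person (that person +1, everyone else -1) and combined with `predict_one_vs_rest`. Scheme 1 does the same over the `combined_label` classes.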
4.2 Face Features
Since the choice of features affects classification accuracy, we need to find the features most suitable for the image dataset. In this project, we use two different features, HOG [13] features and eigenface features [14], and compare the results obtained with each.
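A simplified HOG extractor illustrates the first of the two features. The cell size, the 9 unsigned orientation bins and the single global L2 normalisation are assumptions chosen for brevity; standard HOG normalises over overlapping blocks instead. The eigenface feature is not shown.

```python
import math

def hog_features(img, cell=4, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude, concatenated and L2-normalised once."""
    h, w = len(img), len(img[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for yy in range(cy, cy + cell):
                for xx in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = img[yy][min(xx + 1, w - 1)] - img[yy][max(xx - 1, 0)]
                    gy = img[min(yy + 1, h - 1)][xx] - img[max(yy - 1, 0)][xx]
                    mag = math.hypot(gx, gy)
                    ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned, [0, 180)
                    hist[int(ang // (180.0 / bins)) % bins] += mag
            feats.extend(hist)
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]

# Toy usage: an 8 x 8 image with a vertical edge gives 4 cells x 9 bins = 36 values.
edge = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
print(len(hog_features(edge)))
```

The vertical edge produces purely horizontal gradients, so in this example all the weight lands in the 0° bin of each cell.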
4.3 Results for the SVM Algorithm
Although SVM is one of the best methods for supervised learning, the number of classifiers, the size of the face images, and the choice of facial features all influence the accuracy of face recognition and facial expression recognition. To optimize the structure and the hyperparameters, we study the relationship between these factors and classification accuracy; the accuracies for the four cases are shown in figure 3.
4.4 Discussion of the SVM Algorithm
In this part, we discuss how the different factors affect the results. In figure 3, the x-axis represents the size of the training set and the y-axis the classification accuracy. The accuracy of all four cases increases with the size of the training set, since a small dataset causes the models to overfit. The accuracy also improves with larger face images, because more features can be extracted from a larger image.
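How the descriptor grows with face size can be made concrete for HOG. The report does not give its HOG parameters, so the sketch assumes commonly used values: 8 × 8-pixel cells, 2 × 2-cell blocks sliding one cell at a time, and 9 orientation bins.

```python
def hog_length(side: int, cell: int = 8, block: int = 2, bins: int = 9) -> int:
    """Length of a HOG descriptor for a side x side image: cell x cell-pixel cells,
    block x block-cell blocks sliding one cell at a time, `bins` orientations per cell."""
    cells = side // cell  # whole cells per row/column; leftover pixels are ignored
    blocks = max(cells - block + 1, 0)  # block positions per row/column
    return blocks * blocks * block * block * bins

for side in (25, 150):
    print(side, hog_length(side))  # 25 144, then 150 10404
```

Under these assumptions, a 150 × 150 face yields about 70 times as many HOG values as a 25 × 25 face.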
We analyze the figure for the two face image sizes. In the first case, where the face size is 25 × 25 pixels, the two-classifier model is more accurate than the one-classifier model when the dataset is small, because each class of the one-classifier model has far fewer training images than each class of the two-classifier model. With a large dataset, however, the one-classifier model performs better: because the face images are small, a single image does not contain enough features for the second (facial expression) classifier of the two-classifier model.
In the second case, where the face size is 150 × 150 pixels, both models reach 99% accuracy when the dataset is large, since more features can be extracted from the larger images. Performing face recognition with the first classifier and facial expression recognition with the second achieves better accuracy than classifying both face and facial expression with a single classifier.
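The data-per-class argument for small datasets can be made concrete. Assuming a balanced dataset with the same number of images for every (person, expression) pair, the sketch counts how many training images each class receives under the two schemes. The numbers in the example (4 people, 2 expressions, 100 images per pair) are hypothetical.

```python
def per_class_counts(n_people: int, n_expr: int, imgs_per_pair: int) -> dict:
    """Training images per class under the two SVM labelling schemes,
    for a balanced dataset with imgs_per_pair images per (person, expression)."""
    return {
        "combined": imgs_per_pair,  # one classifier, n_people * n_expr classes
        "identity": n_expr * imgs_per_pair,  # two classifiers: which person
        "expression": n_people * imgs_per_pair,  # two classifiers: smile or not
    }

print(per_class_counts(n_people=4, n_expr=2, imgs_per_pair=100))
```

With these numbers, each class of the single classifier sees half as many images as each identity class and a quarter as many as each expression class. This is consistent with the two-classifier model doing better when data is scarce.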
5. CNN Method
Convolutional neural networks (CNNs) were pioneered for image classification by Yann LeCun et al., who applied them to handwritten digit recognition [15]. However, their popularity really took off in 2012, when researchers from the University of Toronto used a multi-layer CNN in the ImageNet competition and beat the other groups by a huge margin [16]. Such multi-layer CNNs are now called deep neural networks, and they have become the foundation of the rapidly growing field of deep learning.
Compared with conventional methods based on hand-crafted features, such as eigenfaces and HOG, a CNN has the advantage of learning features automatically. Its max-pooling layers further make the learned features tolerant to small shifts and allow deeper layers to detect higher-level features.
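Convolution and max pooling can be illustrated with a toy pure-Python sketch. The 1 × 2 edge kernel is set by hand here, whereas a CNN learns its kernels from data. The example shows that, after 2 × 2 max pooling, an edge shifted by one pixel within the same pooling window produces the same output.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation (what CNN conv layers compute): stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def relu(fmap):
    return [[max(0.0, float(v)) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2; an odd trailing row/column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

edge_kernel = [[-1, 1]]  # hand-set horizontal-gradient filter (a CNN would learn this)
a = [[0, 0, 1, 1, 1] for _ in range(5)]  # vertical edge between columns 1 and 2
b = [[0, 1, 1, 1, 1] for _ in range(5)]  # same edge shifted one pixel left
for img in (a, b):
    print(max_pool2x2(relu(conv2d_valid(img, edge_kernel))))
```

The tolerance is limited to shifts that stay inside one pooling window. Stacking convolution and pooling layers is what gives later layers larger receptive fields.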
In this project, we trained separate CNN models for face recognition and facial expression recognition; after the initial training, these models are called in later authentication tasks. We used the Neural Network Toolbox provided by MATLAB, and all CNN code was written and tested in MATLAB R2017a.
5.1 Identity Recognition
Identity recognition is a classification task: the neural network reads in an image and predicts its label.
We first used our video shooting program to take 250 images of each group member. Of these 250 images, 200 are used to train the networks and the remaining 50 form the test set. The images contain both smiling and other expressions, all of which are used by the identity recognition network. We labeled the images with each person's identity and used the images together with their labels as the input to our face recognition CNN.
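The per-person 200/50 split can be sketched as follows. The report does not say whether the frames were shuffled before splitting; this sketch shuffles each person's images with a fixed seed. The function name and the file names are hypothetical. Neighbouring video frames are nearly identical, so splitting by time (for example, holding out the last 50 frames) would be a stricter alternative.

```python
import random

def split_per_person(images_by_person: dict, n_train: int = 200, seed: int = 0):
    """Shuffle each person's images and split them into (image, label) train/test lists,
    so every person contributes n_train training images and the rest for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for person, images in images_by_person.items():
        imgs = list(images)
        rng.shuffle(imgs)
        train += [(img, person) for img in imgs[:n_train]]
        test += [(img, person) for img in imgs[n_train:]]
    return train, test

# Toy usage: file names stand in for the 250 frames captured per group member.
data = {f"person_{p}": [f"person_{p}_frame_{i:03d}.png" for i in range(250)] for p in range(4)}
train, test = split_per_person(data)
print(len(train), len(test))  # 800 200
```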
We tested several architectures for the identity recognition task and found that a CNN easily reaches very high accuracy even when the network is rather small. To speed up training and recognition, we therefore chose only 2 convolutional layers for the face recognition network; its architecture is summarized in figure 4.
We use stochastic gradient descent to minimize the cross-entropy loss, with a batch size of 64 and an initial learning rate of 0.0001. We trained for a total of 120 iterations on our training data; as can be seen from figure 3, the accuracy grew to nearly 100% at around 25 iterations. The test set prediction accuracy is 99.72%, a very satisfying result. We ascribe the high accuracy to the large differences between the facial appearances of the group members.
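The cross-entropy objective and one SGD update can be written out for the final softmax layer of a classifier. The sketch below is a hypothetical linear softmax model in pure Python, not the authors' MATLAB network. Its toy learning rate differs from the 0.0001 used in the report. The last lines relate the reported 120 iterations to epochs under two assumptions: 4 × 200 = 800 training images, and each epoch uses only complete mini-batches of 64.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(p, label):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(p[label])

def sgd_step(W, batch, lr):
    """One mini-batch SGD step for a linear softmax classifier (logits = W x).
    The gradient of the mean cross-entropy w.r.t. W[k][j] is mean((p_k - [k == y]) * x_j)."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, y in batch:
        p = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in W])
        for k in range(len(W)):
            g = p[k] - (1.0 if k == y else 0.0)
            for j, xj in enumerate(x):
                grad[k][j] += g * xj / len(batch)
    return [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, grad)]

# Relating the reported 120 iterations to epochs (assumptions in the lead-in).
n_train, batch_size, iterations = 4 * 200, 64, 120
iters_per_epoch = n_train // batch_size  # 12 complete mini-batches per epoch
print(iterations / iters_per_epoch)  # 10.0
```

Under these assumptions, the 25 iterations at which accuracy approached 100% correspond to roughly two passes over the training data.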
The measured response time for a new query image is 0.39 ms, which shows that our face recognition network responds very quickly.
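The report does not say how the response time was measured. One common approach, sketched here with Python's `time.perf_counter`, is to make one warm-up call and then average over many repeated predictions. `predict` stands for any trained model's inference function and is hypothetical.

```python
import time

def mean_latency_ms(predict, query, n_runs=1000):
    """Average wall-clock time of one predict(query) call, in milliseconds."""
    predict(query)  # warm-up call so one-off initialisation is not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        predict(query)
    return (time.perf_counter() - start) * 1000.0 / n_runs

# Toy usage: a trivial function stands in for the trained network.
print(f"{mean_latency_ms(lambda q: sum(q), list(range(100))):.4f} ms")
```

Averaging over many calls matters at sub-millisecond scales, where a single timing is dominated by timer resolution and system noise.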
5.2 Facial Expression Recognition
The goal of the facial expression recognition network is to recognize whether a person is smiling or not. Like face recognition, facial expression recognition is a classification task. However, it is much more difficult than identity recognition, as can be seen from the small difference between a smile and other expressions. As shown in figure 3, the faces of different people are much less similar to one another than the different facial expressions of the same person are. We therefore expected to need a larger network to improve the prediction capability for facial expression recognition.
For our facial expression recognition network, we started with the same architecture as the face recognition network and the same training options as for identity recognition. We expected this network to perform poorly on this task. However, the results turned out to be surprisingly good: the accuracy reached nearly 100% within 40 iterations. This indicates that, even though facial expression recognition is much harder than face recognition, our network is already powerful enough to achieve very good performance. We also tested a larger CNN with three convolutional layers. Its achievable performance is almost the same as that of the smaller CNN, but figure 7 shows that the larger CNN requires more training iterations to reach the same accuracy. Based on this observation, we chose the smaller CNN as our facial expression recognition network.
figure 7. Comparison of training accuracy versus iterations between small and large CNN model
The accuracy on our test set is 99.02%. The response time of our facial expression recognition network is 0.48 ms, which is also fast enough for a real-time recognition system.
CNN is a very powerful technique for classification tasks. We achieved very high prediction accuracy for both identity and facial expression with a convolutional neural network containing only 2 convolutional layers: the identity recognition accuracy is 99.72% and the facial expression recognition accuracy is 99.02%. If the two networks are cascaded and their errors are independent, the probability that both misclassify the same input is only (1 - 99.72%) × (1 - 99.02%) = 0.002744%. This is much lower than the error rate of face recognition (1 - 99.72%) or facial expression recognition (1 - 99.02%) alone, which indicates that the cascaded authentication system is considerably more secure than either check on its own.
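The failure-rate arithmetic can be checked directly. The sketch treats the two test accuracies as per-stage probabilities of a correct decision and assumes the stages err independently. Under the same assumption, a genuine user is accepted only if both stages are correct. The cascade therefore also raises the chance of wrongly rejecting the enrolled user, and the sketch computes both numbers from the reported accuracies.

```python
def cascade_rates(acc_face: float, acc_expr: float) -> tuple:
    """Treat the test accuracies as per-stage probabilities of a correct decision
    and assume the two stages err independently."""
    both_wrong = (1 - acc_face) * (1 - acc_expr)  # both stages fail on the same input
    both_right = acc_face * acc_expr  # a genuine user passes both stages
    return both_wrong, both_right

both_wrong, both_right = cascade_rates(0.9972, 0.9902)
print(f"{both_wrong:.4%} {1 - both_right:.2%}")  # 0.0027% 1.26%
```

About 1.26% of genuine attempts would fail at least one stage, in exchange for the much lower chance that both stages err together.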
6. Conclusion
We developed an identity authentication system using several computer vision and machine learning techniques. By cascading face recognition and facial expression recognition, we achieve a much higher security level than using face or facial expression recognition alone.
References
[1]. Osuna, Edgar, Robert Freund, and Federico Girosi. "Training support vector machines: an application to face detection." Computer vision and pattern recognition, 1997. Proceedings., 1997 IEEE computer society conference on. IEEE, 1997.
[2]. https://9to5mac.com/2017/02/17/opinion-iphone-8-iris-recognition-face-recognition-fingerprint/
[3]. https://arstechnica.com/gadgets/2017/03/video-shows-galaxy-s8-face-recognition-can-be-defeatedwith-a-picture/
[4]. Turk, Matthew A., and Alex P. Pentland. "Face recognition using eigenfaces." Computer Vision and Pattern Recognition, 1991. Proceedings CVPR'91., IEEE Computer Society Conference on. IEEE, 1991.
[5]. Liu, Chengjun, and Harry Wechsler. "Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition." IEEE Transactions on Image processing 11.4 (2002): 467-476.
[6]. Lawrence, Steve, et al. "Face recognition: A convolutional neural-network approach." IEEE transactions on neural networks 8.1 (1997): 98-113.
[7]. Guo, Guodong, Stan Z. Li, and Kapluk Chan. "Face recognition by support vector machines." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
[8]. Lu, Juwei, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos. "Face recognition using LDA-based algorithms." IEEE transactions on neural networks 14.1 (2003): 195-200.
[9]. Zhu, Qiang, et al. "Fast human detection using a cascade of histograms of oriented gradients." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
[10]. Zhang, Zhengyou, et al. "Comparison between geometry-based and gabor-wavelets-based facial expression recognition using multi-layer perceptron." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.
[11]. Kotsia, Irene, and Ioannis Pitas. "Facial expression recognition in image sequences using geometric deformation features and support vector machines." IEEE transactions on image processing 16.1 (2007): 172-187.
[12]. Ma, Liying, and Khashayar Khorasani. "Facial expression recognition using constructive feedforward neural networks." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34.3 (2004): 1588-1595.
[13]. Zhu, Qiang, et al. "Fast human detection using a cascade of histograms of oriented gradients." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
[14]. Turk, Matthew A., and Alex P. Pentland. "Face recognition using eigenfaces." Computer Vision and Pattern Recognition, 1991. Proceedings CVPR'91., IEEE Computer Society Conference on. IEEE, 1991.
[15]. LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[16]. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.