光学仪器 (Optical Instruments), 2023, Vol. 45, Issue (1): 8-17

Face image frontalization method for face expression analysis
ZHANG Xuedian, CHEN Zhongjun, QIN Xiaofei
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Abstract: In face expression analysis, head pose changes often make face information asymmetric, and features robust to pose are difficult to obtain with traditional preprocessing that only crops and aligns face images. To obtain structured facial features, this paper proposes a face image frontalization method. The method maps detected facial landmarks into a new two-dimensional space to frontalize them, restores the frontalized landmarks to the original image as new landmarks, and uses moving least squares to guide the image deformation from the original landmarks to the new ones, yielding a frontalized face image. Face images in the public RAF-DB and ExpW expression datasets are preprocessed with the proposed method, and VGG16 and ResNet50 networks are trained on the face expression classification task. The effectiveness of the proposed frontalization method for expression analysis is evaluated by classification accuracy. Experimental results show that the proposed method outperforms traditional deep learning preprocessing for face expression analysis and effectively improves the quality of face information.
Key words: face expression analysis    preprocessing    face frontalization    face expression classification    deep learning

1 Frontalization method

Figure 1 Flow chart of the method in this paper
1.1 Landmark frontalization

1) The frontalized landmarks must preserve the characteristics of the facial features in the original image;

2) Frontal landmarks occluded by the head pose in the original image must be predicted, so that the facial expression remains accurate;

3) The positions of the frontalized landmarks must not deviate too far from the original landmark positions; otherwise the subsequent deformation leaves noticeable artifacts that degrade the overall image quality.

Figure 2 The 68 facial landmarks plus 3 additional points

 ${{\rm{argmin}}}_{{\boldsymbol{X}}}\left\| {{\boldsymbol{Y}}-{\boldsymbol{AX}}} \right\|_{2}^{2}+\lambda \left\| {{\boldsymbol{X}}} \right\|_{2}^{2}$ (1)

 $\widehat{{\boldsymbol{X}}}={\left({{\boldsymbol{A}}}^{{\rm{T}}}{\boldsymbol{A}}+\lambda {\boldsymbol{I}}\right)}^{-1}{{\boldsymbol{A}}}^{{\rm{T}}}{\boldsymbol{Y}}$ (2)
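The ridge-regression mapping of Eqs. (1) and (2) can be sketched in a few lines of NumPy; the variable names and toy data below are illustrative assumptions, not from the paper:

```python
import numpy as np

def fit_frontalization_matrix(A, Y, lam=1.0):
    """Closed-form ridge solution of Eq. (2):
    X_hat = (A^T A + lam * I)^(-1) A^T Y,
    where A holds normalized non-frontal landmark vectors and Y the
    corresponding frontal ones (shapes here are assumptions)."""
    k = A.shape[1]
    # Solve the regularized normal equations instead of forming the inverse.
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ Y)

# Toy check: with lam = 0 and an invertible square A, the fit is exact.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
Y = np.array([[2.0, 2.0], [4.0, 8.0]])
X_hat = fit_frontalization_matrix(A, Y, lam=0.0)
print(np.allclose(A @ X_hat, Y))  # True
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to computing the matrix inverse explicitly; the regularizer `lam` keeps the system well conditioned when `A.T @ A` is near-singular.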

 ${\widehat{{\boldsymbol{P}}}}_{i}=\left(\dfrac{\sqrt{n}}{\left\| {{\boldsymbol{P}}}_{i}-\bar{{\boldsymbol{P}}} \right\|_{{\rm{F}}}}\right)\cdot \left({{\boldsymbol{P}}}_{i}-\bar{{\boldsymbol{P}}}\right)\cdot {{\boldsymbol{R}}}_{i}$ (3)

 ${{\boldsymbol{R}}}_{i}=\left[\begin{array}{cc}\cos\;\alpha & -\sin\;\alpha \\ \sin\;\alpha & \cos\;\alpha \end{array}\right],\quad \alpha =\arctan\left(\dfrac{{y}_{{\rm{Reye}}}-{y}_{{\rm{Leye}}}}{{x}_{{\rm{Reye}}}-{x}_{{\rm{Leye}}}}\right)$ (4)

 ${{\boldsymbol{P}}}_{{\rm{f}}}=\widehat{{\boldsymbol{P}}}\cdot \widehat{{\boldsymbol{X}}}$ (5)
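Putting Eqs. (3)–(5) together, landmark frontalization can be sketched as follows. This assumes the conventional shape normalization (centroid removed, divided by the per-point Frobenius scale) and a roll angle computed from two eye corner points; the function signature and eye-point convention are assumptions:

```python
import numpy as np

def frontalize_landmarks(P, X_hat, leye, reye):
    """Sketch of Eqs. (3)-(5): normalize an (n, 2) landmark array P,
    rotate the inter-ocular axis to horizontal, then apply the learned
    frontalization matrix X_hat from Eq. (2)."""
    P = np.asarray(P, float)
    n = len(P)
    P_bar = P.mean(axis=0)                               # centroid
    scale = np.linalg.norm(P - P_bar) / np.sqrt(n)       # Frobenius scale
    # Roll angle of the line joining the two eye points.
    alpha = np.arctan2(reye[1] - leye[1], reye[0] - leye[0])
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    P_hat = ((P - P_bar) / scale) @ R                    # Eqs. (3)-(4)
    return P_hat @ X_hat                                 # Eq. (5)
```

With row-vector landmarks, right-multiplying by `R` rotates by `-alpha`, which removes the measured roll so the eyes end up level before `X_hat` is applied.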

Figure 3 Landmark frontalization

Figure 4 Changes of the average face under different methods
1.2 Coarse face alignment

 ${{\boldsymbol{P}}'_{{\rm{f}}}}=\left(\dfrac{\left\| {{\boldsymbol{P}}}_{i}-\bar{{\boldsymbol{P}}} \right\|_{{\rm{F}}}}{\sqrt{n}}\right)\cdot {{\boldsymbol{P}}}_{{\rm{f}}}+\bar{{\boldsymbol{P}}}$ (6)

 ${{\rm{argmin}}}_{\boldsymbol{S},\boldsymbol{R},\boldsymbol{T}}{\left\| {\boldsymbol{S}\cdot \boldsymbol{R}\cdot {\boldsymbol{P}}\mathrm{'}+\boldsymbol{T}-{{\boldsymbol{P}}\mathrm{{'}} _{{\rm{f}}}}} \right\|}^{2}$ (7)
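The similarity fit of Eq. (7) has a closed-form, Umeyama-style solution via SVD of the cross-covariance of the two point sets. The sketch below is one standard way to solve it, not necessarily the exact solver used in the paper:

```python
import numpy as np

def fit_similarity(P, Q):
    """Least-squares similarity transform for Eq. (7): find scale s,
    rotation R, translation t minimizing ||s * R @ p_i + t - q_i||^2
    over (n, 2) point sets P and Q (Umeyama-style closed form)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mp, mq = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - mp, Q - mq
    # SVD of the cross-covariance between target and source points.
    U, S, Vt = np.linalg.svd(Qc.T @ Pc)
    d = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (Pc ** 2).sum()
    t = mq - s * (R @ mp)
    return s, R, t
```

The determinant correction `D` forces a proper rotation (no mirroring), which matters for faces since a reflected fit would swap the left and right halves.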

Figure 5 Coarse face alignment
1.3 Face deformation

Let ${{\boldsymbol{p}}}_{i}$ be the control points, i.e., the coarsely aligned landmark coordinates ${{\boldsymbol{P}}\mathrm{{'}} _{{\rm{a}}}}$, and let ${{\boldsymbol{q}}}_{i}$ be the deformed positions of the control points, i.e., the landmark coordinates ${{\boldsymbol{P}}\mathrm{{'}} _{{\rm{f}}}}$ restored after frontalization. Moving least squares is used to construct the deformation function ${l}_{\nu }\left(\boldsymbol{x}\right)$. For the coordinates of each pixel $v$ in the image, we have

 ${l}_{\nu }\left(\boldsymbol{x}\right)=({\boldsymbol{x}}-{\boldsymbol{p}}\mathrm{*}){\boldsymbol{M}}+{\boldsymbol{q}}\mathrm{*}$ (8)

where ${\boldsymbol{p}}\mathrm{*}$ and ${\boldsymbol{q}}\mathrm{*}$ are computed as

 ${\boldsymbol{p}}\mathrm{*}=\dfrac{{\displaystyle \sum }_{i}{w}_{i}{{\boldsymbol{p}}}_{i}}{{\displaystyle \sum }_{i}{w}_{i}},\;{\boldsymbol{q}}\mathrm{*}=\dfrac{{\displaystyle \sum }_{i}{w}_{i}{{\boldsymbol{q}}}_{i}}{{\displaystyle \sum }_{i}{w}_{i}}$ (9)

 ${w}_{i}=\frac{1}{{\left|{{\boldsymbol{p}}}_{i}-{\boldsymbol{v}}\right|}^{2\alpha }}$ (10)

 ${\boldsymbol{M}}={\left(\sum _{i}{w}_{i}{\widehat{{\boldsymbol{p}}}}_{i}^{{\rm{T}}}{\widehat{{\boldsymbol{p}}}}_{i}\right)}^{-1}\sum _{j}{w}_{j}{\widehat{{\boldsymbol{p}}}}_{j}^{{\rm{T}}}{\widehat{{\boldsymbol{q}}}}_{j}$ (11)

where ${\widehat{{\boldsymbol{p}}}}_{j}$ and ${\widehat{{\boldsymbol{q}}}}_{j}$ are obtained from

 ${\widehat{{\boldsymbol{p}}}}_{j}={{\boldsymbol{p}}}_{j}-{\boldsymbol{p}}\mathrm{*},\;{\widehat{{\boldsymbol{q}}}}_{j}={{\boldsymbol{q}}}_{j}-{\boldsymbol{q}}\mathrm{*}$ (12)
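The per-pixel affine moving-least-squares warp of Eqs. (8)–(12), following Schaefer et al. [12], can be sketched for a single pixel as follows (a full warp evaluates this at every pixel, typically on a coarse grid for speed):

```python
import numpy as np

def mls_affine_deform(v, p, q, alpha=1.0):
    """Affine MLS warp of a single pixel v (Eqs. (8)-(12)).
    p: (n, 2) control points, q: (n, 2) their target positions."""
    v = np.asarray(v, float)
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    # Eq. (10): inverse-distance weights; epsilon avoids division by zero
    # when v coincides with a control point.
    w = 1.0 / (np.sum((p - v) ** 2, axis=1) ** alpha + 1e-12)
    p_star = w @ p / w.sum()                      # Eq. (9)
    q_star = w @ q / w.sum()
    ph, qh = p - p_star, q - q_star               # Eq. (12)
    # Eq. (11): M = (sum w p^T p)^(-1) sum w p^T q, solved directly.
    M = np.linalg.solve((ph * w[:, None]).T @ ph,
                        (ph * w[:, None]).T @ qh)
    return (v - p_star) @ M + q_star              # Eq. (8)
```

When the control points simply translate, the warp reduces to that translation for every pixel, which is a handy sanity check on the implementation.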

Figure 6 Face deformation
1.4 Face information processing

 $a=\dfrac{\min\left(d\left({\widehat{P}}_{i}^{2},{\widehat{P}}_{i}^{31}\right),d\left({\widehat{P}}_{i}^{31},{\widehat{P}}_{i}^{16}\right)\right)}{\max\left(d\left({\widehat{P}}_{i}^{2},{\widehat{P}}_{i}^{31}\right),d\left({\widehat{P}}_{i}^{31},{\widehat{P}}_{i}^{16}\right)\right)}$ (13)

Figure 7 Face information processing

 ${I}_{{\rm{new}}}={I}_{1}\alpha +{I}_{2}\left(1-\alpha \right)$ (14)
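Eqs. (13) and (14) can be sketched as follows, treating the ratio $a$ of Eq. (13) as the blending weight $\alpha$ of Eq. (14). The 0-based landmark indices and the assumption that $I_2$ is the horizontal mirror of the deformed face $I_1$ are illustrative, not stated explicitly in the text:

```python
import numpy as np

def symmetry_weight(P):
    """Eq. (13): ratio of the narrower to the wider half-face width,
    measured between jaw points 2/16 and nose point 31 of the 68-point
    scheme (0-based indices 1, 15, 30 here -- an assumption)."""
    d = lambda i, j: np.linalg.norm(P[i] - P[j])
    left, right = d(1, 30), d(30, 15)
    return min(left, right) / max(left, right)

def blend_with_mirror(I1, alpha):
    """Eq. (14): blend the face I1 with a counterpart I2, assumed here
    to be its horizontal mirror, weighted by the symmetry ratio alpha."""
    I1 = np.asarray(I1, float)
    I2 = I1[:, ::-1]                          # mirrored counterpart (assumed)
    return I1 * alpha + I2 * (1.0 - alpha)    # Eq. (14)
```

A perfectly symmetric face gives `alpha = 1`, so the blend leaves the image unchanged; the more asymmetric the face, the more the mirrored information contributes.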

2 Experiments

2.1 Datasets

The RAF-DB[16] dataset contains 29672 facial images collected from the Internet, covering thousands of faces; 15339 of the images are manually annotated with one of 7 expressions: surprise, fear, disgust, happiness, sadness, anger, and neutral.

The ExpW[17] dataset contains 91793 facial images collected from Google Images, manually labeled with the same 7 expressions: anger, disgust, fear, happiness, sadness, surprise, and neutral. However, not every image in this dataset shows a real human face; some depict cartoon characters or face-like objects, and some labels are ambiguous, so the baseline accuracy on this dataset is much lower than on RAF-DB.

2.2 Experimental design

2.3 Experimental results

Figure 8 Batch 1 experiments

Figure 9 Batch 2 experiments

Figure 10 Batch 3 experiments

Figure 11 Batch 4 experiments

Figure 12 Batch 5 experiments

Figure 13 Ranking of the highest accuracy of various operations in the combined scenario

3 Conclusion

References

[1] BANERJEE S, BROGAN J, KRIZAJ J, et al. To frontalize or not to frontalize: do we really need elaborate pre-processing to improve face recognition[C]//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe: IEEE, 2018: 20-29.
[2] VONIKAKIS V, WINKLER S. Identity-invariant facial landmark frontalization for facial expression analysis[C]//Proceedings of 2020 IEEE International Conference on Image Processing. Abu Dhabi: IEEE, 2020: 2281-2285.
[3] HASSNER T, HAREL S, PAZ E, et al. Effective face frontalization in unconstrained images[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 4295-4304.
[4] ZHOU H, LIU J H, LIU Z W, et al. Rotate-and-render: unsupervised photorealistic face rotation from single-view images[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 5910-5919.
[5] NING X, NAN F Z, XU S H, et al. Multi-view frontal face image generation: a survey[J]. Concurrency and Computation: Practice and Experience, 2020: e6147.
[6] CORNEANU C A, SIMÓN M O, COHN J F, et al. Survey on RGB, 3D, thermal, and multimodal approaches for facial expression recognition: history, trends, and affect-related applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(8): 1548-1568. DOI:10.1109/TPAMI.2016.2515606
[7] KAZEMI V, SULLIVAN J. One millisecond face alignment with an ensemble of regression trees[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1867-1874.
[8] LANGNER O, DOTSCH R, BIJLSTRA G, et al. Presentation and validation of the Radboud faces database[J]. Cognition and Emotion, 2010, 24(8): 1377-1388. DOI:10.1080/02699930903485076
[9] GOELEVEN E, DE RAEDT R, LEYMAN L, et al. The Karolinska directed emotional faces: a validation study[J]. Cognition and Emotion, 2008, 22(6): 1094-1118. DOI:10.1080/02699930701626582
[10] GAO W, CAO B, SHAN S G, et al. The CAS-PEAL large-scale Chinese face database and baseline evaluations[J]. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 2008, 38(1): 149-161. DOI:10.1109/TSMCA.2007.909557
[11] SIM T, BAKER S, BSAT M. The CMU pose, illumination, and expression (PIE) database[C]//Proceedings of the Fifth IEEE International Conference on Automatic Face Gesture Recognition. Washington: IEEE, 2002: 53-58.
[12] SCHAEFER S, MCPHAIL T, WARREN J. Image deformation using moving least squares[C]//Proceedings of ACM SIGGRAPH 2006. Boston: ACM, 2006: 533-540.
[13] VONIKAKIS V, WINKLER S. A center-surround framework for spatial image processing[J]. Electronic Imaging, 2016, 28(6): art00005.
[14] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego: ICLR, 2015.
[15] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[16] LI S, DENG W H. Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition[J]. IEEE Transactions on Image Processing, 2019, 28(1): 356-370. DOI:10.1109/TIP.2018.2868382
[17] ZHANG Z P, LUO P, LOY C C, et al. From facial expression recognition to interpersonal relation prediction[J]. International Journal of Computer Vision, 2018, 126(5): 550-569. DOI:10.1007/s11263-017-1055-1