CN112528902B

CN112528902B - Video monitoring dynamic face recognition method and device based on 3D face model

Info

Publication number: CN112528902B
Application number: CN202011501892.5A
Authority: CN
Inventors: 游志胜; 傅可人; 程鹏
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2022-05-24
Anticipated expiration: 2040-12-17
Also published as: CN112528902A

Abstract

The invention discloses a dynamic face recognition method and device for video surveillance based on a 3D face model. The method proceeds as follows:
1. Two-dimensional features are extracted from the two-dimensional face images to be recognized, which are captured at the acquisition end.
2. Those images are converted into a three-dimensional face model, and three-dimensional features are extracted from the converted model.
3. The two-dimensional and three-dimensional features are concatenated into a first fusion feature that contains both kinds of information.
4. At the same time, three-dimensional features are extracted from a three-dimensional face model pre-stored at the recognition end.
5. The pre-stored model is also projected into two-dimensional projection images, and two-dimensional features are extracted from them.
6. These are concatenated into a second fusion feature that contains both kinds of information.
7. Face recognition is performed with the two fusion features.

Because the fusion features add three-dimensional shape information on top of two-dimensional texture information, the method addresses the failure of recognition based on two-dimensional texture alone in complex environments. It effectively improves recognition accuracy and makes the recognition algorithm more robust.

Description

A method and device for dynamic face recognition in video surveillance based on 3D face model

Technical Field

The invention relates to the technical field of computer vision and pattern recognition. In particular, it relates to a dynamic face recognition method and device for video surveillance based on a 3D face model.

Background Art

Face recognition has become a focal point in the research, development and application of the new generation of artificial intelligence. Big data, deep learning and related technologies have advanced rapidly. As a result, face recognition based on two-dimensional (2D) images is now widely deployed where the environment is controlled and users cooperate, such as security checks and finance, with substantial social and economic benefits. Many dynamic scenarios are different: the environment is uncontrolled and users do not cooperate, as in dynamic face recognition in video surveillance. There, existing face recognition technology still falls far short of application requirements. Face recognition based on three-dimensional (3D) face models is one of the future development trends. A 3D face model carries richer information than a 2D face image, such as 3D shape, and can improve recognition in such dynamic situations, for example under large poses and varying illumination. However, capturing 3D faces with 3D sensors at both the enrollment end and the recognition end would require replacing every existing 2D camera with a 3D sensor, which cannot be done in a short time. A practical alternative is to capture a 3D face model at the enrollment end and one or more 2D face images at the recognition end, that is, to use a 3D face model to recognize 2D face images.

At present, techniques for recognizing 2D face images with 3D face models are scarce. Most work still uses 3D face models to recognize 3D faces rather than 2D face images.

Chinese invention patent application publication CN108427871A discloses a fast 3D face identity authentication method and device. It rotates the 3D face model to the pose of the 2D image to be recognized, projects it into a 2D image, and compares the projected image with the 2D image to be recognized.

Chinese invention patent application publication CN109858433A discloses a method and device for recognizing a 2D face image based on a 3D face model. It projects both the 3D face model and a single 2D image to multiple poses according to a certain criterion, then matches them pose by pose.

However, these methods only project the 3D face model into 2D images and use the 3D information to align the 2D images. Recognition is ultimately performed on the aligned 2D texture information, and the 3D shape information contained in the 3D face is never used directly in the feature comparison.

Summary of the Invention

The object of the invention is to overcome a drawback of existing techniques for recognizing 2D face images with 3D face models: they do not use 3D shape information for recognition. To this end, the invention provides a dynamic face recognition method and device for video surveillance based on a 3D face model.

The method extracts 2D information from the 2D images to be recognized and 2D projection information from the pre-stored 3D face model. Beyond that, it converts the 2D images captured at the acquisition end into a 3D face model and applies a UV conversion to that model to obtain 3D information. The same UV conversion extracts 3D information from the 3D face model pre-stored at the recognition end. Fusion features combining the 2D and 3D information are then used for feature comparison and face recognition.

Combining 2D texture information with 3D shape information addresses the failure, in complex environments, of recognition that relies only on 2D projection images (that is, only on 2D texture information). This effectively improves recognition accuracy.

To achieve the above object, the invention provides the following technical solutions:

A dynamic face recognition method for video surveillance based on a 3D face model, comprising:

A. Extracting N 2D face images to be recognized, where N is an integer and N ≥ 1, and performing feature extraction on the N images with a first feature extractor to obtain a first 2D feature vector;

and converting the N 2D face images to be recognized into a 3D face model; UV-unwrapping the resulting model to obtain a first UV map; performing feature extraction on the first UV map with a second feature extractor to obtain a first 3D feature vector; and concatenating the first 2D feature vector and the first 3D feature vector to obtain a first fusion feature vector;

B. Projecting the pre-stored 3D face model to the viewing angles of the N 2D face images to be recognized to obtain N 2D projected face images, and performing feature extraction on the N projected images with the first feature extractor to obtain a second 2D feature vector;

and UV-unwrapping the pre-stored 3D face model to obtain a second UV map; performing feature extraction on the second UV map with the second feature extractor to obtain a second 3D feature vector; and concatenating the second 2D feature vector and the second 3D feature vector to obtain a second fusion feature vector;

C. Comparing the first fusion feature vector with the second fusion feature vector to obtain a face recognition result.
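Steps A to C form two branches that meet in one comparison. The sketch below only wires them together; it is not the patented implementation. `F`, `G`, `reconstruct`, `uv_unwrap`, `project` and `compare` are hypothetical callables. They stand in for the trained feature extractors, the 3D reconstruction, the UV unwrapping, the pose projection and the scoring function, none of which the source gives as code. The optional multilayer-perceptron stage is left out here.

```python
from typing import Any, Callable, List, Sequence

Vector = List[float]

def first_fusion_vector(images: Sequence[Any], F: Callable[[Any], Vector],
                        reconstruct: Callable[[Sequence[Any]], Any],
                        uv_unwrap: Callable[[Any], Any],
                        G: Callable[[Any], Vector]) -> Vector:
    """Step A: concatenated per-image 2D features, followed by the UV-map 3D feature."""
    two_d = [x for img in images for x in F(img)]   # first 2D feature vector
    three_d = G(uv_unwrap(reconstruct(images)))     # first 3D feature vector
    return two_d + three_d                          # first fusion feature vector

def second_fusion_vector(model3d: Any, poses: Sequence[Any],
                         project: Callable[[Any, Any], Any],
                         F: Callable[[Any], Vector],
                         uv_unwrap: Callable[[Any], Any],
                         G: Callable[[Any], Vector]) -> Vector:
    """Step B: project the stored model to each probe pose, then add its UV-map feature."""
    two_d = [x for pose in poses for x in F(project(model3d, pose))]  # second 2D feature vector
    three_d = G(uv_unwrap(model3d))                                   # second 3D feature vector
    return two_d + three_d                                            # second fusion feature vector

def match(images, poses, model3d, F, G, reconstruct, uv_unwrap, project,
          compare: Callable[[Vector, Vector], float]) -> float:
    """Step C: score the two fusion vectors with a caller-supplied comparison."""
    a = first_fusion_vector(images, F, reconstruct, uv_unwrap, G)
    b = second_fusion_vector(model3d, poses, project, F, uv_unwrap, G)
    return compare(a, b)
```

The same `F` serves both branches, as steps A and B require. Projection `k` uses pose `k` of the probe images, so the two 2D parts line up view by view.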

Preferably, in the above method, extracting the N 2D face images to be recognized comprises two steps. Faces in the surveillance video are tracked to obtain a face video stream. The N 2D face images to be recognized are then selected from that stream according to preset screening conditions.

Preferably, in the above method, when N > 1, the N feature vectors extracted from the N 2D face images to be recognized are concatenated to obtain the first 2D feature vector;

and when N > 1, the N feature vectors extracted from the N 2D projected face images are concatenated to obtain the second 2D feature vector.
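Concatenation is order-sensitive. Element `i` of the first 2D feature vector must come from the same view as element `i` of the second. This holds when projection `k` uses the pose estimated from probe image `k`. Below is a minimal sketch of the per-view concatenation. It assumes each view yields a plain list of floats of equal length (256 in the embodiment); the function name is illustrative.

```python
from typing import List, Sequence

def concat_view_features(per_view: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate N per-view feature vectors into one vector, preserving view order."""
    if not per_view:
        raise ValueError("need at least one view (N >= 1)")
    dim = len(per_view[0])
    if any(len(v) != dim for v in per_view):
        raise ValueError("all per-view vectors must have the same length")
    return [x for v in per_view for x in v]
```

With three 256-dimensional views this yields the 768-dimensional vector used in the embodiment.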

Preferably, in the above method, step C further comprises two operations. A multilayer perceptron applies feature transformation and dimensionality reduction to the first and second fusion feature vectors separately. The reduced first and second fusion feature vectors are then compared to obtain the face recognition result.

Preferably, in the above method, the feature comparison computes the cosine similarity or the Euclidean distance between the dimension-reduced second and first fusion feature vectors.
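Both comparison measures are standard. The minimal plain-Python sketch below uses illustrative function names. A higher cosine similarity means a closer match, whereas for Euclidean distance a smaller value does, so any decision threshold must be set in the matching direction.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 1.0 means identical direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (na * nb)

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between a and b; 0.0 means identical vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```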

Preferably, in the above method, the UV conversion comprises converting the shape information of the 3D face model into a UV position map and a UV normal map.

Preferably, in the above method, the pixel values of the UV position map correspond one-to-one to the coordinates of the 3D points of the 3D face model. The pixel values of the UV normal map correspond one-to-one to the normal vectors of those 3D points.
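The one-to-one correspondence above presumes that every 3D point has a fixed location in the UV plane. The minimal sketch below assumes each vertex already carries a (u, v) coordinate in [0, 1] from a shared template mesh; the source does not say how UV coordinates are assigned. It writes each vertex's value (a position or a normal) into the pixel at its UV location.

This nearest-pixel "splatting" is a deliberate simplification. Real pipelines rasterize triangles and interpolate between vertices, whereas here pixels without a vertex stay `None`.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def splat_uv_map(uvs: Sequence[Tuple[float, float]], values: Sequence[Vec3],
                 size: int) -> List[List[Optional[Vec3]]]:
    """Write each vertex's 3-vector into a size x size grid at its (u, v) location."""
    grid: List[List[Optional[Vec3]]] = [[None] * size for _ in range(size)]
    for (u, v), val in zip(uvs, values):
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            raise ValueError("UV coordinates must lie in [0, 1]")
        col = min(int(u * size), size - 1)
        row = min(int((1.0 - v) * size), size - 1)  # v points up, image rows point down
        grid[row][col] = val
    return grid
```

Calling this once with vertex positions and once with vertex normals, using the same `uvs`, gives two maps in which pixel (row, col) refers to the same 3D point.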

Preferably, in the above method, the first and second feature extractors are each one of the convolutional neural networks VGG-16, ResNet-50 or ResNet-101.

In a further embodiment, the invention also provides a dynamic face recognition device for video surveillance based on a 3D face model. The device comprises at least one processor and a memory communicatively connected to it. The memory stores instructions executable by the at least one processor. When executed, these instructions enable the processor to perform the above dynamic face recognition method for video surveillance based on a 3D face model.

Compared with the prior art, the invention has the following beneficial effects:

The face recognition method provided by the invention builds two fusion features:
- On the acquisition side, it extracts 2D features from the captured 2D images to be recognized, converts those images into a 3D face model, and extracts 3D features from the converted model. Concatenating the two gives a first fusion feature containing 2D and 3D information.
- On the recognition side, it extracts 3D information from the pre-stored 3D face model, projects that model into 2D projection images, and extracts 2D features from them. Concatenating the two gives a second fusion feature containing 2D and 3D information.

Face recognition can then be performed directly on features that fuse 2D and 3D information. Because these features add 3D shape information on top of 2D texture information, the method addresses the failure, in complex environments, of recognition based only on 2D projection images (that is, only on 2D texture information). This effectively improves recognition accuracy and the robustness of the recognition algorithm.

Consider scenarios where faces are tracked and recognized from multiple surveillance images of the same person. There, the fusion features computed by the invention are more robust than methods that use only 2D texture features. Recognition is better, and the method applies to a wider range of scenarios than existing recognition methods.

Description of Drawings

FIG. 1 is a flowchart of the dynamic face recognition method for video surveillance based on a 3D face model according to exemplary Embodiment 1 of the invention.

FIG. 2 is a schematic block diagram of the dynamic face recognition method for video surveillance based on a 3D face model according to exemplary Embodiment 2 of the invention.

FIG. 3 is a structural diagram of the dynamic face recognition device for video surveillance based on a 3D face model according to an exemplary embodiment of the invention.

Detailed Description

The invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter to the following embodiments. All technologies implemented on the basis of the content of the invention fall within its scope.

Embodiment 1

FIG. 1 shows a dynamic face recognition method for video surveillance based on a 3D face model according to an exemplary embodiment of the invention, comprising:

C. Comparing the first fusion feature vector with the second fusion feature vector to obtain a face recognition result.

Specifically, the method of recognizing faces with fused 2D and 3D features is described below using 2D face images to be recognized taken from three viewing angles. As shown in FIG. 2, the method comprises:

S1: Track faces in the surveillance video. For each person to be recognized, select a specific number N (an integer, N ≥ 1) of representative 2D face images for subsequent recognition.

Specifically, face detection and tracking are well known to those skilled in the art. Many such techniques can track the faces in surveillance video to obtain a continuous sequence for each face.

Once the continuous sequence of a face to be recognized is available, this embodiment selects a fixed number N of representative 2D face images for subsequent recognition. The selection is automatic: an algorithm jointly analyzes face resolution, image quality, pose angle and similar factors. The selected images should represent the face under as many different angles and poses as possible.

N is an integer with N ≥ 1: N = 1 means a single view and N > 1 means multiple views. In general, only multi-view 2D images allow a good 3D face model to be reconstructed, so N > 1 is preferred. However, if the quality of a single selected image meets the requirements of 3D reconstruction, a single view (N = 1) may also be used. This embodiment uses N = 3.
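The source names the selection criteria (resolution, image quality, pose angle, and coverage of different angles) but not the algorithm. The sketch below is a hypothetical heuristic, not the embodiment's method. It drops faces narrower than an assumed `min_width`, picks the highest-quality frame first, and then greedily adds frames that are both good and far in yaw from the frames already chosen. The `quality` score and all thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Frame:
    index: int         # position in the tracked face sequence
    face_width: int    # face box width in pixels, a proxy for resolution
    quality: float     # hypothetical image-quality score in [0, 1]
    yaw: float         # estimated yaw angle in degrees

def select_representative(frames: Sequence[Frame], n: int, min_width: int = 40) -> List[Frame]:
    """Greedy pick: best-quality frame first, then frames far in yaw from those already chosen."""
    candidates = [f for f in frames if f.face_width >= min_width]  # drop low-resolution faces
    if n < 1 or not candidates:
        return []
    chosen = [max(candidates, key=lambda f: f.quality)]
    chosen_ids = {chosen[0].index}
    while len(chosen) < min(n, len(candidates)):
        remaining = [f for f in candidates if f.index not in chosen_ids]
        # Reward quality and angular distance to the nearest already-chosen view.
        best = max(remaining,
                   key=lambda f: f.quality * (1.0 + min(abs(f.yaw - c.yaw) for c in chosen)))
        chosen.append(best)
        chosen_ids.add(best.index)
    return sorted(chosen, key=lambda f: f.index)
```

Multiplying quality by angular spread is one simple way to trade the two criteria off. A near-duplicate frontal frame scores low even if it is sharp, which matches the stated aim of covering different angles.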

S2: Extract a 2D feature vector from each of the N 2D face images with a neural network, and concatenate these vectors to obtain the first 2D feature vector.

Specifically, a face feature vector is extracted from each of the 3 face images to be recognized with the first feature extractor, a convolutional neural network F. The separately extracted vectors are then concatenated. Each vector is 256-dimensional in this embodiment, so the vector obtained in this step is 256 × 3 = 768-dimensional. F is a neural network trained for face feature extraction and can be a common architecture for that task, such as VGG-16, ResNet-50 or ResNet-101.

S3: Generate a 3D face model from the single-view or multi-view 2D face images, using any method that builds a 3D face model from one or more 2D images. Then apply a UV conversion to the generated model to obtain a UV position map and a UV normal map.

Specifically, generating a 3D face model from one or more 2D face images is well known to those skilled in the art, and many such methods exist. For example, the methods proposed in the following works can all be used: "3D Face Reconstruction Using a Single or Multiple Views", "Examplar coherent 3D face reconstruction from forensic mugshot database", "Automated 3D Face Reconstruction from Multiple Images using Quality Measures" and "Fast, Approximate 3D Face Reconstruction from Multiple Views".

The generated 3D face model is converted into a UV position map using the method described in "Deep 3D Facial Landmark Localization on position maps". The RGB value of each pixel of the UV position map encodes the normalized (X, Y, Z) coordinates of the corresponding 3D point on the model, and 3D points correspond one-to-one to pixel coordinates. Likewise, the normal vector (NX, NY, NZ) of each 3D point is computed and converted into a UV normal map in the same way.

In other words, the generated 3D face model is mapped into the UV coordinate system (also called UV unwrapping of the 3D face model). It is thereby converted into the first UV map, which comprises a first UV position map and a first UV normal map.
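The source says only that RGB values encode "normalized" coordinates; it does not give the normalization. The minimal sketch below assumes one common scale over the model's bounding box, which keeps the face's proportions, unlike independent per-axis scaling. It also assumes 8-bit storage for illustration, and that normals are unit length so each component maps linearly from [-1, 1] to 0..255.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[int, int, int]

def positions_to_rgb(points: Sequence[Vec3]) -> List[RGB]:
    """Map (X, Y, Z) into 0..255 with one common scale so proportions are preserved."""
    mins = [min(p[i] for p in points) for i in range(3)]
    extent = max(max(p[i] for p in points) - mins[i] for i in range(3))
    if extent == 0:
        raise ValueError("all points coincide; cannot normalize")
    return [tuple(round(255 * (p[i] - mins[i]) / extent) for i in range(3)) for p in points]

def normals_to_rgb(normals: Sequence[Vec3]) -> List[RGB]:
    """Map unit-normal components from [-1, 1] to 0..255."""
    return [tuple(round(255 * (c + 1.0) / 2.0) for c in n) for n in normals]
```

Storing the maps as floating-point arrays instead would avoid the quantization loss of 8-bit values.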

S4: Concatenate the UV position map and the UV normal map obtained in S3 along the channel dimension to obtain the first UV map. Then extract the first 3D feature vector from the first UV map with a neural network.
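Channel-wise concatenation stacks the two 3-channel maps into one 6-channel map of the same height and width. A consequence is that the first layer of whichever backbone serves as the second feature extractor must accept 6 input channels rather than the usual 3. The source does not say how the network is adapted for this. Below is a minimal plain-Python sketch on nested lists of shape H x W x 3; the function name is illustrative.

```python
from typing import List, Sequence

def concat_channels(pos_map: Sequence[Sequence[Sequence[float]]],
                    normal_map: Sequence[Sequence[Sequence[float]]]) -> List[List[List[float]]]:
    """Stack an H x W x 3 position map and an H x W x 3 normal map into H x W x 6."""
    if len(pos_map) != len(normal_map) or any(
            len(rp) != len(rn) for rp, rn in zip(pos_map, normal_map)):
        raise ValueError("maps must have the same height and width")
    return [[list(p) + list(n) for p, n in zip(row_p, row_n)]
            for row_p, row_n in zip(pos_map, normal_map)]
```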

Specifically, the first 3D feature vector is extracted from the first UV map with the second feature extractor, a neural network G trained to extract 3D face feature vectors from UV maps. G can be a common face feature extraction architecture such as VGG-16, ResNet-50 or ResNet-101. In this embodiment the extracted vector is 512-dimensional. The first and second feature extractors share the same network architecture, namely one of VGG-16, ResNet-50 or ResNet-101. Finally, the first 2D feature vector from S2 and the first 3D feature vector from S4 are concatenated along the channel (feature) dimension to obtain the first fusion feature vector.

Meanwhile, features are extracted from the 3D face model pre-stored at the recognition end, as follows.

S5: Project the pre-stored 3D face model to the poses of the N face images. Extract a feature vector from each projected image with a neural network, and concatenate these vectors to obtain the second 2D feature vector.

Specifically, the N face images to be recognized are 2D images, whereas the recognition gallery contains 3D face models. The recognition problem in this embodiment is to decide whether the images to be recognized match a given 3D face model in the gallery and to produce a matching score. In other words, a 3D face model is used to recognize one or more 2D face images.

First, the pose angles of the face in each image to be recognized are estimated: the yaw angle ω, the roll angle θ and the pitch angle. The method proposed in "Fine-Grained Head Pose Estimation Without Keypoints" is used here. With N = 3 in this embodiment, the pose angles of each of the three 2D face images are estimated this way.

Once the pose angles of each image are known, the 3D face model is projected to the pose angles of each face image to be recognized. Projecting a 3D face model into a 2D image is well known to those skilled in the art. A face feature vector is then extracted from each projected image with the same convolutional neural network F; in this embodiment each vector is 256-dimensional. Since N = 3, the separately extracted vectors are concatenated, and the resulting second 2D feature vector is 256 × 3 = 768-dimensional.
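The projection step needs a rotation built from the three pose angles. Angle conventions differ between pose estimators, and the source does not fix one. The minimal sketch below assumes yaw about the Y axis, pitch about the X axis and roll about the Z axis, composed as R = Rz(roll) · Ry(yaw) · Rx(pitch). It uses a scaled orthographic projection that simply drops depth after rotation.

It projects vertex positions only. Producing the projected face image would additionally need texture sampling and hidden-surface removal, which are omitted here. The sign of the image y axis is also a convention to be matched to the renderer.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3 x 3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation_matrix(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Matrix:
    """R = Rz(roll) @ Ry(yaw) @ Rx(pitch); one convention among several (assumption)."""
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    rx = [[1.0, 0.0, 0.0], [0.0, math.cos(p), -math.sin(p)], [0.0, math.sin(p), math.cos(p)]]
    ry = [[math.cos(y), 0.0, math.sin(y)], [0.0, 1.0, 0.0], [-math.sin(y), 0.0, math.cos(y)]]
    rz = [[math.cos(r), -math.sin(r), 0.0], [math.sin(r), math.cos(r), 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))

def project_orthographic(vertices: Sequence[Tuple[float, float, float]], yaw_deg: float,
                         pitch_deg: float, roll_deg: float,
                         scale: float = 1.0) -> List[Tuple[float, float]]:
    """Rotate each vertex to the estimated pose, then drop depth (scaled orthographic)."""
    R = rotation_matrix(yaw_deg, pitch_deg, roll_deg)
    out = []
    for v in vertices:
        x = sum(R[0][k] * v[k] for k in range(3))
        y = sum(R[1][k] * v[k] for k in range(3))
        out.append((scale * x, scale * y))
    return out
```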

S6: Convert the shape information of the pre-stored 3D face model into a UV position map and a UV normal map. Each pixel of the UV position map holds the (X, Y, Z) coordinates of the corresponding 3D point on the model. Each pixel of the UV normal map holds the normal vector (NX, NY, NZ) of that 3D point.

Specifically, the 3D face model is converted into a UV position map using the method described in "Deep 3D Facial Landmark Localization on position maps". The RGB value of each pixel of the UV position map encodes the normalized (X, Y, Z) coordinates of the corresponding 3D point on the model. Likewise, the normal vector (NX, NY, NZ) of each 3D point is computed and converted into a UV normal map in the same way. This maps the 3D face model into the UV coordinate system (also called UV unwrapping of the 3D face model).
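The source does not say how the normal (NX, NY, NZ) of each 3D point is computed. One common choice, assumed here, is the area-weighted average of the normals of the triangles sharing the vertex. It falls out of summing unnormalized cross products, whose length is twice the triangle area. The sketch assumes a triangle mesh given as vertex-index triples with counter-clockwise winding when viewed from outside the face.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def vertex_normals(vertices: Sequence[Vec3], faces: Sequence[Tuple[int, int, int]]) -> List[Vec3]:
    """Area-weighted vertex normals: sum each incident face's cross product, then normalize."""
    acc = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        e1 = [b[t] - a[t] for t in range(3)]
        e2 = [c[t] - a[t] for t in range(3)]
        # Cross product e1 x e2; its length equals twice the triangle area.
        n = (e1[1] * e2[2] - e1[2] * e2[1],
             e1[2] * e2[0] - e1[0] * e2[2],
             e1[0] * e2[1] - e1[1] * e2[0])
        for idx in (i, j, k):
            for t in range(3):
                acc[idx][t] += n[t]
    out: List[Vec3] = []
    for s in acc:
        norm = math.sqrt(sum(x * x for x in s))
        # Vertices not used by any face get a zero normal.
        out.append(tuple(x / norm for x in s) if norm > 0 else (0.0, 0.0, 0.0))
    return out
```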

S7: Concatenate the UV position map and the UV normal map obtained in S6 along the channel dimension, then extract a 3D feature vector with a neural network.

Specifically, the aforementioned neural network G extracts the second 3D face feature vector from the concatenated UV position and normal maps. In this embodiment the extracted vector is 512-dimensional. The second 2D feature vector and the second 3D feature vector are concatenated along the channel (feature) dimension to obtain the second fusion feature vector.

S8: Pass each of the two concatenated features through a shared multilayer perceptron (MLP) for feature transformation and dimensionality reduction. This yields a first feature vector and a second feature vector for matching.

Specifically, the 768-dimensional 2D feature from S2 and the 512-dimensional 3D feature from S4 are concatenated into a 1280-dimensional feature A. Likewise, the 768-dimensional 2D feature from S5 and the 512-dimensional 3D feature from S7 are concatenated into a 1280-dimensional feature B. A trained three-layer perceptron M, with 1024, 1024 and 256 neurons in its three layers, converts each 1280-dimensional feature into a final 256-dimensional feature. That is, A and B are each fed into M to obtain the first and second feature vectors for matching.
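S8 specifies only the layer widths of M (1024, 1024, 256) and that the same M is applied to both A and B. The minimal sketch below is untrained and in plain Python. It assumes fully connected layers, ReLU after the hidden layers, a linear output layer and He-style random initialization; the source gives none of these, and the weights here are not trained.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_out][n_in], biases[n_out])

# Layer sizes from the embodiment: 768 + 512 = 1280 inputs -> 1024 -> 1024 -> 256.
EMBODIMENT_SIZES = [1280, 1024, 1024, 256]

def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Random (untrained) fully connected layers with He-style initialization (assumption)."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers

def mlp_forward(layers: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """Apply the layers in order; ReLU on hidden layers, linear output layer (assumption)."""
    h = list(x)
    for li, (W, b) in enumerate(layers):
        if len(h) != len(W[0]):
            raise ValueError(f"layer {li} expects {len(W[0])} inputs, got {len(h)}")
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if li < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h
```

Building `M = init_mlp(EMBODIMENT_SIZES)` and calling `mlp_forward(M, A)` and `mlp_forward(M, B)` with the same `M` is what "shared" amounts to. Both fusion vectors are mapped by identical weights into one 256-dimensional space, which is what makes the later direct comparison meaningful.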

S9: Compare the first feature vector, obtained from the N representative 2D face images, with the second feature vector, obtained from the 3D face model. This gives a matching score for face verification or identification.

The matching score can be obtained by computing the cosine similarity or the Euclidean distance between the two 256-dimensional vectors produced in S8.
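In this method the gallery-side vector depends on the probe, because S5 projects each stored model to the probe's own poses. Gallery vectors therefore have to be computed per probe rather than once in advance.

The minimal sketch below assumes a dictionary of gallery vectors already computed that way for the current probe. It ranks identities by cosine score (identification) and checks one claimed identity against a threshold (verification). The identity labels and the threshold value are hypothetical; the source does not give a threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; higher means a closer match."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_gallery(probe: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Identification: score every gallery identity and sort best-first."""
    scores = [(identity, cosine(probe, feat)) for identity, feat in gallery.items()]
    return sorted(scores, key=lambda t: t[1], reverse=True)

def verify(probe: Sequence[float], claimed: Sequence[float], threshold: float) -> bool:
    """Verification: accept the claimed identity if the score reaches the threshold."""
    return cosine(probe, claimed) >= threshold
```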

In a further embodiment, steps S1 to S4 and steps S5 to S7 may be performed one after the other or simultaneously, in either order.
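S5 projects the stored model to the poses of the images selected in S1. The gallery branch therefore needs S1's output, or at least the estimated poses, before it can start its projections, whereas S6 and S7 depend only on the stored model. Below is a minimal sketch of running the two branches concurrently once their inputs are available, using the standard-library `concurrent.futures` module. `probe_branch` and `gallery_branch` are hypothetical zero-argument callables. Threads keep the sketch simple; whether they give a real speed-up depends on whether the underlying work releases Python's global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def run_branches(probe_branch: Callable[[], T], gallery_branch: Callable[[], U]) -> Tuple[T, U]:
    """Run the probe-side and gallery-side feature pipelines concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        probe_future = pool.submit(probe_branch)
        gallery_future = pool.submit(gallery_branch)
        # result() blocks until each branch finishes and re-raises any exception it hit.
        return probe_future.result(), gallery_future.result()
```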

In the above embodiment, 3D information is converted into a UV position map and a UV normal map for recognition. This allows 3D shape information to be extracted and fused with the extracted 2D information. Recognizing faces with this fused information reduces the recognition failures and errors that can occur in complex environments when only projected images and single surveillance face images are used, improving the accuracy of the system. Moreover, the method is more robust when faces are tracked and recognized from video streams. Compared with methods that use only 2D texture information, it applies to a wider range of scenarios and gives better recognition results.

Example 2

FIG. 3 shows an apparatus for recognizing two-dimensional face images based on a three-dimensional face model according to an exemplary embodiment of the present invention. The apparatus is an electronic device 310 (e.g., a computer server capable of executing programs) that includes at least one processor 311, a power supply 314, and a memory 312 and an input/output interface 313 communicatively connected to the at least one processor 311. The memory 312 stores instructions executable by the at least one processor 311. When the at least one processor 311 executes these instructions, it can perform the method disclosed in any of the foregoing embodiments. The input/output interface 313 may include a display, a keyboard, a mouse and a USB interface for inputting and outputting data. The power supply 314 supplies power to the electronic device 310.

Those skilled in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. Such storage media include removable storage devices, read-only memory (ROM), magnetic disks, optical discs and other media capable of storing program code.

When the above integrated units of the present invention are implemented as software functional units and sold or used as independent products, they may also be stored in a computer-readable storage medium. On this basis, the technical solutions of the embodiments of the present invention may be embodied as a software product, either in essence or in the part that contributes to the prior art. The computer software product is stored in a storage medium and includes instructions that cause a computer device (for example, a personal computer, a server or a network device) to perform all or part of the methods described in the embodiments of the present invention. Such storage media include removable storage devices, ROM, magnetic disks, optical discs and other media capable of storing program code.

The above is only a detailed description of specific embodiments of the present invention and does not limit the present invention. Substitutions, modifications and improvements made by those skilled in the relevant art without departing from the principle and scope of the present invention shall fall within the protection scope of the present invention.

Claims

1. A video surveillance dynamic face recognition method based on a 3D face model, characterized in that it comprises:

A. Extracting N two-dimensional face images to be recognized, and using a first feature extractor to perform feature extraction on the N two-dimensional face images to be recognized to obtain a first two-dimensional feature vector, wherein N is an integer and N ≥ 1;

and converting the N two-dimensional face images to be recognized into a three-dimensional face model, performing UV unwrapping on the obtained three-dimensional face model to obtain a first UV map, using a second feature extractor to perform feature extraction on the first UV map to obtain a first three-dimensional feature vector, and concatenating the first two-dimensional feature vector and the first three-dimensional feature vector to obtain a first fusion feature vector;

B. Projecting the pre-stored three-dimensional face model to the viewing angles corresponding to the N two-dimensional face images to be recognized to obtain N two-dimensional projected face images, and using the first feature extractor to perform feature extraction on the N two-dimensional projected face images to obtain a second two-dimensional feature vector;

and performing UV unwrapping on the pre-stored three-dimensional face model to obtain a second UV map, using the second feature extractor to perform feature extraction on the second UV map to obtain a second three-dimensional feature vector, and concatenating the second two-dimensional feature vector and the second three-dimensional feature vector to obtain a second fusion feature vector;

C. Comparing the features of the first fusion feature vector and the second fusion feature vector to obtain a face recognition result.

2. The method according to claim 1, wherein extracting the N two-dimensional face images to be recognized comprises: tracking faces in video surveillance to obtain a face video stream, and selecting the N two-dimensional face images to be recognized from the face video stream according to a preset screening condition.
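
Claim 2 leaves the "preset condition" unspecified. The sketch below assumes one common choice: each tracked face crop carries a quality score, and the N best frames above a minimum score are kept in their original temporal order. The `TrackedFrame` type, the `quality` score and `min_quality` are all hypothetical; how quality is measured (sharpness, pose, face size) is not stated in the text.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class TrackedFrame:
    frame_index: int   # position of the face crop in the tracked video stream
    quality: float     # hypothetical per-frame score (e.g. sharpness or pose)
    image: Any = None  # the face crop itself; unused by the selection logic


def select_frames(track: Sequence[TrackedFrame], n: int,
                  min_quality: float = 0.0) -> List[TrackedFrame]:
    """Keep up to n frames with quality >= min_quality, best first, then restore temporal order."""
    if n < 1:
        raise ValueError("N must be an integer >= 1")
    eligible = [f for f in track if f.quality >= min_quality]
    best = sorted(eligible, key=lambda f: f.quality, reverse=True)[:n]
    return sorted(best, key=lambda f: f.frame_index)
```

This function can return fewer than N frames when the track is short or the threshold is strict, so a caller would need to decide whether to proceed with fewer images or wait for more frames.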

3. The method according to claim 1, wherein, when N > 1, the N feature vectors extracted from the N two-dimensional face images to be recognized are concatenated to obtain the first two-dimensional feature vector, and the N feature vectors extracted from the N two-dimensional projected face images are concatenated to obtain the second two-dimensional feature vector.
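
Claim 3's per-image concatenation can be sketched as below. In claim 1, step B, the i-th projected image is rendered at the viewing angle of the i-th image to be recognized. Concatenating both lists in the same order therefore keeps matching views at the same positions of the first and second two-dimensional feature vectors. The function name `concat_per_image` is hypothetical. For N = 1 it simply returns the single vector.

```python
from typing import List, Sequence


def concat_per_image(features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate N per-image feature vectors (in view order) into one vector."""
    if not features:
        raise ValueError("need at least one feature vector (N >= 1)")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all per-image feature vectors must have the same length")
    return [x for f in features for x in f]
```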

4. The method according to claim 1, wherein step C further comprises: using a multilayer perceptron to perform feature transformation and dimensionality reduction on the first fusion feature vector and the second fusion feature vector, respectively, and comparing the dimensionality-reduced first and second fusion feature vectors to obtain the face recognition result.

5. The method according to claim 4, wherein the feature comparison is performed by calculating the cosine similarity or the Euclidean distance between the dimensionality-reduced second fusion feature vector and the dimensionality-reduced first fusion feature vector.

6. The method according to any one of claims 1-5, wherein the UV unwrapping comprises: converting the shape information of the three-dimensional face model into a UV position map and a UV normal vector map.

7. The method according to claim 6, wherein the pixel values of the UV position map correspond one-to-one to the coordinates of the three-dimensional points on the three-dimensional face model, and the pixel values of the UV normal vector map correspond one-to-one to the normal vectors of the three-dimensional points on the three-dimensional face model.
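
Claims 6 and 7 describe UV maps whose pixels store the 3D coordinates and normal vectors of the model's points. The sketch below builds both maps for a triangle mesh that already has per-vertex UV coordinates in [0, 1]. The text does not give the map resolution, the normal estimation or the rasterization method, so the following are assumptions:
- normals are computed as the area-weighted average of adjacent face normals (counter-clockwise winding);
- each vertex is written only into the single pixel its UV coordinate falls in.

A production implementation would more likely rasterize every triangle in UV space and interpolate with barycentric coordinates so that every covered pixel receives a value.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Grid = List[List[Vec3]]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vertex_normals(vertices: Sequence[Vec3],
                   faces: Sequence[Tuple[int, int, int]]) -> List[Vec3]:
    """Area-weighted average of adjacent face normals, normalised to unit length."""
    acc = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in faces:
        # The cross product's length is twice the triangle area, so larger
        # triangles contribute more to each vertex normal.
        n = _cross(_sub(vertices[j], vertices[i]), _sub(vertices[k], vertices[i]))
        for idx in (i, j, k):
            for c in range(3):
                acc[idx][c] += n[c]
    normals: List[Vec3] = []
    for a in acc:
        length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
        normals.append((a[0] / length, a[1] / length, a[2] / length)
                       if length > 0.0 else (0.0, 0.0, 0.0))
    return normals


def uv_maps(vertices: Sequence[Vec3], faces: Sequence[Tuple[int, int, int]],
            uvs: Sequence[Tuple[float, float]], size: int) -> Tuple[Grid, Grid]:
    """Write each vertex's (x, y, z) and unit normal into the pixel its UV coordinate falls in."""
    empty: Vec3 = (0.0, 0.0, 0.0)
    position_map: Grid = [[empty] * size for _ in range(size)]
    normal_map: Grid = [[empty] * size for _ in range(size)]
    for p, n, (u, v) in zip(vertices, vertex_normals(vertices, faces), uvs):
        col = min(size - 1, int(u * size))
        row = min(size - 1, int((1.0 - v) * size))  # UV v axis points up, image rows go down
        position_map[row][col] = (float(p[0]), float(p[1]), float(p[2]))
        normal_map[row][col] = n
    return position_map, normal_map
```

The one-to-one correspondence in claim 7 only holds in this sketch if no two vertices land in the same pixel. On a collision the later vertex overwrites the earlier one, which is another reason a denser, interpolating rasterizer would be preferred in practice.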

8. The method according to claim 1, wherein the first feature extractor and the second feature extractor are each one of the convolutional neural networks VGG-16, ResNet-50 or ResNet-101.

9. A video surveillance dynamic face recognition device based on a 3D face model, characterized in that it comprises at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and when executed by the at least one processor, the instructions enable the at least one processor to perform the method according to any one of claims 1 to 8.