CN111091492B

CN111091492B - A light transfer method for face images based on convolutional neural network

Info

Publication number: CN111091492B
Application number: CN201911335861.4A
Authority: CN
Inventors: 金鑫; 李忠兰; 肖超恩
Original assignee: Shaoding Artificial Intelligence Technology Co ltd
Current assignee: Beijing Hengmu Technical Service Co ltd
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2020-09-04
Anticipated expiration: 2039-12-23
Also published as: CN111091492A

Abstract

A face image illumination transfer method based on a convolutional neural network (CNN). The method has two main parts: illumination model training and illumination classification, and illumination matching and illumination transfer based on image style transfer. First, the convolutional neural networks VGG19 and VGG16 are trained for illumination classification on the Yale Face data set and the PIE face data set, yielding a model that can classify the illumination of face images. Next, illumination matching retrieves, from a face illumination data set, images whose illumination is similar to that of a given face image. Finally, using the illumination classification model, the illumination of a given reference face image is extracted and transferred to the input face image, achieving illumination transfer over the whole of a single face image.

Description

A light transfer method for face images based on convolutional neural network

Technical Field

The invention relates to a method for illumination transfer of face images based on a convolutional neural network, and belongs to the field of computer vision.

Background

The lighting effect of images is a research hotspot in several areas of computer vision, and it is also important for portrait images themselves. Light and shadow effects are widely needed in modern digital film and television production, portrait photography retouching, advertising, and other artistic design. One of the difficulties in film and television production is capturing the actors' best performance while the lighting in the scene is also at its best. Solving this by improving the shooting process is expensive; solving it in post-production requires complex technical processing. Portrait photography retouching faces a similar difficulty: setting up an ideal lighting scene with professional equipment costs too much and depends on factors such as the photographer's skill, while retouching afterwards with professional image processing software also costs time and labor and is very complicated.

Illumination transfer for face images offers a simple solution to these problems. For still images, given only an image with the desired lighting effect, that lighting can be transferred to the target image so that the target image acquires the desired effect. For video, an image or video with the desired lighting effect is given and its lighting is transferred to the target image or video, where one or both of the two are video files. Illumination transfer aims to generate a target image or video with the desired lighting in a single step, given only the desired lighting effect, without complex operations, saving time, labor, and resources. Existing face image illumination transfer methods change only the lighting of the face region and do not process the lighting of the neck and background. In practical applications, however, the face and non-face regions are inseparable, and changing only the lighting of the face does not fully meet practical needs. To address this, the present invention studies a method that performs illumination transfer on the entire face image. It gives the target image the desired lighting effect while staying closer to practical illumination transfer scenarios.

Summary of the Invention

Technical problem solved by the invention: building on traditional face makeup transfer and style transfer, a face image illumination transfer method based on a convolutional neural network is proposed. Incorporating a deep neural network makes the transferred lighting better and closer to the lighting of real images, with a simpler pipeline and more powerful functionality.

Technical solution of the invention: a face image illumination transfer method based on a convolutional neural network, comprising the following steps:

Step 1. Preparation and establishment of the data sets;

Step 2. Preparation of the pre-trained models;

Step 3. Obtaining the illumination classification model;

Step 4. Illumination matching based on the illumination classification model;

Step 5. Illumination transfer based on the illumination classification model. The steps are detailed as follows:

In step 1, preparation and establishment of the data sets: the data sets that currently label illumination in the most detail are the Yale Face data set and the PIE data set. Yale Face data set B contains 10 subjects, 9 male and 1 female, all in black-and-white images. Each subject has 9 poses with 64 lighting conditions per pose, for a total of 5,760 face illumination images (10 × 9 × 64) that include the background. The two data sets together contain 21,888 face images. The PIE face data set is a color face data set with 68 subjects, covering men, women, and all races. It contains face illumination images in 13 poses without background lights and in 3 poses with background lights; each pose has 21 lighting conditions, classified by position along the camera's x, y, and z directions. The face illumination portion of this data set has 22,848 images (68 × 16 × 21); the invention uses the images with background lights in which the subject faces the camera, 1,428 images in total (68 × 21).

Both data sets are relatively small. For the Yale Face data set, the face data of 28 subjects are used for training and those of 10 subjects for testing. For the PIE face data set, the face data of 50 subjects are used for training and those of 18 subjects for testing. All PIE images with background lights have similar background lighting, which interferes with learning to classify face illumination, so the image data were matted to cut the subject out from the background.
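The subject-level split described above can be sketched as follows. This is only an illustration: the text gives the subject counts (28/10 for Yale Face, 50/18 for PIE) but not how subjects were assigned, so the random assignment, the integer subject IDs, and the helper name `split_by_subject` are assumptions.

```python
import random


def split_by_subject(subject_ids, n_train, seed=0):
    """Split subject IDs so that no subject appears in both sets.

    The random shuffle is an assumption; the source only states how many
    subjects go to training and how many to testing.
    """
    ids = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    return set(ids[:n_train]), set(ids[n_train:])


# Hypothetical integer IDs standing in for the real subjects.
yale_train, yale_test = split_by_subject(range(38), n_train=28)
pie_train, pie_test = split_by_subject(range(68), n_train=50)
```

Splitting by subject rather than by image keeps every image of a test subject out of training, so the reported accuracy reflects lighting classification on unseen faces.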

In step 2, preparation of the pre-trained models: there are currently two ways to train a classification task, training from an initial state and transfer learning. On a small data set, a model obtained by transfer learning from a pre-trained model performs better than one trained from random initialization. The invention therefore trains the illumination classification model by transfer learning. For the Yale Face data set, the VGG19 convolutional neural network is used with the model pre-trained on the ImageNet object classification data set, provided by MatConvNet. For the PIE face data set, the VGG16 convolutional neural network is used with the pre-trained VGG Face face recognition model, provided by MatConvNet. MatConvNet is a MATLAB toolbox for convolutional neural networks with a rich set of pre-trained models.

In step 3, obtaining the illumination classification model: to train the model, the last fully connected layer of the original network is first removed to suit the classification task, and a new softmax layer, top1-error layer, and top5-error layer are added for training; a new fully connected layer is added according to the number of classes in the data set, 64 for Yale Face and 21 for PIE. The whole network is then fine-tuned, adjusting the weights of the first few layers, with the learning rate set to 1×10^-4 and 300 rounds of iteration, which completes the face image illumination classification task. The model trained from the VGG16 network and the VGG Face pre-trained model has an average accuracy of 62.8571% on the PIE test set. The model trained from the VGG19 network and the ImageNet pre-trained model has an average accuracy of 94.375% on the Yale Face test set; of the 64 lighting classes, 48 are classified with 100% accuracy.
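The top1-error and top5-error layers mentioned above report how often the true class is missing from the one or five highest-scoring classes. A minimal sketch in plain Python, with made-up scores over 6 classes rather than the 64 or 21 classes of the real models (the function name `topk_error` and the data are illustrative only):

```python
def topk_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy classifier outputs: one row of class scores per image.
scores = [
    [0.10, 0.70, 0.20, 0.00, 0.00, 0.00],  # true class 1: ranked first
    [0.50, 0.30, 0.20, 0.00, 0.00, 0.00],  # true class 1: ranked second
    [0.60, 0.15, 0.10, 0.08, 0.05, 0.02],  # true class 5: ranked last
    [0.20, 0.10, 0.60, 0.05, 0.03, 0.02],  # true class 2: ranked first
]
labels = [1, 1, 5, 2]

top1 = topk_error(scores, labels, k=1)
top5 = topk_error(scores, labels, k=5)
```

Average accuracy as quoted in the text corresponds to one minus the top-1 error.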

In step 4, illumination matching based on the illumination classification model: given an input face illumination image, illumination matching searches the face illumination data set for an image with the same lighting effect and outputs it. The key to illumination matching is how to compute information that represents the illumination of an image. The VGG network used in the invention consists of convolutional layers, fully connected layers, pooling layers, and a softmax layer. The output of the convolutional layers is high-volume image feature data from which illumination information is hard to extract, and the meaning of what a convolutional neural network produces after processing an image is not yet well understood. The output of the last fully connected layer of the VGG network, by contrast, has the same dimension as the number of image classes, so it is easier to relate to the illumination of the image. Therefore, the model trained on the Yale Face data set, which has higher illumination classification accuracy and more detailed classes, is used to look for a relationship between illumination information and the output of the model's last fully connected layer, and thereby to match the illumination of face images. Observing the fully connected layer output and the class labels for a large number of images reveals a regularity: the index of the largest value in the 64-dimensional output equals the index of the image's class label. On this basis, the invention proposes an illumination matching algorithm based on the illumination classification model, shown in FIG. 3.

In step 5, illumination transfer based on the illumination classification model: building on the illumination classification model and inspired by Neural Style, the invention combines the quotient image method from traditional illumination transfer research to achieve end-to-end illumination transfer for a single face image. Traditional Neural Style takes a content image and a style image as input, computes the content loss and the style loss separately, and linearly combines the two losses to produce the final style-transferred image. Following this idea, the invention proposes illumination transfer based on the illumination classification model. It likewise requires an input image and a reference image: the input image is generally a uniformly, frontally lit image, and the reference image is generally an image with pronounced differences between light and shadow. The input image and the reference image are fed into the illumination transfer network (VGG19) to obtain the feature matrices of the two images, and the light-and-shadow quotient is then computed from the feature matrices. The light-and-shadow quotient and the expected result image are fed back into the transfer network to minimize the transfer loss function, and the result image is output after 1,000 iterations. Experiments showed that the style transfer part of traditional Neural Style cannot transfer illumination information, so in the end only its content transfer part is used to transfer the illumination of the image.

Further, the light-and-shadow quotient is computed as follows:

Before computing the light-and-shadow quotient, the illumination transfer method based on the illumination classification model first computes the ratio between the reference image and the input image after they pass through the VGG network. The formula is as follows:

S_l = F_l[E] / (F_l[I] + ε)

where F_l[I] is the feature matrix of the input image I at convolutional layer l, F_l[E] is the feature matrix of the reference image E at convolutional layer l, and ε is a constant set to 0.0001. Dividing the feature matrix of the reference image by that of the input image gives the ratio S_l between the two images after the VGG network; multiplying the input image's feature matrix by this ratio gives the light-and-shadow quotient:

F_l[M] = F_l[I] × S_l

The invention further constrains F_l[M] (the light-and-shadow quotient of image M at convolutional layer l) as shown in the formula:

The constraint values r_ij, 0.4 and 5, were determined experimentally. Within this range the illumination information is transferred well, without transferring too much of the reference image's structure and content onto the input image.
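As a rough illustration of the quotient computation, the sketch below works element-wise on small nested lists standing in for feature matrices. It rests on two assumptions that the text does not spell out: that ε is added to the denominator of the ratio, and that the constraint clamps the ratio to the interval [0.4, 5] before it multiplies the input features. The function name and the sample values are hypothetical.

```python
EPSILON = 1e-4           # the constant ε given in the text
R_MIN, R_MAX = 0.4, 5.0  # constraint values given in the text


def light_shadow_quotient(f_input, f_ref, eps=EPSILON, r_min=R_MIN, r_max=R_MAX):
    """Return F_l[M] = F_l[I] * clamp(F_l[E] / (F_l[I] + eps), r_min, r_max).

    Assumes non-negative feature values (for example, post-ReLU activations)
    and that the constraint is a clamp on the ratio; both are assumptions.
    """
    quotient = []
    for row_i, row_e in zip(f_input, f_ref):
        out_row = []
        for fi, fe in zip(row_i, row_e):
            ratio = fe / (fi + eps)
            ratio = min(max(ratio, r_min), r_max)
            out_row.append(fi * ratio)
        quotient.append(out_row)
    return quotient


# Toy 2x2 "feature matrices" for the input image I and reference image E.
f_i = [[1.0, 2.0], [0.0, 4.0]]
f_e = [[2.0, 0.2], [3.0, 4.0]]
f_m = light_shadow_quotient(f_i, f_e)
```

In this toy case the second entry has a raw ratio near 0.1, which the lower bound raises to 0.4, and the third entry has a very large raw ratio, which the upper bound caps at 5.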

Further, the transfer loss function is computed as follows:

Neural Style introduced the content loss and the style loss of an image to represent the transfer losses with which a convolutional neural network learns the content and style of an image. The formula is:

where l denotes the l-th convolutional layer, which has N_l filters, each with a vectorized feature map of size D_l (D_l is the number of elements in a filter response). F_l[·] ∈ R^(N_l×D_l) is the resulting feature matrix and (i, j) indexes it. F_l[I], F_l[E], and F_l[O] are, at the l-th convolutional layer, the feature matrices of the input image I, the reference painting-style image E, and the expected output image O, respectively. L is the total number of layers in the network under consideration, α_l and β_l are configured weight parameters, and Γ is a weight that balances preserving the content of the input image against the amount of painting style transferred.

The transfer loss function of the invention's transfer method is:

where F_l[O] corresponds to the expected result image and F_l[M] is the light-and-shadow quotient; the distance between the two is computed and progressively reduced to obtain natural illumination transfer. After experiments, only the content transfer part of Neural Style is kept to transfer the illumination of the image. Experiments determined that the convolutional layers used for illumination transfer are convolutional layers 1_2 and 2_1; lower convolutional layers are sensitive to the content and structure of an image and can preserve the content of the input image. On the same subject, the method can transfer lighting from multiple directions, including frontal, left, and right light sources, and the transferred lighting looks natural.
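The idea of shrinking the distance between F_l[O] and F_l[M] can be sketched as a squared-error loss. The exact normalization and per-layer weighting of the patent's loss are not reproduced in the text, so the factor 0.5, the unweighted sum, and the function name below are assumptions for illustration only.

```python
def transfer_loss(f_output, f_quotient):
    """Half the sum of squared differences between F_l[O] and F_l[M].

    The 0.5 factor and the absence of per-layer weights are assumptions.
    """
    total = 0.0
    for row_o, row_m in zip(f_output, f_quotient):
        for fo, fm in zip(row_o, row_m):
            total += (fo - fm) ** 2
    return 0.5 * total


# Toy feature matrices for one layer.
loss_same = transfer_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]])
loss_diff = transfer_loss([[1.0, 2.0]], [[0.0, 0.0]])
```

Under this form the gradient with respect to each entry of F_l[O] is simply F_l[O] minus F_l[M], so each iteration pulls the output features toward the light-and-shadow quotient. In the method described above, such a term would be evaluated at convolutional layers 1_2 and 2_1.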

Compared with the prior art, the invention has the following advantages:

(1) The rise of deep learning has brought breakthroughs to many areas of computer vision research, and style transfer is a typical example. Inspired by style transfer, the invention provides a method for illumination transfer using a convolutional neural network. It performs better than traditional face makeup transfer and style transfer methods, and it brings the illumination transfer result for face images closer to the lighting of real images.

(2) The invention trains the illumination classification model by transfer learning from pre-trained models on the Yale Face and PIE data sets, which label illumination in detail. Different pre-trained models are used for the different data sets, the network structure of the pre-trained models is modified, and the models are fine-tuned and retrained to obtain a better illumination classification model.

(3) The invention performs illumination matching on the basis of the illumination classification model: for a given input face illumination image, it searches the face illumination data set for an image with the same lighting effect and outputs it. To achieve this, extensive experiments identified the relationship between illumination information in the neural network and the output of the model's last fully connected layer, which makes it possible to match the illumination of face images and led to the proposed illumination matching algorithm.

(4) Having obtained a well-trained illumination classification model, the invention performs illumination matching with it and, combining image style transfer, proposes illumination transfer. It modifies the loss function of traditional style transfer, keeping only the content transfer part to transfer the illumination of the image, and thereby obtains the face image illumination transfer method based on a convolutional neural network.

Description of the Drawings

FIG. 1 is a framework diagram of the invention;

FIG. 2 is a network structure diagram of the convolutional neural networks VGG16 and VGG19 used in the invention;

FIG. 3 is a flowchart of the illumination matching algorithm based on the illumination classification model;

FIG. 4 is a flowchart of the illumination transfer algorithm based on the illumination classification model.

Detailed Description

For a better understanding of the invention, some basic concepts are explained first.

Convolutional neural network (CNN): a class of neural networks that include convolution computations and are commonly used in deep learning.

Convolutional layer: extracts image features through convolution;

Pooling layer: compresses the input feature maps to reduce the computational complexity of the network;

Fully connected layer: connects all features and passes the output to the softmax layer;

Transfer learning: the weights of a network model already trained for one classification task are transferred to a new network model to be trained for another target classification task, instead of training from an initial state.

Illumination matching: given an input face illumination image, search the face illumination data set for an image with the same lighting effect as the input image, and output that image.

MatConvNet: a MATLAB toolbox for convolutional neural networks, with a rich set of pre-trained models.
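To make the transfer learning definition concrete, the sketch below models a network as a plain dictionary from layer names to weight lists. It keeps every pre-trained layer and swaps only the final classification layer for a freshly initialized one sized to the new number of classes. The layer names, the tiny weight shapes, the Gaussian initialization, and the helper name are all hypothetical; the actual work in the text is done with MatConvNet models.

```python
import random


def replace_classifier_head(pretrained, head_name, in_features, num_classes, seed=0):
    """Reuse all pre-trained layers except the head, which is re-initialized.

    Small random Gaussian initialization of the new head is an assumption.
    """
    rng = random.Random(seed)
    model = {name: w for name, w in pretrained.items() if name != head_name}
    model[head_name] = [
        [rng.gauss(0.0, 0.01) for _ in range(in_features)]
        for _ in range(num_classes)
    ]
    return model


# Toy stand-in for a pre-trained network with a 1000-class output layer.
pretrained = {
    "conv1_1": [[0.5, -0.2], [0.1, 0.3]],
    "fc8": [[0.0] * 4 for _ in range(1000)],
}
yale_model = replace_classifier_head(pretrained, "fc8", in_features=4, num_classes=64)
pie_model = replace_classifier_head(pretrained, "fc8", in_features=4, num_classes=21)
```

The reused layers start from weights that already encode general image features, which is why fine-tuning is expected to work better than random initialization on small data sets.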

Referring to FIG. 1, the overall implementation of the invention is as follows:

(1) First, the data sets are prepared and established; the Yale Face data set and the PIE data set are selected. For the Yale Face data set, the face data of 28 subjects are used for training and those of 10 subjects for testing. For the PIE face data set, the face data of 50 subjects are used for training and those of 18 subjects for testing.

(2) Different pre-trained models are used for the two selected data sets. For the Yale Face data set, the VGG19 convolutional neural network is used with the ImageNet pre-trained model provided by MatConvNet; for the PIE data set, the VGG16 convolutional neural network is used with the VGG Face pre-trained model provided by MatConvNet. FIG. 2 shows the network structures of VGG16 and VGG19.

(3) The illumination classification model is obtained. The network structures of VGG16 and VGG19 are modified: the last fully connected layer of the original network is removed to suit the classification task, a new softmax layer, top1-error layer, and top5-error layer are added for training, and a new fully connected layer is added according to the number of classes in the data set, 64 for Yale Face and 21 for PIE. The whole network is then fine-tuned, adjusting the weights of the first few layers, with the learning rate set to 1×10^-4 and 300 rounds of iteration, which completes the face image illumination classification task. The model trained from the VGG16 network and the VGG Face pre-trained model has an average accuracy of 62.8571% on the PIE test set. The model trained from the VGG19 network and the ImageNet pre-trained model has an average accuracy of 94.375% on the Yale Face test set; of the 64 lighting classes, 48 are classified with 100% accuracy.

(4) Illumination matching is performed on the basis of the illumination classification model. The key to illumination matching is how to compute information that represents the illumination of an image. Extensive experiments showed a regularity between the fully connected layer output for an image and the image's class label: the index of the largest value in the 64-dimensional output equals the index of the class label. An illumination matching algorithm based on the illumination classification model is therefore proposed; its flow is shown in FIG. 3.

(5) Inspired by Neural Style and combined with the quotient image method from traditional illumination transfer research, the loss function is modified to keep only the content transfer part, which transfers the illumination of the image and achieves end-to-end illumination transfer for a single face image. The algorithm flow is shown in FIG. 4.

The steps above are implemented as follows:

The network structures of the VGG16 and VGG19 convolutional neural networks used in step (1), which are the core neural networks of the invention, are shown in FIG. 2:

(1.1) VGG16 contains 16 weight layers and VGG19 contains 19, where the weight layers comprise the convolutional layers and the fully connected layers;

(1.2) VGG16 and VGG19 take 224×224 RGB images as input;

(1.3) VGG16 contains 13 convolutional layers, denoted conv3-xxx, where 3 means the convolution kernel size is 3×3 and -xxx is the number of convolution kernels (output channels); 3 fully connected layers, denoted FC-xxx; 5 pooling layers, denoted maxpool; and 1 softmax layer;

(1.4) VGG19 contains 16 convolutional layers, denoted conv3-xxx, where 3 means the convolution kernel size is 3×3 and -xxx is the number of convolution kernels (output channels); 3 fully connected layers, denoted FC-xxx; 5 pooling layers, denoted maxpool; and 1 softmax layer.
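The layer counts in (1.1) to (1.4) can be checked with a short tally. The per-block convolution counts below follow the standard VGG16 and VGG19 configurations, with one max-pooling layer closing each of the five blocks; the dictionary and function names are illustrative only.

```python
# Number of 3x3 convolutional layers in each of the five blocks.
VGG_CONV_PER_BLOCK = {
    "VGG16": [2, 2, 3, 3, 3],
    "VGG19": [2, 2, 4, 4, 4],
}
NUM_FC_LAYERS = 3  # both networks end with three fully connected layers


def count_layers(name):
    """Tally convolutional, fully connected, weight, and pooling layers."""
    blocks = VGG_CONV_PER_BLOCK[name]
    conv = sum(blocks)
    return {
        "conv": conv,
        "fc": NUM_FC_LAYERS,
        "weight": conv + NUM_FC_LAYERS,  # weight layers = conv + FC
        "maxpool": len(blocks),          # one pooling layer per block
    }


vgg16 = count_layers("VGG16")
vgg19 = count_layers("VGG19")
```

The three extra convolutional layers of VGG19 sit in the last three blocks, which accounts for the difference between 16 and 19 weight layers.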

Specific algorithm flow of illumination matching based on the illumination classification model in step (4)

Illumination matching is implemented on top of the trained illumination classification model. The flow of the illumination matching algorithm is shown in FIG. 3:

(4.1) Start;

(4.2) Input the face illumination image into the illumination classification model: to train the illumination classification model, the last fully connected layer of the original network is first removed to suit the classification task, a new softmax layer, top1-error layer, and top5-error layer are added for training, and a new fully connected layer is added according to the number of classes in the data set, 64 for Yale Face and 21 for PIE. The whole network is then fine-tuned, adjusting the weights of the first few layers, with the learning rate set to 1×10^-4 and 300 rounds of iteration, which completes the face image illumination classification task;

(4.3) Take the output of the model's last fully connected layer: the VGG network used in the invention consists of convolutional layers, fully connected layers, pooling layers, and a softmax layer. The output of the convolutional layers is high-volume image feature data from which illumination information is hard to extract, and the meaning of what a convolutional neural network produces after processing an image is not yet well understood. The output of the last fully connected layer of the VGG network has the same dimension as the number of image classes, so it is easier to relate to the illumination of the image. Therefore, the model trained on the Yale Face data set, which has higher illumination classification accuracy and more detailed classes, is used to look for a relationship between illumination information and the output of the model's last fully connected layer, and thereby to match the illumination of face images;

(4.4) Obtain the maximum value and the index of its dimension. Observation of the fully connected layer outputs and class labels of a large number of images shows a regular pattern: the index of the dimension holding the maximum value in the 64-dimensional output is the same as the index of the image's class label. The maximum value and the index of its dimension are therefore recorded, so that images with the same dimension index as the input image can be found in the following steps;

(4.5) Find the images in the face dataset with the same dimension index as the input image. Following the steps above, the images in the face dataset whose dimension index equals that of the input image are selected in order to match the illumination;

(4.6) Compare the maximum values of all images with the same dimension index. Among the images in the face dataset that share the input image's dimension index, the maximum values are compared, and the maximum-value image obtained from this comparison is the image in the face dataset whose illumination best matches the input image;

(4.7) Output the image whose maximum value is closest to that of the input image as the matching image;

(4.8) End.
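The matching procedure in steps (4.3) to (4.7) can be illustrated with a minimal sketch. It assumes the last fully connected layer outputs have already been computed for the input image and for every dataset image; the function names `argmax_with_value` and `match_illumination` and the dictionary layout are hypothetical and do not come from the patent. The sketch follows the reading of step (4.7) in which the selected candidate is the one whose maximum value is closest to that of the input image. Toy 3-dimensional vectors stand in for the 64-dimensional outputs.

```python
from typing import Dict, List, Optional, Tuple


def argmax_with_value(scores: List[float]) -> Tuple[int, float]:
    """Return (dimension index, value) of the largest entry in a score vector."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, scores[index]


def match_illumination(
    query_scores: List[float],
    dataset_scores: Dict[str, List[float]],
) -> Optional[str]:
    """Pick the dataset image whose illumination best matches the query.

    Assumption: among images sharing the query's argmax dimension, the best
    match is the one whose maximum value is closest to the query's maximum.
    """
    query_index, query_max = argmax_with_value(query_scores)

    # Step (4.5): keep only images with the same dimension index.
    candidates = []
    for name, scores in dataset_scores.items():
        index, value = argmax_with_value(scores)
        if index == query_index:
            candidates.append((name, value))

    if not candidates:
        return None

    # Steps (4.6) and (4.7): compare maximum values and return the closest.
    best_name, _ = min(candidates, key=lambda item: abs(item[1] - query_max))
    return best_name


# Toy example with 3-dimensional outputs instead of 64.
query = [0.1, 2.0, 0.3]
dataset = {
    "a": [0.0, 1.5, 0.2],
    "b": [0.0, 2.1, 0.1],
    "c": [3.0, 0.1, 0.0],
}
best = match_illumination(query, dataset)
```

Here images "a" and "b" share the query's dimension index, and "b" is selected because its maximum value is nearer to the query's. A different tie-breaking rule, such as simply taking the largest maximum value, would need only a change to the final `min` call.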

Specific algorithm flow of the illumination transfer based on the illumination classification model in step 5:

On the basis of the illumination classification model, and inspired by Neural Style, the present invention combines the quotient image method from traditional illumination transfer research to achieve end-to-end illumination transfer for a single face image. Traditional Neural Style takes a content image and a style image as input, computes the content loss and the style loss separately, and finally combines the two losses linearly to obtain the final style-transferred image. Based on this idea, the present invention proposes illumination transfer based on the illumination classification model. The implementation of the algorithm is shown in Figure 4:

(5.1) Begin;

(5.2) Input the reference image and the input image into the transfer network. The input image is generally an image with uniform frontal illumination, and the reference image is generally an image with pronounced light and shadow differences. Both images are fed into the illumination transfer network (VGG19);

(5.3) Obtain the feature values of the input image and the reference image. The illumination transfer network computes the feature values of both images, which are then used to calculate the light and shadow quotient;

(5.4) Solve for the light and shadow quotient. Before computing the light and shadow quotient, first compute the ratio S_l between the reference image and the target image after they pass through the VGG network. The light and shadow quotient is then obtained by multiplying the feature matrix of the input image by this ratio:

F_l[M] = F_l[I] × S_l

Here the present invention improves F_l[M] (F_l[M] is the light and shadow quotient of image M at the l-th convolutional layer) by applying a constraint. The bounds of the constraint value r_ij, 0.4 and 5, were determined by experiment; within this range the illumination information is transferred well without transferring too much of the reference image's structure and content onto the input image;

(5.5) Obtain the result image through the transfer loss function. Neural Style introduced the content loss and the style loss of an image to express the transfer losses with which a convolutional neural network learns the content and style of an image. Its formula is:

where l denotes the l-th convolutional layer, at which there are N_l filters, each with a vectorized feature map of size D_l (D_l is the number of elements in a filter response). F_l[·] ∈ R^(N_l×D_l) is the resulting feature matrix, and (i, j) is the index into the feature matrix. F_l[I], F_l[E], and F_l[O] are, respectively, the feature matrix of the input image I, the feature matrix of the reference painting-style image E, and the feature matrix of the expected output result image O at the l-th convolutional layer. L is the total number of layers in the network under consideration, and α_l and β_l are the configured weight parameters. Γ is the weight that balances the integrity of the input image content against the amount of painting style transferred.

The transfer loss function of the illumination transfer method is:

F_l[O] corresponds to the expected result image and F_l[M] is the light and shadow quotient. The distance between the two is computed and continually reduced to obtain natural illumination transfer. Experiments led to retaining the content transfer part of Neural Style to transfer the illumination information of the image. The convolutional layers used for illumination transfer were determined by experiment to be convolutional layers 1_2 and 2_1; lower convolutional layers are more sensitive to the content and structural information of the image and can preserve the content information of the input image. The illumination transfer method based on the illumination classification model can transfer illumination from multiple directions onto the same photographic subject, including frontal, left, and right light sources, and the resulting illumination looks natural;

(5.6) Return the result image;

(5.7) End.
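Steps (5.4) and (5.5) can be illustrated with a minimal sketch operating on small feature matrices stored as nested lists. Several points are assumptions rather than statements of the patented method: the constant ε = 0.0001 (the value given in the claims) is placed in the denominator of the ratio, the 0.4 to 5 constraint is applied by clamping each ratio element before multiplying, and the loss is a plain sum of squared differences between F_l[O] and F_l[M] at a single layer, without the layer weights. The names `light_shadow_quotient` and `transfer_loss` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

EPSILON = 1e-4            # constant epsilon, set to 0.0001 in the claims
R_MIN, R_MAX = 0.4, 5.0   # constraint range for r_ij reported in step (5.4)


def light_shadow_quotient(f_input: Matrix, f_reference: Matrix) -> Matrix:
    """Compute F_l[M] = F_l[I] * S_l element by element for one layer.

    Assumptions: S_l = F_l[E] / (F_l[I] + epsilon), and the constraint is a
    clamp of each ratio element to [R_MIN, R_MAX]. Features are taken to be
    non-negative (for example after ReLU), so the denominator is positive.
    """
    quotient: Matrix = []
    for row_input, row_reference in zip(f_input, f_reference):
        out_row = []
        for x, e in zip(row_input, row_reference):
            ratio = e / (x + EPSILON)
            ratio = min(max(ratio, R_MIN), R_MAX)
            out_row.append(x * ratio)
        quotient.append(out_row)
    return quotient


def transfer_loss(f_output: Matrix, f_quotient: Matrix) -> float:
    """Sum of squared differences between F_l[O] and F_l[M] for one layer."""
    return sum(
        (o - m) ** 2
        for row_output, row_quotient in zip(f_output, f_quotient)
        for o, m in zip(row_output, row_quotient)
    )


# Toy 1x3 feature matrices: one ratio inside the range, one clamped up to
# R_MIN, one clamped down to R_MAX.
f_i = [[1.0, 2.0, 1.0]]
f_e = [[2.0, 0.2, 20.0]]
f_m = light_shadow_quotient(f_i, f_e)
loss_at_start = transfer_loss(f_i, f_m)
```

In an actual optimization the result image would be updated iteratively so that its features at the chosen layers approach `f_m`, driving the loss toward zero; the sketch only shows how the target and the distance are formed.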

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, variations are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims

1. A face image illumination transfer method based on a convolutional neural network, characterized by comprising the following steps:

Step 1. Preparation and establishment of the dataset;

Step 2. Preparation of the pre-trained model;

Step 3. Obtaining the illumination classification model;

Step 4. Illumination matching based on the illumination classification model;

Step 5. Illumination transfer based on the illumination classification model;

wherein the illumination matching based on the illumination classification model in step 4 specifically comprises the following steps:

(4.1) Begin;

(4.2) Input the face illumination image into the illumination classification model: to train the illumination classification model, first remove the last fully connected layer of the original network, add a new softmax layer, top-1 error layer, and top-5 error layer for training, and add a new fully connected layer according to the number of classes in the dataset, the Yale Face dataset having 64 classes and the PIE dataset having 21 classes; then fine-tune the entire network, adjusting the weights of the first few layers, with the learning rate set to 1×10^-4 and 300 rounds of iteration, to complete the face image illumination classification task;

(4.3) Take the output of the last fully connected layer of the model: the VGG neural network used consists of convolutional layers, fully connected layers, pooling layers, and a softmax layer, the convolutional layers outputting image feature information; on the basis of the illumination classification model, the model trained on the Yale Face dataset is used to find the correlation between the illumination information and the output of the last fully connected layer of the model, completing the illumination matching of face illumination images;

(4.4) Obtain the maximum value and the index of its dimension: from the output of the image at the fully connected layer of the network and the class label of the image, the index of the dimension holding the maximum value in the 64-dimensional output is the same as the index of the image class label, and the maximum value and the index of its dimension are obtained;

(4.5) Using the maximum value and the index of its dimension obtained in step (4.4), find the images in the face dataset with the same dimension index as the input image, in order to match the illumination;

(4.6) Compare the maximum values of all images with the same dimension index: among the images found in the face dataset with the same dimension index as the input image, the maximum-value image obtained by comparing their maximum values is the image in the face dataset that best matches the illumination of the input image;

(4.7) Output the maximum-value image closest to the input image as the matching image;

(4.8) End.

2. The face image illumination transfer method based on a convolutional neural network according to claim 1, characterized in that the preparation and establishment of the dataset in step 1 comprises:

For the Yale Face dataset, the face data of 28 photographic subjects are used for training, and the face data of 10 photographic subjects are used for testing;

For the PIE face dataset, the face data of 50 photographic subjects are used for training, and the face data of 18 photographic subjects are used for testing; the image data undergo matting.

3. The face image illumination transfer method based on a convolutional neural network according to claim 1, characterized in that the preparation of the pre-trained model in step 2 comprises:

For the Yale Face dataset, the VGG19 convolutional neural network is used with the model pre-trained on the ImageNet object classification dataset, as provided by MatConvNet;

For the PIE face dataset, the VGG16 convolutional neural network is used with the pre-trained face recognition model VGGFace, as provided by MatConvNet.

4. The face image illumination transfer method based on a convolutional neural network according to claim 1, characterized in that the illumination classification model in step 3 is obtained as follows:

To train the illumination classification model, first remove the last fully connected layer of the original network, add a new softmax layer, top-1 error layer, and top-5 error layer for training, and add a new fully connected layer according to the number of classes in the dataset, the Yale Face dataset having 64 classes and the PIE dataset having 21 classes; then fine-tune the entire network, adjusting the weights of the first few layers, with the learning rate set to 1×10^-4 and 300 rounds of iteration, to complete the face image illumination classification task; the illumination classification model is trained with the VGG16 network and the VGGFace pre-trained model.

5. The face image illumination transfer method based on a convolutional neural network according to claim 1, characterized in that the illumination matching based on the illumination classification model in step 4 is as follows:

Find, in the face illumination image dataset, the image with the same lighting effect as the input image and output it; on the basis of the illumination classification model, the model trained on the Yale Face dataset is used to complete the illumination matching of face illumination images.

6. The face image illumination transfer method based on a convolutional neural network according to claim 1, characterized in that the illumination transfer based on the illumination classification model in step 5 is as follows:

First, prepare the input image and the reference image required for illumination transfer: the input image is an image with uniform frontal illumination, and the reference image is an image with pronounced light and shadow differences. Feed the input image and the reference image into the illumination transfer network to obtain the feature matrices of the two images, and then compute the light and shadow quotient from the feature matrices; the light and shadow quotient and the expected result image are fed back into the transfer network to minimize the transfer loss function, and the result image is output after multiple iterations.

7. The face image illumination transfer method based on a convolutional neural network according to claim 6, characterized in that the illumination transfer based on the illumination classification model in step 5 specifically comprises:

(5.1) Begin;

(5.2) Input the reference image and the input image into the transfer network: the input image is an image with uniform frontal illumination, the reference image is an image with pronounced light and shadow differences, and both images are fed into the illumination transfer network VGG19;

(5.3) Obtain the feature values of the input image and the reference image: the illumination transfer network computes the feature values of both images, which are then used to calculate the light and shadow quotient;

(5.4) Solve for the light and shadow quotient: before computing the light and shadow quotient, first compute the ratio S_l between the reference image and the target image after they pass through the VGG network, where F_l[I] is the feature matrix of the input image I at the l-th convolutional layer, F_l[E] is the feature matrix of the reference image E at the l-th convolutional layer, and ε is a constant set to 0.0001; the feature matrix of the reference image is divided by the feature matrix of the input image to obtain the ratio S_l, and the feature matrix of the input image is multiplied by this ratio to obtain the light and shadow quotient, as follows:

F_l[M] = F_l[I] × S_l

F_l[M] is the light and shadow quotient of image M at the l-th convolutional layer, and a constraint is applied to F_l[M],

wherein the constraint value r_ij is in the range of 0.4 to 5;

(5.5) Obtain the result image through the transfer loss function: the content loss and the style loss of the Neural Style method express the transfer losses with which a convolutional neural network learns the content and style of an image. The formula is:

where l denotes the l-th convolutional layer, at which there are N_l filters, each with a vectorized feature map of size D_l, D_l being the number of elements in a filter response; F_l[·] ∈ R^(N_l×D_l) is the resulting feature matrix, and (i, j) is the index into the feature matrix; F_l[I], F_l[E], and F_l[O] are, respectively, the feature matrix of the input image I, the feature matrix of the reference painting-style image E, and the feature matrix of the expected output result image O at the l-th convolutional layer; L is the total number of layers in the network under consideration, and α_l and β_l are the configured weight parameters; Γ is the weight that balances the integrity of the input image content against the amount of painting style transferred;

The transfer loss function of the illumination transfer method is:

F_l[O] corresponds to the expected result image and F_l[M] is the light and shadow quotient; the distance between the two is computed and continually reduced to obtain natural illumination transfer;

(5.6) Return the result image;

(5.7) End.