CN108629747B

CN108629747B - Image enhancement method and device, electronic equipment and storage medium

Info

Publication number: CN108629747B
Application number: CN201810377408.9A
Authority: CN
Inventors: 王瑞星; 沈小勇; 贾佳亚
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2019-12-10
Anticipated expiration: 2038-04-25
Also published as: CN108629747A

Abstract

The embodiments of the present application disclose an image enhancement method and apparatus, an electronic device, and a storage medium, in the field of image processing. The method comprises the following steps:

- acquiring a target image to be processed;
- performing semantic segmentation on the target image to obtain n semantic regions, where n is an integer greater than 1;
- performing style transfer on the n semantic regions according to a reference image and the influence factor corresponding to each region, to obtain an enhanced target image. The influence factor expresses the degree to which a semantic region is enhanced.

Because each semantic region of the target image can be style-transferred to a different degree, the method avoids the poor enhancement that results from applying style transfer to the target image as a whole. Each semantic region is enhanced to an appropriate degree, so the target image as a whole obtains a better enhancement effect.

Description

Image enhancement method and device, electronic equipment and storage medium

Technical Field

The embodiments of the present application relate to the field of machine learning, and in particular to an image enhancement method and apparatus, an electronic device, and a storage medium.

Background

Taking good photographs at night requires considerable photographic skill and a high-quality camera. Most current smartphones are constrained in device size, hardware parameters, and mode of operation, which makes professional-looking night scene photos difficult to take.

The related art provides an image enhancement method based on style transfer, which works as follows:

1. A night scene image is input into an image enhancement program.
2. The user manually selects a style template image in the program.
3. The program calculates the style loss and content loss between the input night scene image and the style template image using a deep learning method.
4. After a number of iterations, the program outputs the night scene image.

Visually, the output image has been stylized as a whole according to the style template image, which enhances the night scene image.

Night scene images vary greatly in content and may have very complex structures. For example, many buildings look very similar during the day, but at night they may show completely different windows and wall lights. This is before counting the many other types of objects and lighting, such as roads, sky, and cars. The image enhancement method in the related art therefore cannot adapt to night scene images with different content.

Disclosure of Invention

The embodiments of the present application provide an image enhancement method and apparatus, an electronic device, and a storage medium. The related-art method cannot adapt to night scene images with different content, and therefore enhances poorly in some scenes; these embodiments solve that problem. The technical solutions are as follows:

According to an aspect of the present application, there is provided an image enhancement method, the method including:

Acquiring a target image to be processed;

Performing semantic segmentation on the target image to obtain n semantic regions of the target image, wherein n is an integer greater than 1;

Performing style transfer on the n semantic regions of the target image according to a reference image and the influence factor corresponding to each region, to obtain an enhanced target image;

Wherein the influence factor expresses the degree to which a semantic region is enhanced.

According to another aspect of the present application, there is provided an image enhancement apparatus, the apparatus including:

The image acquisition module is used for acquiring a target image to be processed;

The semantic segmentation module is used for performing semantic segmentation on the target image to obtain n semantic regions of the target image, wherein n is an integer greater than 1;

The style transfer module is configured to perform style transfer on the n semantic regions of the target image according to a reference image and the influence factor corresponding to each region, to obtain an enhanced target image.

According to another aspect of the present application, there is provided an electronic device comprising a memory and a processor; the memory stores at least one instruction that is loaded and executed by the processor to implement the image enhancement method as described above.

According to another aspect of the present application, there is provided a computer readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor to implement the image enhancement method as described above.

The technical solutions provided in the embodiments of the present application have at least the following beneficial effects:

Semantic segmentation of the target image yields n semantic regions. Style transfer is then performed on these regions according to the reference image and the influence factor corresponding to each region, giving the enhanced target image. Each semantic region can thus be style-transferred to a different degree, which solves the problem that global style transfer cannot adapt to different image content and therefore enhances poorly. Because each semantic region is enhanced to an appropriate degree, the target image as a whole obtains a better enhancement effect.

Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.

FIG. 1 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present application;

FIG. 2 is a flowchart of an image enhancement method according to an exemplary embodiment of the present application;

FIG. 3 is a flowchart of an image enhancement method according to another exemplary embodiment of the present application;

FIG. 4 is a schematic diagram of an image enhancement method according to another exemplary embodiment of the present application;

FIG. 5 is a schematic diagram of an image enhancement method according to another exemplary embodiment of the present application;

FIG. 6 is a flowchart of training a retrieval neural network according to another exemplary embodiment of the present application;

FIG. 7 is a block diagram of an image enhancement apparatus according to an exemplary embodiment of the present application;

FIG. 8 is a block diagram of an electronic device according to an exemplary embodiment of the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in further detail below with reference to the accompanying drawings.

The embodiments of the present application provide an image enhancement method that enhances different semantic regions of an image to different degrees. It can be used to enhance night scene images. Because the method is fully automatic, it can run on a mobile terminal: after a user shoots a night scene image, the processor or AI chip in the terminal enhances the image fully automatically.

Typically, the image enhancement method can be applied to the following product scenarios:

Night scene enhancement in mobile phone camera software:

All current smartphones have a photographing function. Smartphone camera specifications keep improving, but they are limited by the small size of the camera module. In particular, the aperture cannot be very large, so night scene images shot directly on a smartphone are of mediocre quality.

The image enhancement method provided by the embodiments of the present application can be deployed on a mobile phone as software, optionally paired with a dedicated AI chip. It can provide night scene enhancement either fully automatically or when the user starts it manually. For example, after the user takes a night scene image, an enhancement button is shown on the image's viewing page. When the user presses it, the image is enhanced fully automatically.

Online night scene enhancement provided by a web server:

The image enhancement method provided by the embodiments of the present application can also be deployed on a web server as software, optionally paired with a dedicated AI chip. The web server provides users with a web page that offers image enhancement. For example, the user uploads an image to the web server through the web page and selects the night scene enhancement function. The web server then performs night scene enhancement on the image.

It should be noted that the image enhancement method provided by the embodiments of the present application may also be used for other kinds of image enhancement, such as landscape and portrait enhancement. The embodiments mainly use night scene image enhancement as the example, which does not limit the method's application scenarios.

Referring to FIG. 1, a block diagram of an image processing apparatus according to an exemplary embodiment of the present application is shown. The image processing apparatus may be referred to simply as an apparatus or an electronic device. It may be implemented as a mobile terminal or a server, and includes a processor 120, a memory 140, and a camera 160.

The processor 120 includes one or more processing cores, for example a 4-core or 8-core processor. It is configured to execute at least one of the instructions, code segments, and programs stored in the memory 140.

The processor 120 is electrically connected to the memory 140, optionally via a bus. The memory 140 stores one or more instructions, code, code segments, and/or programs. When executed by the processor 120, these implement the image enhancement method provided in the following embodiments.

The processor 120 is also electrically connected to the camera 160, optionally via a bus. The camera 160 is a sensing device capable of capturing images. It may also be called by other names, such as a photographing device or a light-sensing device. The camera 160 can capture images continuously or multiple times. Optionally, the camera 160 is built into the apparatus or external to it. In some embodiments, if the target image is captured by another device, the camera 160 is optional.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the image processing apparatus, and may include more or fewer components than those shown, or combine some components, or adopt a different arrangement of components.

Referring to FIG. 2, a flowchart of an image enhancement method according to an exemplary embodiment of the present application is shown. The method can be used in a mobile terminal with a photographing function and comprises the following steps:

Step 202, acquiring a target image to be processed;

The target image to be processed is an image that needs enhancement processing. Optionally, the target image is a night view image.

Step 204, performing semantic segmentation on the target image to obtain n semantic regions of the target image;

A semantic region is a region of the target image delimited according to a semantic label. Optionally, regions are delimited at the pixel level. Semantic label categories include, but are not limited to: sky, river, sea, lake, building, person, animal, plant, road, and automobile. Here n is a positive integer.

Step 206, performing style transfer on the n semantic regions of the target image according to the reference image and the influence factor corresponding to each region, to obtain the enhanced target image.

Style transfer is an image processing technique that "transfers" the style of a reference image onto a target image. Optionally, the reference image is a representative night scene image, or an image whose night scene quality meets a preset condition. There are at least two reference images, which differ in style and/or semantic content.

The terminal calls a style transfer enhancement network. Taking the style of the reference image as the reference, the network performs style transfer on each of the n semantic regions of the target image according to that region's influence factor, producing the enhanced target image. When n ≥ 2, at least two of the n semantic regions have different influence factors.

In summary, the image enhancement method provided in this embodiment performs semantic segmentation on the target image to obtain n semantic regions. It then performs style transfer on these regions according to the reference image and the influence factor corresponding to each region, producing an enhanced target image. Each semantic region can thus be style-transferred to a different degree, which solves the problem that global style transfer cannot adapt to different image content and therefore enhances poorly. Because each semantic region is enhanced to an appropriate degree, the target image as a whole obtains a better enhancement effect.

In an optional embodiment based on FIG. 2, the embodiments of the present application use three neural networks: a retrieval neural network, a semantic segmentation neural network, and a style transfer enhancement network. They are described below.

Retrieval neural network: selects, fully automatically, a reference image that matches the target image from a reference image library. Optionally, the reference image satisfies the following three conditions:

1. The reference image itself has high-quality content and color;

2. The semantic information of the reference image is as similar as possible to that of the target image;

3. The hue (style) of the reference image is as similar as possible to that of the target image.

Semantic segmentation neural network: performs semantic segmentation on the target image to obtain its semantic regions. If the reference image has not already been segmented, this network can also segment the reference image to obtain its semantic regions.

Style transfer enhancement network: performs style transfer on the target image region by region, according to the reference image. Optionally, each semantic region is enhanced differently according to its own influence factor to achieve the best enhancement effect.

Referring to FIG. 3, a flowchart of an image enhancement method according to another exemplary embodiment of the present application is shown. When used to enhance night scene images, the method may also be called a night scene image enhancement method. The method can be used in an electronic device, for example one with a photographing function, and comprises the following steps:

Step 301, acquiring a target image to be processed;

Take a mobile terminal as an example. After the terminal captures a night scene image through its built-in camera, it uses that image as the target image to be processed.

The target image to be processed is an image that needs enhancement processing. Optionally, it is a night scene image that needs night scene enhancement.

Step 302, invoking a retrieval neural network to determine, from a plurality of reference images, a reference image matching the target image;

The user may select the reference image manually from the plurality of reference images, but manual selection is inefficient. In an optional embodiment, a retrieval neural network runs on the device and selects the matching reference image automatically, with no manual selection by the user.

Optionally, the retrieval neural network ranks the reference images by semantic similarity and/or style similarity, and takes the top-ranked reference image as the one matching the target image. Illustratively, the device stores a plurality of reference images, each with different semantic content and style type. In an optional embodiment, 5000 images with diverse semantic content and style types and good night scene effects are selected in advance from an image-sharing website to build the reference image library.

Optionally, the device inputs the target image into the retrieval neural network. The network calculates the semantic similarity and style similarity between each reference image and the target image. It then selects the reference image with the highest semantic and style similarity as the one matching the target image.
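The ranking in step 302 can be sketched as follows. This is not the patent's retrieval network. The sketch assumes that semantic and style embeddings have already been computed for the target and for every library image, scores each candidate by cosine similarity, and combines the two similarities with a weighted sum. The embeddings, the weight `w_sem`, and the function names are all hypothetical.

```python
import numpy as np


def cosine_similarity(query: np.ndarray, library: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and every row of `library`."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    return lib @ q


def pick_reference(target_sem, target_style, ref_sem, ref_style, w_sem=0.5):
    """Rank reference images by a weighted sum of semantic and style similarity
    (hypothetical scoring rule) and return (index of best match, all scores)."""
    sem = cosine_similarity(target_sem, ref_sem)
    sty = cosine_similarity(target_style, ref_style)
    scores = w_sem * sem + (1.0 - w_sem) * sty
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_sem = rng.normal(size=(5, 8))    # 5 library images, 8-d semantic embeddings
    ref_style = rng.normal(size=(5, 4))  # 4-d style embeddings
    best, scores = pick_reference(ref_sem[3], ref_style[3], ref_sem, ref_style)
    print(best)  # the query equals library image 3, so 3 ranks first
```

A weighted sum is only one way to combine "the highest semantic similarity and style similarity". A lexicographic ordering, or a learned scoring head, would fit the description equally well.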

Referring to FIG. 4, suppose the lower-left image is the target image and the upper-left image is the selected reference image. The selection process is shown as S1.

Step 303, invoking a semantic segmentation neural network to process the target image to obtain n semantic regions of the target image;

Optionally, the semantic segmentation neural network is trained on sample images annotated with semantic region labels. In an illustrative example, it is pre-trained on the ADE20K dataset.

The device inputs the target image into the semantic segmentation neural network to obtain n semantic regions of the target image. Here n is a positive integer and, optionally, an integer greater than 1. Referring to FIG. 4, the semantic segmentation process is shown as S2.
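A segmentation network usually outputs a per-pixel class score volume. The minimal sketch below turns such an output into the n pixel-level semantic regions used in later steps. It assumes the scores are arranged as `(num_classes, H, W)` and that each region is a boolean mask keyed by its class label. The function names are hypothetical, and the label IDs stand in for, for example, ADE20K class indices.

```python
import numpy as np


def labels_from_logits(logits: np.ndarray) -> np.ndarray:
    """Per-pixel argmax over a (num_classes, H, W) score volume."""
    return np.argmax(logits, axis=0)


def split_into_regions(label_map: np.ndarray) -> dict:
    """Split an H x W map of per-pixel class labels into one boolean mask per
    label that actually occurs; the number of masks is n."""
    return {int(label): label_map == label for label in np.unique(label_map)}


if __name__ == "__main__":
    label_map = np.array([[0, 0, 2], [2, 2, 5]])
    regions = split_into_regions(label_map)
    print(len(regions))  # n = 3 semantic regions (labels 0, 2, 5)
```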

Step 304, inputting the reference image and the target image into the style transfer enhancement network to obtain an output image;

After the retrieval neural network determines the reference image matching the target image, the device inputs the reference image and the target image into the style transfer enhancement network to obtain an output image. The style transfer enhancement network comprises ι convolutional layers, where ι is a positive integer.

Optionally, if the reference image has already been semantically segmented, the device inputs the k semantic regions of the reference image and the n semantic regions of the target image into the style transfer enhancement network to obtain an output image.

Optionally, if the reference image has not been semantically segmented, the device first invokes the semantic segmentation neural network to segment it into k semantic regions. The device then inputs the k semantic regions of the reference image and the n semantic regions of the target image into the style transfer enhancement network to obtain an output image. Referring to FIG. 4, this segmentation process is shown as S3. In general, k ≥ n.

The device then calls the style transfer enhancement network. Guided by the k semantic regions of the reference image, the network enhances each same or similar semantic region of the target image to the degree given by that region's influence factor. The style transfer enhancement network may include ι convolutional layers. In the illustrative example shown in FIG. 5, the style transfer enhancement network 50 includes 5 convolutional layers.

During enhancement by the style transfer enhancement network, the loss measured between the output image and the reference image comprises a style loss and a content loss. Optionally, the present application also introduces a Laplacian loss term, which characterizes the degree of distortion of the output image.

In this embodiment, the total loss between the output image and the reference image is illustrated as comprising a style loss L_s, a content loss L_c, and a Laplacian loss term L_m.

Step 305, calculating the adaptive influence factor style loss L_s between the output image and the reference image;

The adaptive influence factor style loss is the sum of the style losses of the n semantic regions in the output image after style transfer, each weighted by its influence factor. Optionally, the influence factors start at default values and are updated adaptively before each iteration, with no manual adjustment by the user.

Optionally, after obtaining the output image in an iteration, the device calculates the adaptive influence factor style loss L_s between the output image and the reference image according to the following formula:

The symbols are defined as follows:

- O is the output image.
- R is the reference image.
- r is the label of a semantic region.
- Λ is the vector composed of the respective influence factors; each factor is the influence factor of the r-th semantic region on the ι-th convolutional layer.
- The Gram matrix of the r-th semantic region on the ι-th convolutional layer enters the formula.
- N^ι is the number of feature maps on the ι-th convolutional layer.
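The formula for L_s did not survive extraction. The sketch below is hedged and reconstructs it only from the symbol list above. It assumes the region-wise Gram-matrix style loss of Deep Photo Style Transfer, with each region's term scaled by its influence factor. For every layer ι and region r, it takes the squared Frobenius difference between the Gram matrices of the output and reference features inside region r, then multiplies by the influence factor and by 1/(2(N^ι)²).

The normalization, the NumPy array layouts (features as `(N, H, W)`, masks already resized to each layer's resolution), and the function names are assumptions, not the patent's definitions.

```python
import numpy as np


def region_gram(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Gram matrix of an (N, H, W) feature map restricted to a boolean (H, W) mask."""
    f = (features * mask).reshape(features.shape[0], -1)  # zero outside the region
    return f @ f.T                                          # (N, N)


def adaptive_style_loss(feats_o, feats_r, masks_o, masks_r, lam):
    """Influence-factor-weighted, region-wise style loss (assumed form).

    feats_o / feats_r: per-layer lists of (N, H, W) features of output / reference.
    masks_o / masks_r: per-layer dicts {region label r: (H, W) bool mask},
                       already resized to that layer's resolution.
    lam: (num_layers, num_classes) array of influence factors.
    """
    total = 0.0
    for l, (fo, fr) in enumerate(zip(feats_o, feats_r)):
        n_maps = fo.shape[0]  # N^ι
        for r, mask_o in masks_o[l].items():
            diff = region_gram(fo, mask_o) - region_gram(fr, masks_r[l][r])
            total += lam[l, r] / (2.0 * n_maps ** 2) * np.sum(diff ** 2)
    return total
```

Because the Gram matrix is N × N regardless of spatial size, the reference image does not need to have the same resolution as the output image.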

Step 306, calculating the content loss L_c between the output image and the target image;

Optionally, the device calculates the content loss L_c between the output image and the target image as follows:

The symbols are defined as follows:

- O is the output image.
- L is the target image.
- α^ι is the content weight of the ι-th convolutional layer.
- r is the label of a semantic region.
- N^ι is the number of feature maps on the ι-th convolutional layer.
- D^ι is the dimension of a vectorized feature map.
- The feature term is the feature map corresponding to the r-th semantic region on the ι-th convolutional layer.

Optionally, to preserve the semantic structure without amplifying noise, the content weights α^ι of the 5 layers may be set to (0, 0, 0, 1, 0). The content loss L_c is then applied only to convolutional layer conv4_2.

This preserves the important semantic information encoded in higher layers such as conv4_2. Meanwhile, the high-frequency features caused by noise in the lower layers are ignored, so only the high-quality reference image affects those lower layers. As a result, the night scene of the target image is enhanced without the side effect of increased noise.
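The effect of the layer weights (0, 0, 0, 1, 0) can be shown with a minimal sketch of a layer-weighted, region-wise content loss. The exact formula for L_c was lost in extraction. The per-layer normalization 1/(2·N^ι·D^ι) is therefore an assumption, as is the premise that the fourth of the five layers is conv4_2. The function and constant names are hypothetical.

```python
import numpy as np

# Content weights α^ι for the 5 layers; only the 4th (assumed to be conv4_2) is on.
CONTENT_WEIGHTS = np.array([0.0, 0.0, 0.0, 1.0, 0.0])


def content_loss(feats_o, feats_t, masks, alpha=CONTENT_WEIGHTS):
    """Region-wise content loss between output and target features (assumed form).

    feats_o / feats_t: per-layer lists of (N, H, W) features of output / target.
    masks: per-layer dicts {region label r: (H, W) bool mask}.
    """
    total = 0.0
    for l, (fo, ft) in enumerate(zip(feats_o, feats_t)):
        if alpha[l] == 0.0:
            continue  # layer switched off: its (noisy) features never enter the loss
        n_maps = fo.shape[0]              # N^ι
        dim = fo.shape[1] * fo.shape[2]   # D^ι: length of a vectorized feature map
        for mask in masks[l].values():
            diff = (fo - ft) * mask       # restrict the difference to region r
            total += alpha[l] / (2.0 * n_maps * dim) * np.sum(diff ** 2)
    return total
```

With these weights, arbitrary differences in layers 1 to 3 and 5 leave L_c unchanged. That is the intended effect: the target image's noisy low-level features place no constraint on the output.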

Step 307, calculating the Laplacian loss term L_m of the output image;

Optionally, to achieve a more photorealistic style transfer, the device calculates the Laplacian loss term L_m of the output image according to the following formula. L_m preserves details of the target image.

Here O is the output image and V_c[·] is the vectorized c-th channel of the output image. The loss is defined on the matting Laplacian matrix.
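The formula for L_m was lost in extraction. Deep Photo Style Transfer defines its photorealism term as the quadratic form Σ_c V_c[O]ᵀ M V_c[O], where M is a matting Laplacian. Assuming L_m has that shape, the sketch below shows the structure.

Building the real closed-form matting Laplacian of Levin et al. takes considerably more code. The sketch therefore substitutes a plain 4-neighbour grid Laplacian as a stand-in for M, purely for illustration. With this stand-in, the quadratic form reduces to the sum of squared differences between neighbouring pixels, which penalizes local distortion. The function names are hypothetical.

```python
import numpy as np


def grid_laplacian(h: int, w: int) -> np.ndarray:
    """Dense 4-neighbour graph Laplacian of an h x w pixel grid.
    Stand-in only; the method uses a matting Laplacian instead."""
    n = h * w
    a = np.zeros((n, n))
    idx = np.arange(n).reshape(h, w)
    for p, q in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        a[p.ravel(), q.ravel()] = 1.0  # horizontal / vertical neighbours
        a[q.ravel(), p.ravel()] = 1.0
    return np.diag(a.sum(axis=1)) - a


def laplacian_loss(output: np.ndarray, m: np.ndarray) -> float:
    """Sum over channels c of V_c[O]^T M V_c[O] for an (H, W, C) image."""
    loss = 0.0
    for c in range(output.shape[2]):
        v = output[:, :, c].reshape(-1)  # V_c[O]: vectorized channel c
        loss += float(v @ (m @ v))
    return loss
```

Its gradient with respect to each channel is 2·M·V_c[O], which is what back-propagation into the output image would use.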

Step 308, inputting the features of the output image and the reference image into the fully connected layer to obtain an ι × m parameter matrix;

Optionally, the embodiments of the present application further provide a fully connected layer, shown as fully connected layer 52 in FIG. 5. It adaptively generates the enhancement factor of each semantic region class on each convolutional layer.

That is, the influence factors in the embodiments of the present application are updated adaptively. After obtaining the output image in an iteration, the device also inputs the features of the output image and the reference image into the fully connected layer. The layer outputs an ι × m parameter matrix whose entries correspond one-to-one to the m semantic regions on the ι convolutional layers.

Step 309, determining the enhancement factor of the m-th semantic region on the i-th convolutional layer from the ι × m parameter matrix;

The device determines the enhancement factor of the m-th semantic region on the i-th convolutional layer from the ι × m parameter matrix. Illustratively,

$\Lambda = \mathrm{softmax}\left(W\,H[O,R]\right)$;

where H[O, R] is the concatenation (concat) of the feature vectors of the output image and the reference image, as aggregated by the last convolutional layer, and W is the optimized parameter matrix of the fully connected layer.
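The formula Λ = softmax(W H[O, R]) can be made concrete with a small sketch. It assumes that one softmax is taken over all ι·m outputs of the fully connected layer, which is the literal reading of the formula. It also assumes the flat output is reshaped row-major into an ι × m matrix, so that layer i and class j sit at flat index m·i + j. This matches the "Ci+j" row indexing used for the weight gradient. Feature sizes and function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def influence_factors(feat_o, feat_r, w, n_layers, n_classes):
    """Λ = softmax(W H[O, R]) reshaped to (n_layers, n_classes).

    feat_o, feat_r: pooled last-conv-layer feature vectors of output / reference.
    w: (n_layers * n_classes, len(feat_o) + len(feat_r)) fully connected weights.
    """
    h = np.concatenate([feat_o, feat_r])  # H[O, R]
    lam = softmax(w @ h)                   # one softmax over all ι·m entries
    return lam.reshape(n_layers, n_classes)  # entry [i, j] = flat index i*C + j
```

A per-layer softmax, normalizing each row separately, would be an equally plausible reading. The extracted text does not settle which one is used.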

Illustratively, the influence factors are updated adaptively through steps 308 and 309 in each iteration.

Step 310, iteratively enhancing the output image through the ι convolutional layers according to the total loss. The total loss comprises the adaptive influence factor style loss L_s, the content loss L_c, and the Laplacian loss term L_m;

Optionally, the device calculates the total loss between the output image and the reference image as follows:

$L(O,\Lambda) = L_c(O) + \beta\,L_s(O,\Lambda) + \gamma\,L_m(O)$

where β is the weight of the adaptive influence factor style loss (total style loss) and γ is the weight of the Laplacian loss term.

Optionally, the device optimizes the total loss with an alternating update scheme. The output image is optimized by back-propagating the error of the total loss, alternating with updates of the enhancement factors. For the enhancement factor update, the weight gradient of the fully connected layer is as follows:

The symbols are defined as follows:

- L_s is the adaptive influence factor style loss (total style loss).
- C is the number of semantic region categories.
- i denotes the i-th of the convolutional layers.
- j denotes the j-th of the semantic region categories.
- k denotes the k-th element of the feature vector input to the fully connected layer.

In the parameter matrix of the fully connected layer, Ci+j indexes a row and k indexes a column.
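The gradient formula itself was lost in extraction. Under two assumptions it can be derived with the chain rule:

- L_s is linear in the influence factors, L_s = Σ_q λ_q s_q, where s_q is the unweighted style loss of layer i, class j at flat index q = C·i + j.
- Λ = softmax(W H[O, R]).

Under these assumptions, the softmax Jacobian gives ∂L_s/∂W_{q,k} = λ_q (s_q − Σ_p λ_p s_p) H_k. The sketch below implements that expression. It is a derivation under the stated assumptions, not the patent's formula, and the names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def grad_fc_weights(w: np.ndarray, h: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Gradient of L_s = sum_q softmax(W h)_q * s_q with respect to W.

    w: (C_layers * C, d) fully connected weights; h: (d,) concatenated features H[O, R];
    s: (C_layers * C,) per-(layer, class) style losses at flat index q = C*i + j.
    """
    lam = softmax(w @ h)
    dz = lam * (s - lam @ s)  # softmax Jacobian applied to dL_s/dΛ = s
    return np.outer(dz, h)    # row q = C*i + j, column k
```

The rows of the result sum to zero across q for each k, as expected: shifting every logit by the same amount leaves a softmax unchanged.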

Step 311, when the total loss converges, determining the output image as the enhanced target image.
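The loop formed by steps 310 and 311 can be sketched as a generic alternating optimizer. It takes one gradient step on the image with the fully connected weights fixed, then one step on the weights with the image fixed. It stops when the total loss changes by less than a tolerance.

The learning rates, tolerance, and iteration cap are assumptions. The toy two-variable quadratic in the demo stands in for the real losses and is not the patent's objective.

```python
def alternating_optimize(o, w, loss, grad_o, grad_w,
                         lr_o=0.1, lr_w=0.1, tol=1e-8, max_iter=1000):
    """Alternate gradient steps on the image O (W fixed) and on the fully
    connected weights W (O fixed) until the total loss stops changing."""
    prev = loss(o, w)
    cur = prev
    for _ in range(max_iter):
        o = o - lr_o * grad_o(o, w)  # image update, W frozen
        w = w - lr_w * grad_w(o, w)  # influence-factor weights update, O frozen
        cur = loss(o, w)
        if abs(prev - cur) < tol:    # convergence test of step 311
            break
        prev = cur
    return o, w, cur


if __name__ == "__main__":
    # Toy coupled loss (o - w)^2 + (w - 2)^2, minimised at o = w = 2.
    toy_loss = lambda o, w: (o - w) ** 2 + (w - 2.0) ** 2
    toy_grad_o = lambda o, w: 2.0 * (o - w)
    toy_grad_w = lambda o, w: -2.0 * (o - w) + 2.0 * (w - 2.0)
    print(alternating_optimize(0.0, 0.0, toy_loss, toy_grad_o, toy_grad_w))
```

In the method itself, `o` would be the output image tensor and `w` the fully connected weights. `grad_o` would be the back-propagated gradient of the total loss, and `grad_w` the weight gradient of the fully connected layer.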

In summary, the method provided in this embodiment performs semantic segmentation on the target image to obtain n semantic regions. It then performs style transfer on these regions according to the reference image and the influence factor corresponding to each region, producing an enhanced target image. Each semantic region can thus be style-transferred to a different degree, which solves the problem that global style transfer cannot adapt to different image content and therefore enhances poorly. Because each semantic region is enhanced to an appropriate degree, the target image as a whole obtains a better enhancement effect.

The method provided in this embodiment also calculates the style loss of each semantic region separately and sums them to obtain the total style loss. This measures the style loss accurately and thus achieves a better style transfer effect.

By introducing the Laplacian loss term, the method provided in this embodiment preserves the image details of the target image during style transfer and achieves a relatively photorealistic result.

In the method provided in this embodiment, the image features of the output image and the reference image are input into the fully connected layer, which generates the influence factors adaptively. This avoids the inefficiency of the user tuning the influence factors manually and makes the image enhancement fully automatic.

The method provided in this embodiment also uses the retrieval neural network to pick, fully automatically, the reference image in the library that best matches the target image in semantic and/or style similarity. This further supports fully automatic image enhancement.

In an optional embodiment based on FIG. 3, the retrieval neural network is trained in advance as shown in FIG. 6:

Step 601, obtaining multiple groups of training samples, each comprising a sample target image, positively correlated reference images, and negatively correlated reference images;

Optionally, the training samples are obtained as follows:

1. For any sample target image, calculating semantic measurement between the sample target image and each reference image in the reference image library;

2. Select the top i candidate reference images, ordered from most to least semantically similar;

Optionally, for two images I₁ and I₂, the semantic metric D_sem is defined as the Euclidean distance between their features at the FC-8 layer of a pre-trained VGG-16 classification network:

$D_{sem} = \lVert f_{fc8}(I_1) - f_{fc8}(I_2) \rVert_2$;

where f_fc8(·) denotes the FC-8 layer features of an image. Illustratively, 60 candidate reference images are selected for each sample target image.

3. Among the top i candidate reference images, label j positively correlated reference images and k negatively correlated reference images for the sample target image, forming a group of training samples.

Take 60 candidate reference images as an example. The top 60 candidates are labeled manually (or by machine) as 10 positively correlated and 50 negatively correlated reference images for the sample target image, forming one group of training samples.
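The candidate pre-selection in steps 1 and 2 above can be sketched as follows. It assumes the FC-8 feature vectors (for VGG-16, the 1000-dimensional final classifier output) have already been extracted for the sample target image and for every library image. It computes D_sem and keeps the i library images with the smallest distance, that is, the most semantically similar ones. Function names are hypothetical.

```python
import numpy as np


def semantic_distance(f1: np.ndarray, f2: np.ndarray) -> float:
    """D_sem = ||f_fc8(I1) - f_fc8(I2)||_2 on precomputed FC-8 feature vectors."""
    return float(np.linalg.norm(f1 - f2))


def top_candidates(target_feat: np.ndarray, library_feats: np.ndarray, i: int = 60) -> np.ndarray:
    """Indices of the i library images with the smallest D_sem, most similar first."""
    d = np.linalg.norm(library_feats - target_feat, axis=1)
    return np.argsort(d, kind="stable")[:i]
```

The returned candidates would then be labeled as positively or negatively correlated (for example 10 and 50) to form one training group.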

Step 602: input the training samples into the retrieval neural network and perform error back-propagation training according to the loss function.

The loss function is the following function L_ref:

where i is the number of candidate reference images of the sample target image, j is the number of positively correlated reference images of the sample target image, k is the number of negatively correlated reference images of the sample target image, p denotes a positively correlated reference image, n denotes a negatively correlated reference image, and α denotes the minimum distance (margin) between the positively correlated and negatively correlated reference images.

In summary, the method provided by this embodiment uses a triplet loss function to train a retrieval neural network for matching a target image with reference images, so that the input target image is as close as possible to the positively correlated reference images and as far as possible from the negatively correlated reference images.
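The exact expression for L_ref is not reproduced in this text. As an assumption, the sketch below uses the generic triplet margin (hinge) loss with the variable roles defined above: every pair of a positive p and a negative n contributes max(0, d(anchor, p) − d(anchor, n) + α), averaged over the j·k pairs. The Euclidean distance and the averaging are assumptions, and the embeddings stand in for the retrieval network's outputs; back-propagation itself would be handled by a deep-learning framework and is not shown.

```python
import math
from typing import List, Sequence


def l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float],
                 positives: List[Sequence[float]],
                 negatives: List[Sequence[float]],
                 alpha: float) -> float:
    """Mean hinge loss over all j*k (positive, negative) pairs for one target.

    A pair contributes zero once the negative is at least alpha farther
    from the anchor than the positive.
    """
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    total = 0.0
    for p in positives:
        d_ap = l2(anchor, p)
        for n in negatives:
            total += max(0.0, d_ap - l2(anchor, n) + alpha)
    return total / (len(positives) * len(negatives))


anchor = [0.0, 0.0]
print(triplet_loss(anchor, [[1.0, 0.0]], [[3.0, 0.0]], alpha=1.0))  # 0.0
print(triplet_loss(anchor, [[1.0, 0.0]], [[1.5, 0.0]], alpha=1.0))  # 0.5
```

In the first call the negative is already more than α farther away than the positive, so the pair is ignored; in the second it is only 0.5 farther, so the loss pushes the two apart.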

Fig. 7 shows a block diagram of an image enhancement apparatus according to an exemplary embodiment of the present application. The image enhancement apparatus may be implemented as all or part of an image processing device through software, hardware, or a combination of both. The apparatus includes an image acquisition module 720, a semantic segmentation module 740, and a style transfer module 760.

The image acquisition module 720 is configured to acquire a target image to be processed;

The semantic segmentation module 740 is configured to perform semantic segmentation on the target image to obtain n semantic regions of the target image, where n is an integer greater than 1;

The style transfer module 760 is configured to perform style transfer on the n semantic regions of the target image according to the reference image and the respective corresponding influence factors to obtain an enhanced target image;

In an alternative embodiment, the style transfer module 760 is configured to:

inputting the reference image and the target image into a style migration enhancement network to obtain an output image, where the style migration enhancement network comprises ι convolutional layers and ι is a positive integer;

calculating an adaptive influence factor style loss L_s between the output image and the reference image, where the adaptive influence factor style loss is the sum of the style losses obtained after style transfer is performed on the n semantic regions of the output image according to their respective influence factors;

calculating a content loss L_c between the output image and the target image;

iteratively enhancing the output image in the ι convolutional layers according to a total loss, the total loss comprising the adaptive influence factor style loss L_s and the content loss L_c;

Determining the output image as the enhanced target image when the total loss converges.
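The iterate-until-convergence procedure above can be illustrated with a toy sketch. It is not the style migration enhancement network: it assumes the image is a flat list of pixel values, uses each region's mean as a stand-in for the feature statistics a real network would compare, and runs plain gradient descent on total = content loss + Σ_r factor_r · style_r. The function name, learning rate and tolerance are hypothetical. "Converged" is taken to mean that the total loss stops changing by more than a small tolerance.

```python
from typing import Dict, List


def enhance(target: List[float],
            regions: Dict[str, List[int]],
            ref_means: Dict[str, float],
            factors: Dict[str, float],
            lr: float = 0.1,
            tol: float = 1e-10,
            max_iter: int = 10000) -> List[float]:
    """Gradient descent on total = content + sum_r factor_r * style_r."""
    out = list(target)  # start from the target image
    num_px = len(out)

    def region_mean(img: List[float], idx: List[int]) -> float:
        return sum(img[k] for k in idx) / len(idx)

    def total_loss(img: List[float]) -> float:
        content = sum((o - t) ** 2 for o, t in zip(img, target)) / num_px
        style = sum(factors[r] * (region_mean(img, idx) - ref_means[r]) ** 2
                    for r, idx in regions.items())
        return content + style

    prev = total_loss(out)
    for _ in range(max_iter):
        # Gradient of the content term.
        grad = [2.0 * (o - t) / num_px for o, t in zip(out, target)]
        # Gradient of each region's style term, scaled by its influence factor.
        for r, idx in regions.items():
            g = factors[r] * 2.0 * (region_mean(out, idx) - ref_means[r]) / len(idx)
            for k in idx:
                grad[k] += g
        out = [o - lr * gk for o, gk in zip(out, grad)]
        cur = total_loss(out)
        if abs(prev - cur) < tol:  # total loss has stopped changing
            break
        prev = cur
    return out


result = enhance(target=[0.0, 0.0, 0.0, 0.0],
                 regions={"sky": [0, 1], "ground": [2, 3]},
                 ref_means={"sky": 1.0, "ground": 1.0},
                 factors={"sky": 1.0, "ground": 0.1})
print([round(v, 3) for v in result])  # [0.667, 0.667, 0.167, 0.167]
```

For these toy losses each region settles at factor / (|r|/P + factor), where |r| is the region size and P the pixel count: 2/3 for the sky and 1/6 for the ground. The region with the larger influence factor therefore moves further toward the reference, which is the behavior the influence factors are meant to control.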

In an alternative embodiment, the style transfer module 760 is further configured to calculate the adaptive influence factor style loss L_s between the output image and the target image according to the following formula:

In an alternative embodiment, the style transfer module 760 is further configured to calculate the content loss L_c between the output image and the target image according to the following formula:

where O is the output image, L is the target image, α^ι is the content weight of the ι-th convolutional layer, r is the label of a semantic region, N^ι is the number of feature maps on the ι-th convolutional layer, D^ι is the dimension of each vectorized feature map, and the remaining symbol in the formula is the feature map corresponding to the r-th semantic region on the ι-th convolutional layer.
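The formulas for L_s and L_c are not reproduced in this text. The sketch below assumes the widely used Gram-matrix formulation (Gatys et al.), restricted to semantic regions with masks as in Luan et al.'s Deep Photo Style Transfer, and uses the symbols defined above. The per-layer content term is α^ι / (2 N^ι D^ι) times the squared feature difference; the per-region style term is the region's influence factor times the squared Gram difference, normalized by 2 (N^ι)². These normalization constants and all function names are assumptions. Features are lists of N vectorized feature maps of length D.

```python
from typing import Dict, List

Matrix = List[List[float]]  # N vectorized feature maps, each of length D


def gram(feats: Matrix) -> Matrix:
    """N x N Gram matrix: G[a][b] = <F_a, F_b>."""
    return [[sum(x * y for x, y in zip(fa, fb)) for fb in feats] for fa in feats]


def masked(feats: Matrix, mask: List[float]) -> Matrix:
    """Restrict every feature map to one semantic region (mask values in [0, 1])."""
    return [[v * m for v, m in zip(f, mask)] for f in feats]


def content_loss_layer(f_out: Matrix, f_tgt: Matrix, alpha: float) -> float:
    """alpha / (2 N D) * sum of squared feature differences on one layer."""
    n, d = len(f_out), len(f_out[0])
    sq = sum((a - b) ** 2 for ro, rt in zip(f_out, f_tgt) for a, b in zip(ro, rt))
    return alpha / (2.0 * n * d) * sq


def style_loss_layer(f_out: Matrix, f_ref: Matrix,
                     masks_out: Dict[str, List[float]],
                     masks_ref: Dict[str, List[float]],
                     factors: Dict[str, float]) -> float:
    """Sum over regions r of factor_r / (2 N^2) * ||G_r(output) - G_r(reference)||^2."""
    n = len(f_out)
    total = 0.0
    for r, lam in factors.items():
        g_o = gram(masked(f_out, masks_out[r]))
        g_s = gram(masked(f_ref, masks_ref[r]))
        sq = sum((a - b) ** 2 for ro, rs in zip(g_o, g_s) for a, b in zip(ro, rs))
        total += lam / (2.0 * n * n) * sq
    return total


print(content_loss_layer([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 5.0]], alpha=1.0))  # 0.125
print(style_loss_layer([[1.0, 0.0]], [[2.0, 0.0]],
                       {"r": [1.0, 1.0]}, {"r": [1.0, 1.0]}, {"r": 1.0}))  # 4.5
```

Because the Gram matrix is N × N regardless of spatial size, the reference image and its region masks may have a different resolution from the output image. The full losses would sum these per-layer values over the ι layers.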

In an alternative embodiment, the total loss further comprises a Laplacian matting loss term for ensuring the photorealism of the output image; the style transfer module 760 is further configured to calculate the Laplacian loss term of the output image according to the following formula:

where O is the output image, V_c[·] is the vectorized channel c of the output image, and M_L is a linear system defined on the Laplacian matting matrix.
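The formula for this term is not reproduced in this text. Assuming it takes the quadratic-form shape of the photorealism regularizer in Luan et al.'s Deep Photo Style Transfer, the loss is Σ_c V_c[O]ᵀ M_L V_c[O] over the color channels. Building the real matting Laplacian (Levin et al.) is not shown; a 1-D chain graph Laplacian is used here as a hypothetical stand-in so that the quadratic-form mechanics can be run and checked.

```python
from typing import List

Matrix = List[List[float]]


def quad_form(v: List[float], m: Matrix) -> float:
    """v^T M v for a square matrix M."""
    size = len(v)
    return sum(v[a] * sum(m[a][b] * v[b] for b in range(size)) for a in range(size))


def laplacian_loss(channels: List[List[float]], m: Matrix) -> float:
    """Sum over color channels c of V_c[O]^T M V_c[O]."""
    return sum(quad_form(v, m) for v in channels)


def chain_laplacian(num_px: int) -> Matrix:
    """Hypothetical stand-in for M_L: graph Laplacian of a 1-D chain of pixels."""
    m = [[0.0] * num_px for _ in range(num_px)]
    for a in range(num_px - 1):
        m[a][a] += 1.0
        m[a + 1][a + 1] += 1.0
        m[a][a + 1] -= 1.0
        m[a + 1][a] -= 1.0
    return m


M = chain_laplacian(2)
print(laplacian_loss([[0.5, 0.5]] * 3, M))  # 0.0 (constant channels)
print(laplacian_loss([[0.0, 1.0]] * 3, M))  # 3.0 (one unit jump per channel)
```

With the true matting Laplacian the penalty is small when the output is locally an affine function of the input colors; the chain Laplacian above only penalizes differences between neighboring pixels.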

In an optional embodiment, the style transfer module 760 is further configured to input features of the output image and the reference image into a fully connected layer to obtain an ι × m parameter matrix, where the fully connected layer adaptively generates enhancement factors for each semantic region class on different convolutional layers, and to determine, from the ι × m parameter matrix, the enhancement factor of the m-th semantic region class on the ι-th convolutional layer.
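A minimal sketch of the fully connected step, assuming the features of the output image and the reference image are simply concatenated into one vector, that the layer is a plain linear map y = Wx + b with ι·m outputs and no activation, and that the output is reshaped row-major so that row ι, column m holds the factor for region class m on layer ι. The weights below are hypothetical; in practice they would be learned.

```python
from typing import List


def fc_enhancement_factors(features: List[float],
                           weights: List[List[float]],
                           bias: List[float],
                           num_layers: int,
                           num_classes: int) -> List[List[float]]:
    """Linear layer y = W x + b, reshaped row-major into num_layers x num_classes."""
    if len(weights) != num_layers * num_classes or len(bias) != len(weights):
        raise ValueError("weights and bias need num_layers * num_classes rows")
    flat = [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]
    return [flat[layer * num_classes:(layer + 1) * num_classes]
            for layer in range(num_layers)]


# Concatenated toy features of the output image and the reference image.
feat_out, feat_ref = [1.0], [2.0]
features = feat_out + feat_ref
W = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [1, -1]]  # hypothetical weights
factors = fc_enhancement_factors(features, W, [0.0] * 6, num_layers=2, num_classes=3)
print(factors)        # [[1.0, 2.0, 3.0], [2.0, 4.0, -1.0]]
print(factors[1][1])  # factor of region class 1 on layer 1: 4.0
```

Whether the factors should be constrained (for example kept non-negative) is not stated in the text; an activation could be added after the linear map if needed.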

In an optional embodiment, the semantic segmentation module 740 is further configured to invoke a semantic segmentation neural network to perform semantic segmentation on the reference image, obtaining k semantic regions on the reference image;

The style transfer module 760 is further configured to input the k semantic regions on the reference image and the n semantic regions on the target image into the style migration enhancement network to obtain the output image.

In an optional embodiment, the apparatus further comprises a retrieval module 780.

The retrieval module 780 is configured to invoke a retrieval neural network to determine, from a plurality of reference images, a reference image matching the target image, where the retrieval neural network ranks the reference images by semantic similarity and/or style similarity.
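The ranking step can be sketched as follows, assuming the retrieval network has already produced one semantic embedding and one style embedding per image, and that "semantic similarity and/or style similarity" is realized as a weighted sum of cosine similarities. The weight `w_sem` and the function names are hypothetical: `w_sem=1.0` ranks by semantics only, `w_sem=0.0` by style only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_references(target_sem: Sequence[float],
                    target_sty: Sequence[float],
                    refs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                    w_sem: float = 0.5) -> List[str]:
    """Return reference names best-first.

    refs maps name -> (semantic_embedding, style_embedding).
    """
    def score(name: str) -> float:
        sem, sty = refs[name]
        return w_sem * cosine(target_sem, sem) + (1.0 - w_sem) * cosine(target_sty, sty)

    # sorted() is stable, so tied references keep their library order.
    return sorted(refs, key=score, reverse=True)


refs = {
    "a": ([1.0, 0.0], [1.0, 0.0]),  # same semantics, different style
    "b": ([1.0, 0.0], [0.0, 1.0]),  # same semantics, same style
    "c": ([0.0, 1.0], [0.0, 1.0]),  # different semantics, same style
}
print(rank_references([1.0, 0.0], [0.0, 1.0], refs))  # ['b', 'a', 'c']
```

The first name returned would be the reference image passed on to the style transfer module.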

In an optional embodiment, the apparatus further comprises:

The image acquisition module 720 is configured to obtain multiple groups of training samples, each comprising a sample target image, a positively correlated reference image, and a negatively correlated reference image;

The training module 790 is configured to input the multiple groups of training samples into the retrieval neural network and perform error back-propagation training according to a loss function.

In an alternative embodiment, the image acquisition module 720 is configured to:

for any sample target image, calculate the semantic metric between the sample target image and each reference image in the reference image library;

select the first i candidate reference images in descending order of the semantic metric;

label, among the first i candidate reference images, j positively correlated reference images and k negatively correlated reference images corresponding to the sample target image, forming one group of training samples.

In an alternative embodiment, the loss function is the function L_ref:

where i is the number of candidate reference images of the sample target image, j is the number of positively correlated reference images of the sample target image, k is the number of negatively correlated reference images of the sample target image, p denotes a positively correlated reference image, n denotes a negatively correlated reference image, and α denotes the minimum distance (margin) between the positively correlated and negatively correlated reference images.

In an optional embodiment, the semantic segmentation module 740 is configured to invoke a semantic segmentation neural network to process the target image, so as to obtain n semantic regions of the target image;

The semantic segmentation neural network is trained on sample images annotated with semantic region labels.
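One common way to turn a segmentation network's output into the n semantic regions is sketched below. It assumes the network emits a score per class for every pixel (the network itself is not shown), takes the per-pixel argmax as the label, and groups pixel indices by label so that each label present becomes one semantic region. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def labels_from_scores(scores: List[Sequence[float]]) -> List[int]:
    """Per-pixel argmax over class scores (ties go to the lowest class index)."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


def regions_from_labels(labels: List[int]) -> Dict[int, List[int]]:
    """Group pixel indices by class label; each label present is one semantic region."""
    regions: Dict[int, List[int]] = {}
    for idx, c in enumerate(labels):
        regions.setdefault(c, []).append(idx)
    return regions


# Three pixels, two classes (toy scores a segmentation network might output).
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
regions = regions_from_labels(labels_from_scores(scores))
print(regions)  # {1: [0, 2], 0: [1]}
n = len(regions)  # the method expects n > 1
```

The same two functions could be applied to the reference image to obtain its k semantic regions.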

It should be noted that the image enhancement apparatus provided in the above embodiment is described, when enhancing an image (for example, a night-scene image), only in terms of the division into the functional modules above. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the image enhancement apparatus embodiments and the image enhancement method embodiments provided above belong to the same concept; for their specific implementation, refer to the method embodiments, which are not repeated here.

Referring to fig. 8, a block diagram of an electronic device 800 according to an exemplary embodiment of the invention is shown. The electronic device 800 may be a smartphone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The electronic device 800 may also be referred to by other names, such as user equipment, portable electronic device, laptop electronic device, or desktop electronic device.

In general, the electronic device 800 includes: a processor 801 and a memory 802.

The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 802 stores at least one instruction to be executed by the processor 801 to implement the image enhancement method provided by the method embodiments of the present application.

In some embodiments, the electronic device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a touch screen display 805, a camera 806, an audio circuit 807, a positioning component 808, and a power supply 809.

The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.

The radio frequency circuit 804 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices through electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other electronic devices through at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.

The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 805 is a touch display screen, it can also capture touch signals on or above its surface, and the touch signals may be input to the processor 801 as control signals for processing. In this case, the display screen 805 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 805, disposed on the front panel of the electronic device 800; in other embodiments, there are at least two display screens 805, disposed on different surfaces of the electronic device 800 or in a folding design; in still other embodiments, the display screen 805 may be a flexible display disposed on a curved or folded surface of the electronic device 800. The display screen 805 may even have a non-rectangular irregular shape, that is, a shaped screen. The display screen 805 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to achieve background blurring, and the main camera and the wide-angle camera can be fused to achieve panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuit 807 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs the signals to the processor 801 for processing or to the radio frequency circuit 804 for voice communication. For stereo sound collection or noise reduction, multiple microphones may be provided at different parts of the electronic device 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker converts electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 807 may also include a headphone jack.

The positioning component 808 is used to locate the current geographic location of the electronic device 800 to implement navigation or LBS (Location Based Service). The positioning component 808 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

The power supply 809 is used to power the components of the electronic device 800. The power supply 809 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charging technology.

In some embodiments, the electronic device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.

The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the electronic device 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the touch screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 812 may detect a body direction and a rotation angle of the electronic device 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user on the electronic device 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

The pressure sensor 813 may be disposed on a side frame of the electronic device 800 and/or beneath the touch display screen 805. When the pressure sensor 813 is disposed on the side frame, it can detect how the user is holding the electronic device 800, and the processor 801 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed beneath the touch display screen 805, the processor 801 controls the operable controls on the UI according to the user's pressure operations on the touch display screen 805. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 814 is used to collect the user's fingerprint, and either the processor 801 or the fingerprint sensor 814 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 814 may be disposed on the front, back, or side of the electronic device 800. When a physical button or vendor logo is provided on the electronic device 800, the fingerprint sensor 814 may be integrated with the physical button or vendor logo.

The optical sensor 815 is used to collect ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch display screen 805 according to the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815.

The proximity sensor 816, also known as a distance sensor, is typically disposed on the front panel of the electronic device 800 and measures the distance between the user and the front of the electronic device 800. In one embodiment, when the proximity sensor 816 detects that this distance is gradually decreasing, the processor 801 controls the touch display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance is gradually increasing, the processor 801 controls the touch display screen 805 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in fig. 8 does not constitute a limitation of electronic device 800, and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.

The present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the image enhancement method provided by the above-mentioned method embodiment.

The present application further provides a computer program product, which, when run on an electronic device, causes the electronic device to perform the image enhancement method described in the above-mentioned method embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above descriptions are merely exemplary embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

1. A method of image enhancement, the method comprising:

acquiring a target image to be processed;

inputting a reference image and the target image into a style migration enhancement network to obtain an output image, wherein the style migration enhancement network comprises ι convolutional layers, and ι is a positive integer;

calculating a content loss L_c between the output image and the target image;

Determining the output image as an enhanced target image when the total loss converges;

2. The method of claim 1, wherein calculating the adaptive influence factor style loss L_s between the output image and the reference image comprises:

calculating the adaptive influence factor style loss L_s between the output image and the target image according to the following formula:

3. The method of claim 1, wherein calculating the content loss L_c between the output image and the target image comprises:

calculating the content loss L_c between the output image and the target image according to the following formula:

wherein O is the output image, L is the target image, α^ι is the content weight of the ι-th convolutional layer, r is the label of a semantic region, N^ι is the number of feature maps on the ι-th convolutional layer, D^ι is the dimension of each vectorized feature map, and the remaining symbol in the formula is the feature map corresponding to the r-th semantic region on the ι-th convolutional layer.

4. The method of claim 1, wherein the total loss further comprises a Laplacian matting loss term for ensuring the photorealism of the output image; the method further comprises:

calculating a Laplacian loss term of the output image according to the following formula:

where O is the output image, V_c[·] is the vectorized channel c of the output image, and M_L is a linear system defined on the Laplacian matting matrix.

5. The method of claim 2, further comprising:

inputting features of the output image and the reference image into a fully connected layer to obtain an ι × m parameter matrix, wherein the fully connected layer is used to adaptively generate enhancement factors for each semantic region class on different convolutional layers;

determining, according to the ι × m parameter matrix, the enhancement factor of the m-th semantic region class on the ι-th convolutional layer.

6. The method of claim 1, wherein inputting the reference image and the target image into the style migration enhancement network to obtain the output image comprises:

Calling a semantic segmentation neural network to perform semantic segmentation on the reference image to obtain k semantic regions on the reference image;

inputting the k semantic regions on the reference image and the n semantic regions on the target image into the style migration enhancement network to obtain the output image.

7. The method according to any one of claims 1 to 6, wherein before the step of inputting the reference image and the target image into the style migration enhancement network to obtain the output image, the method further comprises:

calling a retrieval neural network to determine, from a plurality of reference images, a reference image matching the target image, wherein the retrieval neural network ranks the reference images by semantic similarity and/or style similarity.

8. The method of claim 7, further comprising, before calling the retrieval neural network to determine the reference image matching the target image from the plurality of reference images:

obtaining a plurality of sets of training samples, the training samples comprising: a sample target image, a positively correlated reference image and a negatively correlated reference image;

inputting the multiple groups of training samples into the retrieval neural network, and performing error back-propagation training according to a loss function.

9. The method of claim 8, wherein obtaining the plurality of sets of training samples comprises:

10. The method according to claim 8 or 9, wherein the loss function is the following function L_ref:

11. The method according to any one of claims 1 to 6, wherein the semantic segmentation of the target image to obtain n semantic regions of the target image comprises:

calling a semantic segmentation neural network to process the target image to obtain n semantic regions of the target image;

12. An image enhancement apparatus, characterized in that the apparatus comprises:

The style transfer module is configured to: input a reference image and the target image into a style migration enhancement network to obtain an output image, wherein the style migration enhancement network comprises ι convolutional layers, and ι is a positive integer; calculate an adaptive influence factor style loss L_s between the output image and the reference image, the adaptive influence factor style loss being the sum of the style losses obtained after style transfer is performed on n semantic regions in the output image according to their respective influence factors; calculate a content loss L_c between the output image and the target image; iteratively enhance the output image in the ι convolutional layers according to a total loss, the total loss comprising the adaptive influence factor style loss L_s and the content loss L_c; and determine the output image as an enhanced target image when the total loss converges;

13. An electronic device, comprising a memory and a processor;

The memory has stored therein at least one instruction that is loaded and executed by the processor to implement the image enhancement method of any of claims 1 to 11.

14. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor, to implement the image enhancement method of any one of claims 1 to 11.