Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of this specification. Therefore, this specification is not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. As used in one or more embodiments of this specification and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms "first," "second," etc. may be used in one or more embodiments of this specification to describe various kinds of information, the information should not be limited by these terms. These terms are used only to distinguish one piece of information from another. For example, without departing from the scope of one or more embodiments of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the term "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
First, terms related to one or more embodiments of the present specification will be explained.
Image synthesis: editing, by means of a neural network, the encoding vector corresponding to a human face.
Neural radiance field (NeRF): a distinctive scene representation built on volume rendering (a form of 3D rendering).
Semantic face segmentation: partitioning a face image into regions such as hair, skin, and background.
Multi-view consistency: the face keeps the same structure from any viewing angle, that is, identity is well preserved.
Identity preservation (ID-Preservation): the ability of a person's identity to remain unchanged from one viewing angle to another.
This specification provides an image processing method based on a neural radiance field. One or more embodiments of this specification also relate to a face generation method based on a neural radiance field, an image processing apparatus based on a neural radiance field, a face generation apparatus based on a neural radiance field, a computing device, a computer-readable storage medium, and a computer program, each of which is described in detail in the following embodiments.
Fig. 1 shows a flowchart of an image processing method based on a neural radiance field according to an embodiment of the present disclosure. The method includes steps 102 to 108.
Step 102, determining sampling position information, sampling viewing-angle information, and a target object image in response to an image processing instruction.
The image processing instruction is an instruction for generating a three-dimensional virtual object. The image processing method based on a neural radiance field provided in this specification adopts a neural radiance field model, which preserves the geometric properties of the 3D object as represented by the neural network and thereby ensures the quality and consistency of the edited and synthesized virtual object.
When the neural radiance field model is used, the input is simply a set of pictures taken from different viewpoints. The many components involved in a 3D-mesh-based rendering pipeline, such as camera parameters and camera position, ambient light and illumination, and textures, do not need to be supplied.
The sampling position information and sampling viewing-angle information are the input parameters of the neural radiance field model. A neural radiance field (Neural Radiance Fields, NeRF) is a model that renders and generates three-dimensional scenes. A conventional NeRF is an implicit three-dimensional scene representation: the three-dimensional model cannot be viewed directly, which makes NeRF somewhat inconvenient in practical applications. The image processing method based on a neural radiance field provided here uses the neural radiance field model to preserve the geometric properties of the 3D object as represented by the neural network, while editing the virtual object through explicit synthesis.
NeRF is an abbreviation of Neural Radiance Fields, where the radiance field refers to a function, or mapping, $g_\theta$. Specifically, NeRF can be expressed by Equation 1:
$$(\sigma, \mathbf{c}) = g_\theta(\mathbf{x}, \mathbf{d}) \qquad \text{(Equation 1)}$$
The inputs to the mapping are $\mathbf{x}$ and $\mathbf{d}$, where $\mathbf{x} \in \mathbb{R}^3$ is the coordinate of a three-dimensional spatial point, i.e., the sampling position information, and $\mathbf{d} \in \mathbb{S}^2$ is the viewing direction, i.e., the sampling viewing-angle information. The outputs of the mapping are $\sigma$ and $\mathbf{c}$, where $\sigma \in \mathbb{R}^+$ is the density information and $\mathbf{c} \in \mathbb{R}^3$ is the RGB color.
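Equation 1 only fixes the signature of $g_\theta$. As a hedged illustration, the sketch below builds a tiny, untrained, randomly initialised stand-in with NumPy. The layer sizes, activations, and the choice to let the density depend on position only (as in the original NeRF design) are assumptions; positional encoding is omitted.

```python
import numpy as np


def make_toy_field(hidden: int = 32, seed: int = 0):
    """Return a randomly initialised toy stand-in for g_theta (hypothetical, untrained)."""
    rng = np.random.default_rng(seed)
    w_x = rng.normal(scale=0.5, size=(3, hidden))          # position -> features
    w_sigma = rng.normal(scale=0.5, size=hidden)           # features -> density
    w_rgb = rng.normal(scale=0.5, size=(hidden + 3, 3))    # features + direction -> color

    def g_theta(x: np.ndarray, d: np.ndarray):
        d = d / np.linalg.norm(d, axis=-1, keepdims=True)  # project d onto the unit sphere S^2
        h = np.maximum(x @ w_x, 0.0)                       # ReLU features of the position only
        sigma = np.maximum(h @ w_sigma, 0.0)               # density sigma >= 0, view-independent
        logits = np.concatenate([h, d], axis=-1) @ w_rgb
        color = 1.0 / (1.0 + np.exp(-logits))              # RGB squashed into (0, 1)
        return sigma, color

    return g_theta


g = make_toy_field()
x = np.array([[0.1, 0.2, 0.3], [0.0, -0.5, 1.0]])
d = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
sigma, color = g(x, d)
print(sigma.shape, color.shape)  # (2,) (2, 3)
```

A trained model would replace the random weights; the point here is only that each query point yields one non-negative density and one RGB triple.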
Optionally, determining the sampling position information and the sampling viewing-angle information includes:
determining sampling position information on a preset hemispherical surface; and
determining corresponding sampling viewing-angle information based on the sampling position information.
Referring to Fig. 2, Fig. 2 is a schematic diagram of acquiring sampling position information and sampling viewing-angle information according to an embodiment of the present disclosure. The preset hemisphere is a virtual hemisphere that covers the target object image in NeRF. A sampling position, i.e., $\mathbf{x}$ in Equation 1, is determined at random on the virtual hemisphere, and a viewing direction, i.e., $\mathbf{d}$ in Equation 1, is then constructed based on that sampling position.
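A minimal sketch of this sampling step, assuming a hemisphere of radius `radius` centred at the origin with its pole on +z, and assuming the view direction points from the sampled position toward the centre (the text only says the direction is constructed from the position). Normalising a Gaussian vector gives a uniform point on the sphere, and reflecting its z component onto z ≥ 0 keeps it uniform on the hemisphere.

```python
import numpy as np


def sample_hemisphere_view(radius: float, rng: np.random.Generator):
    """Sample a position on the upper hemisphere (z >= 0) and a unit view direction toward its centre."""
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)          # uniform point on the unit sphere
    v[2] = abs(v[2])                # fold onto the upper hemisphere (assumed orientation)
    position = radius * v           # sampling position x
    direction = -v                  # sampling view direction d: looking at the centre
    return position, direction


rng = np.random.default_rng(0)
x, d = sample_hemisphere_view(4.0, rng)
```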
The target object image is an image of the three-dimensional object to be rendered and generated in the embodiments of this specification; here, the target object images are pictures of the same scene taken from different positions.
In a specific embodiment provided in this specification, taking a vase image as the target object image, the sampling position information x, the sampling viewing-angle information d, and the target object image P are determined in response to an image processing instruction.
Step 104, inputting the sampling position information, the sampling viewing-angle information, and the target object image into a pre-trained neural radiance field model for processing, and obtaining the density information, color feature value, and signed distance function value that the model outputs for each sampling point.
The neural radiance field model is a machine learning model. In the method provided in this specification, the model differs from a conventional neural radiance field model: in addition to the density information σ and the color feature value c of each sampling point, it also outputs a signed distance function value (SDF value). A signed distance function (SDF), also called an oriented distance function, is defined on a bounded region of space and gives the distance from a point to the boundary of the region together with a sign: the value is positive for points inside the boundary, negative for points outside it, and 0 for points on the boundary. The SDF value of each sampling point indicates which part of the virtual object the point belongs to.
Referring to Fig. 3, Fig. 3 is a schematic diagram of SDF value analysis for a virtual face according to an embodiment of the present disclosure. As shown in Fig. 3, the upper-left region corresponds to the face, the lower-left region to the hair, the lower-right region to the background, and the upper-right region is undefined.
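The sign convention above (positive inside, negative outside, zero on the boundary) can be illustrated with the simplest analytic shape. The sphere below is purely an illustrative assumption; note that many graphics libraries use the opposite sign.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Signed distance to a sphere: positive inside, zero on the surface, negative outside."""
    return radius - np.linalg.norm(points - center, axis=-1)


pts = np.array([[0.0, 0.0, 0.0],   # centre: inside
                [1.0, 0.0, 0.0],   # on the surface
                [3.0, 0.0, 0.0]])  # outside
print(sphere_sdf(pts, np.zeros(3), 1.0))  # values 1.0, 0.0, -2.0
```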
In the method provided in this specification, the neural radiance field model outputs an SDF value in addition to the density information and color feature value of a conventional neural radiance field model, and the region to which each sampling point belongs can be determined from that SDF value. This makes it convenient to split the virtual object according to the SDF value of each sampling point, render each part separately, and then combine the parts.
Specifically, inputting the sampling position information, the sampling viewing-angle information, and the target object image into a pre-trained neural radiance field model for processing includes:
inputting the sampling position information, the sampling viewing-angle information, and the target object image into the pre-trained neural radiance field model;
casting, by the neural radiance field model, a sampling ray toward each sampling point of the target object image based on the sampling position information and the sampling viewing-angle information; and
determining the density information, color feature value, and signed distance function value of each sampling point based on the sampling ray corresponding to that sampling point.
In practical applications, the NeRF provided herein uses ray casting to emit a ray from the sampling position toward each sampling point of the target object image along the sampling viewing angle; a sampling point may be a pixel of the target object image.
Ray casting is a relatively simple implementation of volume rendering. Volume rendering draws volume data (a 3D texture) according to its projection onto a 2D plane. In ray casting, rays are cast from the camera toward the 3D texture, but the camera only fixes the direction of each ray: the ray actually starts at its front intersection with the volume texture and ends at its back intersection with the volume texture.
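Finding the front and back intersections described above is a standard ray/box test. The sketch below assumes the volume texture occupies an axis-aligned box and uses the well-known slab method; the function name and box representation are illustrative.

```python
import numpy as np


def ray_box_entry_exit(origin, direction, box_min, box_max):
    """Slab method: return (t_near, t_far) where origin + t * direction enters and leaves
    an axis-aligned box, or None if the ray misses it. Degenerate cases (a ray lying
    exactly in a box face) are not handled."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        inv = 1.0 / direction                       # +/-inf for axis-parallel components
        t0 = (np.asarray(box_min, dtype=float) - origin) * inv
        t1 = (np.asarray(box_max, dtype=float) - origin) * inv
    t_near = float(np.max(np.minimum(t0, t1)))      # the last slab the ray enters
    t_far = float(np.min(np.maximum(t0, t1)))       # the first slab the ray leaves
    t_near = max(t_near, 0.0)                       # the ray starts at the camera
    if t_far < t_near:
        return None
    return t_near, t_far


print(ray_box_entry_exit([-1, 0.5, 0.5], [1, 0, 0], [0, 0, 0], [1, 1, 1]))  # (1.0, 2.0)
```

Only the segment between `t_near` and `t_far` needs to be sampled, which is why the ray effectively starts at the front intersection rather than at the camera.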
To obtain the density information, color feature value, and SDF value of each sampling point, ray marching is used in addition to ray casting. Ray marching is a general-purpose simulation method: a ray starts from a starting point, stops at intervals to perform a computation, and then continues to advance; the interval may be fixed or variable.
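A minimal ray-marching sketch covering both the fixed and the variable interval. The variable case is shown here as stratified jitter (one random sample per equal-width bin), which is one common choice and an assumption rather than something the text specifies.

```python
import numpy as np


def march_ray(origin, direction, t_near, t_far, n_steps, rng=None):
    """Return the depths t and 3-D points visited when marching from t_near to t_far.

    rng=None: fixed interval (bin midpoints).
    rng given: one jittered sample inside each bin (variable interval)."""
    edges = np.linspace(t_near, t_far, n_steps + 1)
    if rng is None:
        t = 0.5 * (edges[:-1] + edges[1:])                             # fixed spacing
    else:
        t = edges[:-1] + rng.uniform(size=n_steps) * np.diff(edges)    # stratified jitter
    points = np.asarray(origin, float) + t[:, None] * np.asarray(direction, float)
    return t, points


t, pts = march_ray([0, 0, 0], [1, 0, 0], 1.0, 2.0, 4)
```

At each returned point the model would be queried for density, color, and SDF value.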
The density information, color feature value, and SDF value of each sampling point are obtained through the ray casting and ray marching of the NeRF model.
In a specific embodiment provided in this specification, taking a vase image as the target object image, the sampling position information x, the sampling viewing-angle information d, and the target object image P are input into the pre-trained neural radiance field model to obtain the density information σ, color feature value c, and signed distance function value SDF that the model outputs for each sampling point.
Step 106, determining at least two sampling point subsets based on the signed distance function value corresponding to each sampling point.
With the SDF value of each sampling point obtained in the previous step, the sampling point subset to which each sampling point belongs can be determined from its SDF value. A virtual object may be composed of multiple parts: for example, a photo of a person's head can be divided into face, hair, and background, and a vase image into vase body, flowers, background, and so on. The SDF value of each sampling point determines which part the point belongs to.
Specifically, determining at least two sampling point subsets based on the signed distance function value corresponding to each sampling point includes:
determining a preset signed distance function value interval corresponding to each sampling point subset; and
determining the sampling point subset corresponding to each sampling point based on the signed distance function value of that sampling point and each preset signed distance function value interval.
A preset signed distance function value interval is the range of SDF values taken by the sampling points of one component of the virtual object; the interval for each component is determined from the SDF values of the sampling points in that component.
As shown in Fig. 3, the component of the virtual object to which each sampling point belongs is determined from its SDF value: sampling points whose SDF values fall in the upper-left region belong to the face, those in the lower-left region belong to the hair, and those in the lower-right region belong to the background. That is, the image shown in Fig. 3 contains three sampling point subsets: a face subset, a hair subset, and a background subset.
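The interval lookup in the steps above can be sketched as follows. The bounds in `PRESET_INTERVALS` are hypothetical placeholders, since the text derives them from the SDF values of each component rather than giving numbers; intervals are taken as half-open so that every SDF value falls into exactly one subset.

```python
import numpy as np

# Hypothetical half-open SDF intervals [low, high) for each component.
PRESET_INTERVALS = {
    "face": (0.5, np.inf),
    "hair": (0.0, 0.5),
    "background": (-np.inf, 0.0),
}


def split_by_sdf(sdf_values, intervals=PRESET_INTERVALS):
    """Map each subset name to the indices of the sampling points whose SDF lies in its interval."""
    sdf_values = np.asarray(sdf_values, dtype=float)
    return {name: np.flatnonzero((sdf_values >= lo) & (sdf_values < hi))
            for name, (lo, hi) in intervals.items()}


subsets = split_by_sdf([-1.0, 0.2, 0.7, 0.0])
```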
Step 108, rendering, according to the density information and color feature values of the sampling points in each sampling point subset, a target sub-object corresponding to that subset, and generating a target object based on the target sub-objects.
After the sampling point subsets are divided, volume rendering can be performed on each subset using the density information and color feature values of its sampling points, which yields the component of the virtual object corresponding to each subset.
Still taking Fig. 3 as an example, volume rendering the sampling points in the face subset according to their density information and color feature values yields the face; doing the same for the hair subset yields the hair, and for the background subset yields the background.
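One common way to realise this per-subset rendering, assumed here rather than taken from the text, is the standard NeRF discrete volume-rendering sum $\hat{C} = \sum_i T_i\,(1 - e^{-\sigma_i \delta_i})\,\mathbf{c}_i$ with $T_i = \prod_{j<i} e^{-\sigma_j \delta_j}$, evaluated after setting the density of every point outside the chosen subset to zero.

```python
import numpy as np


def render_ray(sigma, color, t, mask=None):
    """Discrete volume rendering of one ray with the standard NeRF quadrature.

    sigma: (N,) densities, color: (N, 3) colors, t: (N,) increasing sample depths.
    mask: optional boolean (N,) marking the points of one sampling point subset;
    points outside the subset get zero density, so they neither emit nor occlude.
    Returns the premultiplied RGB of the ray and its accumulated opacity."""
    sigma = np.asarray(sigma, dtype=float)
    t = np.asarray(t, dtype=float)
    if mask is not None:
        sigma = np.where(mask, sigma, 0.0)
    delta = np.diff(t, append=t[-1] + 1e10)                         # last segment is "infinite"
    alpha = 1.0 - np.exp(-sigma * delta)                             # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))   # T_i up to sample i
    weights = trans * alpha
    rgb = (weights[:, None] * np.asarray(color, dtype=float)).sum(axis=0)
    return rgb, float(weights.sum())


sigma = [1e6, 1e6]
color = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
t = [0.0, 1.0]
full_rgb, full_alpha = render_ray(sigma, color, t)
part_rgb, part_alpha = render_ray(sigma, color, t, mask=[False, True])
```

Zeroing the density, rather than dropping the points, keeps the sample spacing intact, so each subset is rendered along exactly the same rays.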
After the target sub-objects are obtained, they are synthesized to obtain the target object.
In practical applications, rendering a target sub-object corresponding to each sampling point subset according to the density information and color feature values of the sampling points in that subset includes:
selecting a target sampling point subset; and
rendering a target sub-object corresponding to the target sampling point subset based on the density information and color feature value of each sampling point in that subset.
Taking one sampling point subset as an example: a target sampling point subset is selected from the multiple subsets, and the corresponding target sub-object is rendered from the density information and color feature values of its sampling points. Rendering each component separately makes the components of the virtual object explicit, so that when several virtual objects are later combined, their target sub-objects can be mixed and matched. This produces a variety of combinations and enriches the styles of the rendered virtual objects. For example, when rendering faces, the face A1 and hairstyle A2 of character A and the face B1 and hairstyle B2 of character B may be rendered; face A1 can then be combined with hairstyle B2 to generate character C, and face B1 with hairstyle A2 to generate character D.
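The text does not specify how rendered sub-objects are combined. A minimal sketch, assuming each sub-object is rendered as a premultiplied RGB image plus an opacity map (as the accumulated weights of volume rendering naturally provide) and that parts are stacked background, face, hair from back to front, uses alpha compositing with the Porter-Duff "over" operator. `swap_parts` is a hypothetical helper that reproduces the character C example.

```python
import numpy as np


def over(front_rgb, front_a, back_rgb, back_a):
    """Porter-Duff 'over' operator for premultiplied RGB layers with per-pixel alpha."""
    rgb = front_rgb + (1.0 - front_a)[..., None] * back_rgb
    alpha = front_a + (1.0 - front_a) * back_a
    return rgb, alpha


def compose(layers):
    """Composite a back-to-front list of (rgb, alpha) layers into one image."""
    rgb, alpha = layers[0]
    for front_rgb, front_alpha in layers[1:]:
        rgb, alpha = over(front_rgb, front_alpha, rgb, alpha)
    return rgb, alpha


def swap_parts(parts_a, parts_b, from_b=("hair",), order=("background", "face", "hair")):
    """Build a character from A's parts, taking the parts named in from_b from B instead."""
    chosen = [parts_b[name] if name in from_b else parts_a[name] for name in order]
    return compose(chosen)


# 1x1-pixel layers: premultiplied RGB of shape (1, 3) and alpha of shape (1,).
parts_a = {"background": (np.array([[0.0, 0.0, 1.0]]), np.array([1.0])),
           "face": (np.array([[0.5, 0.0, 0.0]]), np.array([0.5])),
           "hair": (np.zeros((1, 3)), np.array([0.0]))}
parts_b = dict(parts_a, hair=(np.array([[0.0, 0.2, 0.0]]), np.array([0.2])))
character_c_rgb, character_c_alpha = swap_parts(parts_a, parts_b)  # face of A, hair of B
```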
In a specific embodiment provided in this specification, generating a target object based on each target sub-object includes:
generating an initial target object based on the target sub-objects; and
increasing the resolution of the initial target object to obtain the target object.
In practical applications, after the target sub-objects are obtained, they can be stitched together to generate an initial target object. The resolution of the initial target object is usually low, which gives users a poor viewing experience, so its resolution is increased as needed to obtain the target object. Specifically, increasing the resolution of the initial target object to obtain the target object includes:
inputting the initial target object into a resolution enhancement model; and
obtaining the target object that the resolution enhancement model outputs for the initial target object.
In practical applications, the initial target object may be input into a StyleGAN-based encoder (the resolution enhancement model) for resolution enhancement, yielding a high-resolution target object as the model's output. For example, an initial target object with a resolution of 64×64 can be turned into a target object with a resolution of 512×512 by the resolution enhancement model.
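The 64×64 to 512×512 example is an 8× increase per side, i.e. 64× as many pixels. The sketch below only illustrates that shape contract: it uses plain nearest-neighbour repetition as a placeholder for the learned StyleGAN-based model (an assumption made purely for illustration), so it adds no detail.

```python
import numpy as np


def upscale_placeholder(image: np.ndarray, factor: int = 8) -> np.ndarray:
    """Nearest-neighbour placeholder for the learned resolution enhancement model.
    Shape contract: (H, W, C) -> (H * factor, W * factor, C)."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


initial = np.random.default_rng(0).random((64, 64, 3))
target = upscale_placeholder(initial)   # 64 x 64 -> 512 x 512
print(target.shape)  # (512, 512, 3)
```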
In one embodiment provided in this specification, the neural radiance field model is trained as follows:
acquiring sample sampling positions, sample sampling viewing-angle information, and a sample object image;
inputting the sample sampling positions, the sample sampling viewing-angle information, and the sample object image into the neural radiance field model to obtain the density information, color feature value, and signed distance function value that the model outputs for each sample sampling point;
inputting the sample sampling positions, the sample sampling viewing-angle information, and the sample object image into an object generation model to obtain a sample object output by the object generation model;
determining at least two sample sampling point subsets according to the density information, color feature value, and signed distance function value of each sample sampling point, rendering a predicted sub-object corresponding to each sample sampling point subset, and generating a predicted object based on the predicted sub-objects;
performing image segmentation on the sample object to obtain the sample sub-objects corresponding to the sample object;
calculating a first loss value from the predicted sub-objects and the sample sub-objects, and a second loss value from the predicted object and the sample object; and
adjusting the model parameters of the neural radiance field model based on the first loss value and the second loss value, and continuing training until a training stop condition is reached.
In practical applications, the neural radiance field model provided in the embodiments of this specification differs from a conventional one in that it outputs a signed distance function value in addition to density information and a color feature value.
During training, the sample sampling positions, the sample sampling viewing-angle information, and the sample object image are input into the neural radiance field model to be trained to obtain the density information, color feature value, and signed distance function value of each sample sampling point. At least two sample sampling point subsets are determined from these outputs, a predicted sub-object is rendered for each subset, and a predicted object is generated from the predicted sub-objects.
The sample sampling positions, the sample sampling viewing-angle information, and the sample object image are also input into an object generation model, which is a conventional NeRF model, to obtain a sample object. The sample object is then segmented to obtain the multiple sample sub-objects corresponding to it.
A first loss value is calculated from the predicted sub-objects and the sample sub-objects, and a second loss value from the predicted object and the sample object. The model parameters of the neural radiance field model being trained are adjusted jointly according to both loss values until a training stop condition is reached. The stop condition may be that both the first and second loss values fall below a preset threshold, or that the number of training rounds reaches a preset number.
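The two loss values and the stop condition can be sketched as below. The text names neither the loss function nor the threshold, so mean squared error, the averaging over parts, and the values of `threshold` and `max_rounds` are all assumptions; the gradient step itself is omitted.

```python
import numpy as np


def mse(a, b) -> float:
    """Mean squared error between two arrays (MSE is an assumed choice)."""
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))


def training_losses(pred_parts, sample_parts, pred_object, sample_object):
    """First loss: predicted vs segmented sample sub-objects, averaged over parts.
    Second loss: composed predicted object vs the whole sample object."""
    first = float(np.mean([mse(pred_parts[k], sample_parts[k]) for k in sample_parts]))
    second = mse(pred_object, sample_object)
    return first, second


def should_stop(first, second, round_idx, threshold=1e-3, max_rounds=10_000):
    """Stop once both losses are below a preset threshold or a preset round count is reached.
    The threshold and round limit are hypothetical values."""
    return (first < threshold and second < threshold) or round_idx >= max_rounds
```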
In practical applications, the predicted object generated early in training may differ considerably from the real object. For example, when generating face images, a face generated by the neural radiance field model early in training may differ greatly from a real face and needs correction. Accordingly, the method further includes:
acquiring a reference object;
performing image segmentation on the reference object to obtain the reference sub-objects corresponding to the reference object;
calculating a third loss value from the reference sub-objects and the predicted sub-objects; and
adjusting the model parameters of the neural radiance field model according to the third loss value.
The reference object is a real object, such as a real face image. It is segmented into multiple reference sub-objects, a third loss value is calculated from the reference sub-objects and the predicted sub-objects, and the model parameters of the neural radiance field model being trained are adjusted taking the third loss value into account.
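A hedged sketch of the third loss and of one way to combine it with the first two. Mean absolute error, comparing only the parts both sides share, and the weights in `total_loss` are assumptions for illustration; the text only says the third loss value is used to adjust the model parameters.

```python
import numpy as np


def third_loss(reference_parts, pred_parts) -> float:
    """Mean absolute error between segmented reference sub-objects and predicted
    sub-objects, averaged over the parts both dictionaries share (L1 is an assumed choice)."""
    shared = [name for name in reference_parts if name in pred_parts]
    errors = [np.mean(np.abs(np.asarray(reference_parts[n], dtype=float)
                             - np.asarray(pred_parts[n], dtype=float))) for n in shared]
    return float(np.mean(errors))


def total_loss(first, second, third, weights=(1.0, 1.0, 0.5)) -> float:
    """Hypothetical weighted sum of the three loss values used for one parameter update."""
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third
```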
The image processing method based on a neural radiance field thus comprises: determining sampling position information, sampling viewing-angle information, and a target object image in response to an image processing instruction; inputting them into a pre-trained neural radiance field model to obtain the density information, color feature value, and signed distance function value of each sampling point; determining at least two sampling point subsets based on the signed distance function values; rendering a target sub-object for each subset according to the density information and color feature values of its sampling points; and generating the target object based on the target sub-objects.
In an embodiment of the present disclosure, signed distance function values are fused into the neural radiance field model (NeRF model), so that the density information, color feature value, and signed distance function value (SDF value) of each sampling point are obtained from the sampling position information, the sampling viewing-angle information, and the target object image. The components of the virtual object are determined from the SDF values output by the NeRF model, each component is rendered separately, and the rendered components are stitched into the target virtual object. The density information and color feature values preserve the geometric properties of the 3D object as represented by the neural network, while the SDF values determine the component to which each sampling point belongs, so that each component of the virtual object is generated separately and the quality and consistency of the rendering are ensured.
Second, the resolution of the target object can be increased by a StyleGAN-based resolution enhancement model.
Taking the application of the image processing method based on a neural radiance field to face rendering as an example, the method is further described below with reference to Fig. 4a. Fig. 4a is a flowchart of a face generation method based on a neural radiance field according to an embodiment of the present disclosure, which includes steps 402 to 408.
Step 402, determining sampling position information, sampling viewing-angle information, and a face image in response to a face generation instruction.
Step 404, inputting the sampling position information, the sampling viewing-angle information, and the face image into a pre-trained neural radiance field model for processing, and obtaining the density information, color feature value, and signed distance function value that the model outputs for each sampling point.
The neural radiance field model is a machine learning model.
Step 406, determining at least two sampling point subsets based on the signed distance function value corresponding to each sampling point.
Step 408, rendering, based on each sampling point subset, a face sub-object corresponding to that subset, and generating a target face based on the face sub-objects.
Referring to Fig. 4b, Fig. 4b is a schematic diagram of face subsets according to an embodiment of the present disclosure. As shown in Fig. 4b, the left part is the background subset, the middle part is the face subset, and the right part is the hair subset.
Optionally, determining the sampling position information and the sampling viewing-angle information includes:
determining sampling position information on a preset hemispherical surface; and
determining corresponding sampling viewing-angle information based on the sampling position information.
Optionally, inputting the sampling position information, the sampling viewing-angle information, and the face image into a pre-trained neural radiance field model for processing includes:
inputting the sampling position information, the sampling viewing-angle information, and the face image into the pre-trained neural radiance field model;
casting, by the neural radiance field model, a sampling ray toward each sampling point of the face image based on the sampling position information and the sampling viewing-angle information; and
determining the density information, color feature value, and signed distance function value of each sampling point based on the sampling ray corresponding to that sampling point.
Optionally, determining at least two sampling point subsets based on the signed distance function value corresponding to each sampling point includes:
determining a preset signed distance function value interval corresponding to each sampling point subset; and
determining the sampling point subset corresponding to each sampling point based on the signed distance function value of that sampling point and each preset signed distance function value interval.
Optionally, rendering, based on each sampling point subset, a face sub-object corresponding to that subset includes:
selecting a target sampling point subset; and
rendering a target face sub-object corresponding to the target sampling point subset based on the density information and color feature value of each sampling point in that subset.
Optionally, generating the target face based on the face sub-objects includes:
generating an initial face based on the target face sub-objects; and
increasing the resolution of the initial face to obtain the target face.
Optionally, increasing the resolution of the initial face to obtain the target face includes:
inputting the initial face into a resolution enhancement model; and
obtaining the target face that the resolution enhancement model outputs for the initial face.
Optionally, the neural radiance field model is trained as follows:
acquiring sample sampling positions, sample sampling viewing-angle information, and a sample face image;
inputting the sample sampling positions, the sample sampling viewing-angle information, and the sample face image into the neural radiance field model to obtain the density information, color feature value, and signed distance function value that the model outputs for each sample sampling point;
inputting the sample sampling positions, the sample sampling viewing-angle information, and the sample face image into a face generation model to obtain a sample face output by the face generation model;
determining at least two sample sampling point subsets according to the density information, color feature value, and signed distance function value of each sample sampling point, rendering a predicted face sub-object corresponding to each sample sampling point subset, and generating a predicted face based on the predicted face sub-objects;
performing image segmentation on the sample face to obtain the sample face sub-objects corresponding to the sample face;
calculating a first loss value from the predicted face sub-objects and the sample face sub-objects, and a second loss value from the predicted face and the sample face; and
adjusting the model parameters of the neural radiance field model based on the first loss value and the second loss value, and continuing training until a training stop condition is reached.
Optionally, after the predicted face sub-object corresponding to each sample sampling point subset is rendered, the method further includes:
acquiring a reference face;
performing image segmentation on the reference face to obtain the reference face sub-objects corresponding to the reference face;
calculating a third loss value from the reference face sub-objects and the predicted face sub-objects; and
adjusting the model parameters of the neural radiance field model according to the third loss value.
According to the face generation method based on the nerve radiation field, the method that the symbol distance function value is fused into the nerve radiation field model (NeRF model) is achieved, the density information, the color characteristic value and the symbol distance function value (SDF value) of the sampling point are obtained according to the sampling position information, the sampling visual angle information and the target object image, each component part of the face is determined through the SDF value set NeRF model, each component part is rendered respectively, and each component part generated by rendering is spliced into a face image. According to the method, the geometric properties of the neural network to the 3D object are guaranteed through the density information and the color characteristic values, the corresponding component parts of each sampling point are determined through the SDF values, the component parts of the face image are generated, and the superiority and consistency of the virtual object rendering effect are guaranteed.
In addition, the resolution of the target object can be increased with a StyleGAN-based resolution enhancement model.
Corresponding to the above embodiment of the neural radiance field-based image processing method, the present disclosure further provides an embodiment of a neural radiance field-based image processing apparatus. Fig. 5 shows a schematic structural diagram of a neural radiance field-based image processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 5, the apparatus includes:
a determining module 502, configured to determine sampling position information, sampling perspective information, and a target object image in response to an image processing instruction;
an input module 504, configured to input the sampling position information, the sampling perspective information, and the target object image into a pre-trained neural radiance field model for processing, and obtain the density information, color feature value, and signed distance function value corresponding to each sampling point output by the neural radiance field model, where the neural radiance field model is a machine learning model;
a subset determining module 506, configured to determine at least two sampling point subsets based on the signed distance function value corresponding to each sampling point; and
a rendering module 508, configured to render and generate a target sub-object corresponding to each sampling point subset according to the density information and color feature values of the sampling points in that subset, and generate a target object based on the target sub-objects.
Optionally, the determining module 502 is further configured to:
determine sampling position information on a preset hemispherical surface; and
determine the corresponding sampling perspective information based on the sampling position information.
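The source does not say how positions on the preset hemisphere are chosen or how the perspective is derived from them. The following is a minimal sketch built on three assumptions. First, the hemisphere has a given radius and is centered on the object at the origin. Second, positions are parameterized by azimuth and elevation. Third, the viewing direction points from the camera back toward the center. The function names and the parameterization are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sample_hemisphere_position(radius: float, azimuth_deg: float,
                               elevation_deg: float) -> Vec3:
    """Camera position on the upper hemisphere (z >= 0) centered at the origin."""
    if not 0.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation must lie in [0, 90] degrees")
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (radius * math.cos(el) * math.cos(az),
            radius * math.cos(el) * math.sin(az),
            radius * math.sin(el))


def view_direction_from_position(position: Vec3) -> Vec3:
    """Unit vector from the camera position toward the hemisphere center."""
    norm = math.sqrt(sum(c * c for c in position))
    if norm == 0.0:
        raise ValueError("camera position must not coincide with the center")
    x, y, z = (-c / norm for c in position)
    return (x, y, z)
```

For example, `sample_hemisphere_position(2.0, 0.0, 0.0)` places the camera at `(2.0, 0.0, 0.0)`, and the corresponding viewing direction points along the negative x-axis.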
Optionally, the input module 504 is further configured to:
input the sampling position information, the sampling perspective information, and the target object image into the pre-trained neural radiance field model, where the neural radiance field model casts a sampling ray toward each sampling point of the target object image based on the sampling position information and the sampling perspective information; and
determine the density information, color feature value, and signed distance function value of each sampling point based on the sampling ray corresponding to that sampling point.
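The ray-casting step can be sketched as sampling points along each ray r(t) = o + t*d and querying the model at every point. This is a minimal sketch under two stated assumptions. First, depths are evenly spaced between hypothetical `near` and `far` bounds. Second, `toy_sphere_field` is a hypothetical stand-in for the trained model that returns a density, a color, and a signed distance for a unit sphere.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
# A field maps (point, view direction) to (density, RGB color, signed distance).
Field = Callable[[Vec3, Vec3], Tuple[float, Vec3, float]]


def points_along_ray(origin: Vec3, direction: Vec3, near: float, far: float,
                     n: int) -> List[Tuple[float, Vec3]]:
    """Evenly spaced depths t in [near, far] and the points o + t*d."""
    if n < 2 or far <= near:
        raise ValueError("need n >= 2 and far > near")
    step = (far - near) / (n - 1)
    samples = []
    for i in range(n):
        t = near + i * step
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        samples.append((t, (x, y, z)))
    return samples


def query_ray(field: Field, origin: Vec3, direction: Vec3, near: float,
              far: float, n: int) -> List[Tuple[float, float, Vec3, float]]:
    """Return (t, density, color, sdf) for every sample point on the ray."""
    return [(t, *field(p, direction))
            for t, p in points_along_ray(origin, direction, near, far, n)]


def toy_sphere_field(point: Vec3, view_dir: Vec3) -> Tuple[float, Vec3, float]:
    """Hypothetical stand-in for the trained model: a unit sphere at the origin."""
    sdf = math.sqrt(sum(c * c for c in point)) - 1.0
    density = 10.0 if sdf < 0.0 else 0.0
    return density, (0.8, 0.6, 0.5), sdf
```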
Optionally, the subset determining module 506 is further configured to:
determine a preset signed distance function value interval corresponding to each sampling point subset; and
determine the sampling point subset to which each sampling point belongs based on the signed distance function value of that sampling point and the preset signed distance function value intervals.
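The interval lookup can be sketched as follows. This is a minimal sketch that assumes half-open intervals [lo, hi) keyed by hypothetical part labels. The source does not say whether intervals may overlap, so here the first matching interval wins, and points outside every interval are left unassigned under the key `None`.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def partition_by_sdf(sdf_values: Sequence[float],
                     intervals: Dict[str, Tuple[float, float]]
                     ) -> Dict[Optional[str], List[int]]:
    """Assign each sampling point index to the subset whose preset SDF
    interval [lo, hi) contains its SDF value."""
    if len(intervals) < 2:
        raise ValueError("at least two sampling point subsets are required")
    subsets: Dict[Optional[str], List[int]] = {name: [] for name in intervals}
    subsets[None] = []  # points that fall in no preset interval
    for idx, value in enumerate(sdf_values):
        for name, (lo, hi) in intervals.items():
            if lo <= value < hi:
                subsets[name].append(idx)
                break
        else:
            subsets[None].append(idx)
    return subsets
```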
Optionally, the rendering module 508 is further configured to:
select a target sampling point subset; and
render and generate the target sub-object corresponding to the target sampling point subset based on the density information and color feature value of each sampling point in that subset.
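Rendering one subset can be sketched with the usual NeRF-style alpha compositing along each ray: alpha_i = 1 - exp(-sigma_i * delta_i), transmittance T_i = prod_{j<i}(1 - alpha_j), and color C = sum_i T_i * alpha_i * c_i. This sketch restricts the sum to one subset by giving every other sample zero density. That restriction is an assumption; the source does not specify it.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray(depths: Sequence[float], densities: Sequence[float],
               colors: Sequence[Color], in_subset: Sequence[bool]) -> Tuple[Color, float]:
    """Alpha-composite the samples of one ray, keeping only one subset.

    Samples outside the subset get zero density, so they neither add color
    nor block light. Returns the accumulated (premultiplied) RGB and opacity."""
    n = len(depths)
    if not (len(densities) == len(colors) == len(in_subset) == n):
        raise ValueError("per-sample sequences must have equal length")
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still reaching the current sample
    for i in range(n):
        # Spacing to the next sample; the last sample uses a very large spacing.
        delta = depths[i + 1] - depths[i] if i + 1 < n else 1e10
        sigma = densities[i] if in_subset[i] else 0.0
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * colors[i][k]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2]), 1.0 - transmittance
```

Calling `render_ray` once per pixel ray and per subset yields one image per sampling point subset, that is, one target sub-object each.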
Optionally, the rendering module 508 is further configured to:
generate an initial target object based on the target sub-objects; and
increase the resolution of the initial target object to obtain the target object.
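The source only says that the target sub-objects are spliced into an initial target object. The following is a minimal sketch based on two assumptions. First, each sub-object is an image of premultiplied RGB values plus opacity. Second, the parts are layered front to back in a known order. Under those assumptions, the parts are combined with the standard "over" operator. The layer order and the background color are hypothetical.

```python
from typing import List, Sequence, Tuple

RGBA = Tuple[float, float, float, float]  # premultiplied r, g, b and opacity
RGB = Tuple[float, float, float]


def composite_layers(layers: Sequence[Sequence[Sequence[RGBA]]],
                     background: RGB = (0.0, 0.0, 0.0)) -> List[List[RGB]]:
    """Splice per-part renderings into one image with the 'over' operator.
    layers[0] is treated as the front-most part (an assumption)."""
    if not layers:
        raise ValueError("need at least one layer")
    h, w = len(layers[0]), len(layers[0][0])
    if any(len(layer) != h or any(len(row) != w for row in layer) for layer in layers):
        raise ValueError("all layers must have the same size")
    image: List[List[RGB]] = []
    for y in range(h):
        row_out: List[RGB] = []
        for x in range(w):
            r = g = b = 0.0
            remaining = 1.0  # light not yet blocked by layers in front
            for layer in layers:
                lr, lg, lb, la = layer[y][x]
                r += remaining * lr
                g += remaining * lg
                b += remaining * lb
                remaining *= 1.0 - la
            row_out.append((r + remaining * background[0],
                            g + remaining * background[1],
                            b + remaining * background[2]))
        image.append(row_out)
    return image
```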
Optionally, the rendering module 508 is further configured to:
input the initial target object into a resolution enhancement model; and
obtain the target object, corresponding to the initial target object, output by the resolution enhancement model.
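The source names a StyleGAN-based model for this step but gives no interface. This sketch only fixes a calling convention, under the assumption that the enhancement model is any callable mapping an image to a larger image. When no model is supplied, a nearest-neighbour upscaler stands in as a placeholder. The placeholder does not reproduce what a learned model does, and both function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

P = TypeVar("P")


def nearest_upscale(image: Sequence[Sequence[P]], factor: int) -> List[List[P]]:
    """Placeholder enhancer: repeat every pixel factor x factor times.
    This only enlarges the pixel grid; unlike a learned model it adds no detail."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: List[List[P]] = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def enhance_resolution(initial, model: Optional[Callable] = None, factor: int = 2):
    """Pass the initial target object through a resolution enhancement model.
    `model` is a hypothetical callable wrapping a trained super-resolution
    network; without it, the nearest-neighbour placeholder is used."""
    if model is not None:
        return model(initial)
    return nearest_upscale(initial, factor)
```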
Optionally, the apparatus further comprises a training module configured to:
acquire a sample sampling position, sample sampling perspective information, and a sample object image;
input the sample sampling position, the sample sampling perspective information, and the sample object image into the neural radiance field model to obtain the density information, color feature value, and signed distance function value corresponding to each sample sampling point output by the neural radiance field model;
input the sample sampling position, the sample sampling perspective information, and the sample object image into an object generation model to obtain a sample object output by the object generation model;
determine at least two sample sampling point subsets according to the density information, color feature values, and signed distance function values corresponding to the sample sampling points, render and generate a prediction sub-object corresponding to each sample sampling point subset, and generate a prediction object based on the prediction sub-objects;
perform image segmentation on the sample object to obtain sample sub-objects corresponding to the sample object;
calculate a first loss value from the prediction sub-objects and the sample sub-objects, and calculate a second loss value from the prediction object and the sample object; and
adjust the model parameters of the neural radiance field model based on the first loss value and the second loss value, and continue training until a model training stop condition is reached.
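The step "continue training until a model training stop condition is reached" can be sketched as a loop around a single update. The two stop conditions used here, a maximum number of updates and a loss threshold, are assumptions; the source does not name the condition. `step` is a hypothetical callable that performs one parameter update and returns the total loss.

```python
from typing import Callable, Tuple


def train_until_stop(step: Callable[[], float], max_steps: int = 1000,
                     loss_threshold: float = 1e-4) -> Tuple[int, float]:
    """Run updates until the loss falls below `loss_threshold` or
    `max_steps` updates have been made. Returns (updates made, last loss)."""
    if max_steps < 1:
        raise ValueError("max_steps must be >= 1")
    loss = float("inf")
    for i in range(1, max_steps + 1):
        loss = step()
        if loss < loss_threshold:
            return i, loss
    return max_steps, loss
```

Keeping the stop test outside the update function means the same loop works whether the loss is the sum of the first and second loss values or also includes a third loss.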
Optionally, the training module is further configured to:
acquire a reference object;
perform image segmentation on the reference object to obtain reference sub-objects corresponding to the reference object;
calculate a third loss value from the reference sub-objects and the prediction sub-objects; and
adjust the model parameters of the neural radiance field model according to the third loss value.
The neural radiance field-based image processing apparatus provided in this specification first determines sampling position information, sampling perspective information, and a target object image in response to an image processing instruction. It inputs them into a pre-trained neural radiance field model, which is a machine learning model, to obtain the density information, color feature value, and signed distance function value corresponding to each sampling point. It then determines at least two sampling point subsets based on the signed distance function values. Finally, it renders a target sub-object for each sampling point subset according to the density information and color feature values of the sampling points in that subset, and generates the target object from the target sub-objects.
By fusing signed distance function values into the neural radiance field model (NeRF model), an embodiment of the present disclosure obtains the density information, color feature values, and signed distance function values (SDF values) of the sampling points from the sampling position information, the sampling perspective information, and the target object image. The NeRF model augmented with SDF values determines the component parts of the virtual object, each component part is rendered separately, and the rendered component parts are spliced into the target virtual object. The density information and color feature values preserve the geometric properties of the 3D object learned by the neural network, while the SDF values determine the component part to which each sampling point belongs. Each component part of the virtual object is therefore generated separately, which ensures the quality and consistency of the virtual object rendering.
In addition, the resolution of the target object can be increased with a StyleGAN-based resolution enhancement model.
The above is a schematic solution of the neural radiance field-based image processing apparatus of this embodiment. It should be noted that the technical solution of this apparatus and that of the neural radiance field-based image processing method belong to the same concept; for details of the apparatus not described here, refer to the description of the image processing method.
Corresponding to the above embodiment of the neural radiance field-based face generation method, the present disclosure further provides an embodiment of a neural radiance field-based face generation apparatus. Fig. 6 shows a schematic structural diagram of a neural radiance field-based face generation apparatus according to an embodiment of the present disclosure. As shown in Fig. 6, the apparatus includes:
an image determining module 602, configured to determine sampling position information, sampling perspective information, and a face image in response to a face generation instruction;
a model input module 604, configured to input the sampling position information, the sampling perspective information, and the face image into a pre-trained neural radiance field model for processing, and obtain the density information, color feature value, and signed distance function value corresponding to each sampling point output by the neural radiance field model, where the neural radiance field model is a machine learning model;
a subset determining module 606, configured to determine at least two sampling point subsets based on the signed distance function value corresponding to each sampling point; and
a face rendering module 608, configured to render and generate a face sub-object corresponding to each sampling point subset, and generate a target face based on the face sub-objects.
Optionally, the image determining module 602 is further configured to:
determine sampling position information on a preset hemispherical surface; and
determine the corresponding sampling perspective information based on the sampling position information.
Optionally, the model input module 604 is further configured to:
input the sampling position information, the sampling perspective information, and the face image into the pre-trained neural radiance field model, where the neural radiance field model casts a sampling ray toward each sampling point of the face image based on the sampling position information and the sampling perspective information; and
determine the density information, color feature value, and signed distance function value of each sampling point based on the sampling ray corresponding to that sampling point.
Optionally, the subset determining module 606 is further configured to:
determine a preset signed distance function value interval corresponding to each sampling point subset; and
determine the sampling point subset to which each sampling point belongs based on the signed distance function value of that sampling point and the preset signed distance function value intervals.
Optionally, the face rendering module 608 is further configured to:
select a target sampling point subset; and
render and generate the target face sub-object corresponding to the target sampling point subset based on the density information and color feature value of each sampling point in that subset.
Optionally, the face rendering module 608 is further configured to:
generate an initial face based on the target face sub-objects; and
increase the resolution of the initial face to obtain the target face.
Optionally, the face rendering module 608 is further configured to:
input the initial face into a resolution enhancement model; and
obtain the target face, corresponding to the initial face, output by the resolution enhancement model.
Optionally, the apparatus further comprises a model training module configured to:
acquire a sample sampling position, sample sampling perspective information, and a sample face image;
input the sample sampling position, the sample sampling perspective information, and the sample face image into the neural radiance field model to obtain the density information, color feature value, and signed distance function value corresponding to each sample sampling point output by the neural radiance field model;
input the sample sampling position, the sample sampling perspective information, and the sample face image into a face generation model to obtain a sample face output by the face generation model;
determine at least two sample sampling point subsets according to the density information, color feature values, and signed distance function values corresponding to the sample sampling points, render and generate a predicted face sub-object corresponding to each sample sampling point subset, and generate a predicted face based on the predicted face sub-objects;
perform image segmentation on the sample face to obtain sample face sub-objects corresponding to the sample face;
calculate a first loss value from the predicted face sub-objects and the sample face sub-objects, and calculate a second loss value from the predicted face and the sample face; and
adjust the model parameters of the neural radiance field model based on the first loss value and the second loss value, and continue training until a model training stop condition is reached.
Optionally, the model training module is further configured to:
acquire a reference face;
perform image segmentation on the reference face to obtain reference face sub-objects corresponding to the reference face;
calculate a third loss value from the reference face sub-objects and the predicted face sub-objects; and
adjust the model parameters of the neural radiance field model according to the third loss value.
The neural radiance field-based face generation apparatus described above fuses signed distance function values into the neural radiance field model (NeRF model). The density information, color feature values, and signed distance function values (SDF values) of the sampling points are obtained from the sampling position information, the sampling perspective information, and the target object image. The NeRF model augmented with SDF values determines each component part of the face, each component part is rendered, and the rendered component parts are spliced into a face image. The density information and color feature values preserve the geometric properties of the 3D object learned by the neural network, while the SDF values determine the component part to which each sampling point belongs. Each component part of the face image is therefore generated separately, which ensures the quality and consistency of the virtual object rendering.
In addition, the resolution of the target object can be increased with a StyleGAN-based resolution enhancement model.
The above is a schematic solution of the neural radiance field-based face generation apparatus of this embodiment. It should be noted that the technical solution of this apparatus and that of the neural radiance field-based face generation method belong to the same concept; for details of the apparatus not described here, refer to the description of the face generation method.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with an embodiment of the present specification. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more network interfaces of any type, wired or wireless (e.g., a network interface card (NIC)). Examples include an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In an embodiment of this specification, the above components of computing device 700 and other components not shown in Fig. 7 may also be connected to each other, for example via a bus. It should be understood that the block diagram of the computing device shown in Fig. 7 is for illustration only and is not intended to limit the scope of this specification. Those skilled in the art may add or replace components as needed.
Computing device 700 may be any type of stationary or mobile computing device. Examples include a mobile computer or mobile computing device (e.g., a tablet computer, personal digital assistant, laptop computer, notebook computer, or netbook), a mobile phone (e.g., a smartphone), a wearable computing device (e.g., a smart watch or smart glasses), another type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The processor 720 executes the computer instructions to implement the steps of the neural radiance field-based image processing method or the neural radiance field-based face generation method.
The foregoing is a schematic solution of the computing device of this embodiment. It should be noted that the technical solution of the computing device and that of the neural radiance field-based image processing method or face generation method belong to the same concept; for details of the computing device not described here, refer to the description of the image processing method or the face generation method.
An embodiment of this specification further provides an augmented reality (AR) device or a virtual reality (VR) device, including:
a memory, a processor, and a display;
where the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the following steps:
determining sampling position information, sampling perspective information, and a target object image in response to an image processing instruction;
inputting the sampling position information, the sampling perspective information, and the target object image into a pre-trained neural radiance field model for processing to obtain the density information, color feature value, and signed distance function value corresponding to each sampling point output by the neural radiance field model, where the neural radiance field model is a machine learning model;
determining at least two sampling point subsets based on the signed distance function value corresponding to each sampling point;
rendering, according to the density information and color feature values of the sampling points in each sampling point subset, a target sub-object corresponding to that subset, and generating a target object based on the target sub-objects; and
displaying the target object on the display of the AR device or VR device.
The foregoing is a schematic solution of the AR device or VR device of this embodiment. It should be noted that the technical solution of the AR device or VR device and that of the neural radiance field-based image processing method or face generation method belong to the same concept; for details of the AR device or VR device not described here, refer to the description of the image processing method or the face generation method.
An embodiment of this specification further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the neural radiance field-based image processing method or the neural radiance field-based face generation method described above.
The above is a schematic solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the neural radiance field-based image processing method or face generation method belong to the same concept; for details of the storage medium not described here, refer to the description of the image processing method or the face generation method.
An embodiment of the present disclosure further provides a computer program which, when executed in a computer, causes the computer to perform the steps of the neural radiance field-based image processing method or the neural radiance field-based face generation method described above.
The above is a schematic solution of the computer program of this embodiment. It should be noted that the technical solution of the computer program and that of the neural radiance field-based image processing method or face generation method belong to the same concept; for details of the computer program not described here, refer to the description of the image processing method or the face generation method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code. Examples include a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium. It should be noted that the content of a computer-readable medium may be expanded or restricted as required by the legislation and patent practice of a given jurisdiction. For example, in some jurisdictions, legislation and patent practice exclude electrical carrier signals and telecommunications signals from computer-readable media.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments are not limited by the described order of actions, because according to the embodiments of this specification some steps may be performed in another order or simultaneously. Further, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by every embodiment.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of this specification disclosed above are intended only to help explain this specification. The alternative embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations are possible in light of the content of this specification. These embodiments were chosen and described in detail to best explain the principles and practical applications of this specification, so that those skilled in the art can best understand and use the invention. This specification is limited only by the claims and their full scope and equivalents.