Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present specification more apparent, the following detailed description of the embodiments of the present specification will be given with reference to the accompanying drawings.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many different forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present specification. One skilled in the relevant art will recognize, however, that the aspects of the specification may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known aspects have not been shown or described in detail to avoid obscuring aspects of the description.
Furthermore, the drawings are only schematic illustrations of the present specification and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
To address the problem of cross-domain recognition, existing cross-domain living body detection solutions can be divided into two types according to how the models are deployed. The first type, as described in the background art, sets up a separate recognition model for each usage scene. The second type is cross-domain living body detection based on single-model iteration: one recognition model (typically deployed on a cloud server) is maintained for all scenes, and whenever a new scene is accessed the model must be optimized and adapted. Because only one model needs to be maintained, the maintenance cost of this method is low; however, because the same model is used in all scenes, its overall performance is poor.
The embodiments of the present specification provide an image detection method and apparatus, a target classification model, a computer-readable storage medium, an electronic device, and a computer program product, which can solve the problems existing in the related art. Specifically, embodiments of the image detection method and of the target classification model provided in the present specification are described in detail below with reference to FIGS. 1 to 9:
FIG. 1 is a schematic flow chart of an image detection method according to an embodiment of the present specification. Referring to FIG. 1, the embodiment shown in this figure includes S110-S140.
In S110, a target classification model is determined, wherein the target classification model is trained by training samples from the first data domain.
The target classification model in the embodiments of the present specification is trained on training samples from the first data domain; that is, the target classification model is suitable for classifying images in the first data domain. For example, for an image A to be detected from the first data domain, the target classification model can produce a classification result with high accuracy. However, because the model was trained only on training samples from the first data domain, simply inputting an image B to be detected from another data domain into the target classification model cannot accurately predict the class of image B.
In exemplary embodiments, identity recognition based on face images has developed rapidly and been widely adopted in recent years. While it brings convenience to production and daily life, it has also raised new security problems. Living body attacks are one of the major security threats faced by current face recognition systems. An attacker can carry out a living body attack by means such as a mobile phone photo, a paper mask, or a silicone mask. Once a living body attack succeeds, the property security and information security of the user are seriously threatened, so living body attack detection (a living body anti-attack technique) is necessary. Living body anti-attack is an algorithmic technique for detecting and intercepting living body attacks (including attacks using mobile phone photos, paper photos, masks, and the like) in a face recognition system.
It can be seen that the above-described target classification model may be a model for living body anti-attack. It will be understood, of course, that although the embodiments of the present specification are described taking a living body anti-attack model as an example, the target classification model may also be used for other purposes such as identity recognition; the embodiments of the present specification do not limit the specific function the target classification model performs in image detection.
For example, different data domains may correspond to different model usage scenes, in which the application environment, the function, and so on of the model differ. A model usage scene may be an office access control scene, an identity recognition scene at a subway station or similar site, or any of a variety of terminal scenes in which payment is made by face recognition. Where the first data domain corresponds to the office access control scene, the second data domain in the following embodiments may be, for example, the identity recognition scene at a subway station or similar site, or a scene of payment by face recognition.
A target classification model suited to the office access control scene cannot produce highly accurate predictions if it is used directly in the identity recognition scene at a subway station or in a face-recognition payment scene. The embodiments of the present specification provide a scheme that solves this cross-domain recognition problem while balancing model maintenance cost against cross-domain adaptation performance. As a result, for example, the target classification model suited to the office access control scene can also be used in the identity recognition scene at a subway station or in a face-recognition payment scene, and still produce classification results with high accuracy.
In S120, a first uncertainty estimate corresponding to a first target data set from the first data domain is determined by the target classification model. In S130, a second uncertainty estimate corresponding to a second target data set from a second data domain is determined by the target classification model.
In an exemplary embodiment, referring to FIG. 2, on the one hand, a first target data set 22 from the first data domain is input to the target classification model 300, and the first uncertainty estimate 24 is determined based on the output of the target classification model 300. On the other hand, a second target data set 22' from the second data domain is input to the target classification model 300, and the second uncertainty estimate 24' is determined based on the output of the target classification model 300. Further, in the present specification, the difference in uncertainty between the two data domains is taken as the feature offset 25, and the feature offset 25 is applied to the image 26 to be detected in the new domain (i.e., the second data domain), so that the class 27 of the image to be detected in the new domain can be predicted using the target classification model 300.
Here, uncertainty estimation refers to outputting a Gaussian distribution with a mean and a variance, rather than a single predicted probability, when predicting with a deep learning model.
In the exemplary embodiment, the uncertainty corresponding to a data domain includes two aspects. On the one hand, the data differ between data domains, so "sample uncertainty estimation" is used as the first aspect of the uncertainty corresponding to a data domain in the embodiments of the present specification. On the other hand, the neural units activated in the same model also differ when it processes data from different data domains, so "model uncertainty estimation" is used as the second aspect. Through these two aspects of uncertainty, the difference between two data domains can be effectively measured to determine the feature offset, and the accuracy of data detection in the new domain is in turn ensured based on the feature offset.
In an exemplary embodiment, the target classification model 300 is described in detail. A conventional classification model can only give a single prediction result from a single model structure, and cannot effectively estimate the uncertainty of the model. In the solution provided in the embodiments of the present specification, in order to effectively estimate the uncertainty of the model in the inference stage, the structure of the target classification model includes a main network and an attention-based dropout structure, where the attention-based dropout structure is embedded in the main network and the main network may be any classification network. By way of example, referring to FIG. 3, a classical CNN (Convolutional Neural Network), namely a ResNet network structure, is illustrated as the main network.
Referring to FIG. 3, in the exemplary embodiment, each residual block 36 of the ResNet network structure is followed by an attention-based dropout structure 400. It will be appreciated that the ResNet main network employed in this embodiment contains 7 residual blocks. In other exemplary embodiments, the attention-based dropout structure may be embedded after at least one residual block of the main network, for example after the 1st residual block or after the 7th residual block. Once the attention-based dropout structure is embedded, the resulting model structure can be used to determine the model uncertainty estimate.
The attention-based dropout structure 400 is shown in FIG. 4. It includes an SE block (Squeeze-and-Excitation block) 410 and a discarding unit (dropout) 420.
It should be noted that, unlike conventional dropout (in which every element is discarded with the same probability), the discard probabilities provided by the embodiments of the present specification are not the same for every element. During training with the training samples from the first data domain, the probability with which the discarding unit (dropout) 420 discards each element of a training sample is determined adaptively by the SE block 410. This helps protect important neurons in the model and thus helps improve the overall performance of the model. In addition, based on the dropout 420, the uncertainty estimate of the model can be determined in the model inference stage by performing inference on the same sample multiple times.
In an exemplary embodiment, during training of the target classification model the dropout 420 is turned on. After passing through the residual block 36, the image features corresponding to a training sample flow into both the SE block 410 and the dropout 420. The SE block 410 adaptively learns the probability of discarding each element and outputs these probabilities to the dropout 420, which then discards the corresponding elements according to the received probabilities.
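The interaction between the SE block and the discarding unit can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the specification does not state how the SE output is mapped to a discard probability, so the mapping `max_drop * (1 - importance)`, the single-layer excitation, the per-channel granularity, and all function and parameter names below are assumptions chosen only for illustration.

```python
import math
import random


def squeeze(features):
    """Squeeze step: global average of each channel (one scalar per channel)."""
    return [sum(channel) / len(channel) for channel in features]


def excitation(z, weights, bias):
    """Excitation step, simplified here to one linear layer plus a sigmoid.

    Returns one importance score in (0, 1] per channel.
    """
    scores = []
    for row, b in zip(weights, bias):
        s = sum(w * v for w, v in zip(row, z)) + b
        scores.append(1.0 / (1.0 + math.exp(-s)))
    return scores


def attention_dropout(features, weights, bias, max_drop=0.5,
                      training=True, rng=random):
    """Drop elements with a probability that depends on channel importance.

    features: list of channels, each a flat list of floats.
    Assumption: p_drop = max_drop * (1 - importance), so channels the SE
    block rates as important are discarded less often.
    """
    if not training:
        return [list(channel) for channel in features]
    importance = excitation(squeeze(features), weights, bias)
    out = []
    for channel, a in zip(features, importance):
        p_drop = max_drop * (1.0 - a)
        keep = 1.0 - p_drop  # always >= 0.5 when max_drop <= 0.5
        # Inverted-dropout scaling keeps the expected activation unchanged.
        out.append([v / keep if rng.random() >= p_drop else 0.0
                    for v in channel])
    return out
```

With `training=False` the features pass through unchanged, which corresponds to the case in which the discarding unit is not turned on.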
In an exemplary embodiment, during training of the target classification model, the input of the model is a face image from the first data domain, and the output of the model is the living body attack probability corresponding to each training sample. The loss function adopted is a classification loss function. Specifically, based on the network structure and the loss function of the model, training is performed by SGD (Stochastic Gradient Descent) until the model converges.
Further, an embodiment of determining the first uncertainty estimate based on the above-described target classification model, trained on training samples from the first data domain, is described below. As described above, the first uncertainty estimate includes a sample uncertainty estimate and a model uncertainty estimate. The determination of the sample uncertainty estimate is described below in connection with FIGS. 5 and 6, and the determination of the model uncertainty estimate in connection with FIGS. 7 and 8. It will be appreciated that, to ensure the accuracy of the first uncertainty estimate, the same samples are used to determine both the sample uncertainty estimate and the model uncertainty estimate, namely the first target data set described above. The first target data set is taken to include N samples, where N is a positive integer.
In an exemplary embodiment, referring to FIG. 5, an embodiment of a determination method for sample uncertainty estimation provided by the figure includes S510-S530.
In S510, at least one local region of the i-th sample is perturbed, each region separately, and the perturbed samples obtained are taken as the i-th sample subset, where i is a positive integer not greater than N.
Illustratively, referring to FIG. 6, the i-th sample is segmented into regions; for example, a face segmentation model may segment the face image into parts such as eyes, mouth, nose, cheeks, hair, and background. Further, a perturbation (e.g., Gaussian blur, random erasing, etc.) is applied to different image regions. For example, Gaussian blur is applied to the mouth region of the i-th sample to obtain the i1-th perturbed sample, Gaussian blur is applied to the eye region of the i-th sample to obtain the i2-th perturbed sample, and so on. In this embodiment, the perturbed samples obtained after the at least one perturbation are taken as the i-th sample subset 62. This sample processing scheme is referred to as region-based data augmentation (region-based augmentation) in the embodiments of the present specification.
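A minimal sketch of region-based augmentation is given below. It assumes the image is a 2-D list of pixel values and that a face segmentation step (not shown) has already produced, for each named region, a list of pixel coordinates. Random erasing with a constant fill value stands in for the perturbation; Gaussian blur could be substituted. The function names and the `regions` layout are hypothetical.

```python
import copy


def erase_region(image, pixels, fill=0.0):
    """Return a copy of `image` with the given (row, col) pixels overwritten."""
    out = copy.deepcopy(image)
    for r, c in pixels:
        out[r][c] = fill
    return out


def build_sample_subset(image, regions, perturb=erase_region):
    """Perturb each region separately; one perturbed sample per region.

    regions: dict mapping a region name (e.g. "mouth") to its pixel list.
    The returned list plays the role of the i-th sample subset.
    """
    return [perturb(image, pixels) for pixels in regions.values()]


# Toy usage: a 2x2 "image" with two single-pixel regions.
image = [[1.0, 2.0], [3.0, 4.0]]
regions = {"mouth": [(0, 0)], "eyes": [(1, 1)]}
subset = build_sample_subset(image, regions)
```

Each perturbed sample differs from the original in exactly one region, so the spread of the model outputs over the subset reflects how sensitive the prediction is to each facial region.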
With continued reference to fig. 5, in S520, a sample uncertainty estimate corresponding to the ith sample is determined by the target classification model without activating a discard unit.
It can be understood that, in determining the sample uncertainty estimate, the i-th sample subset is obtained by perturbing the i-th sample, so inputting each perturbed sample in the i-th sample subset into the target classification model yields different outputs for the i-th sample. The sample uncertainty estimate can therefore be obtained without turning on the discarding unit (dropout).
Illustratively, referring to FIG. 6, each sample in the i-th sample subset is input to the target classification model 64 separately, with the discarding unit (dropout) turned off. Statistics are then computed over the plurality of outputs of the target classification model to obtain the sample uncertainty estimate corresponding to the i-th sample. For example, after the i1-th perturbed sample is input to the target classification model 64 with dropout turned off, the model outputs the probability $p_{i1}$ that the i1-th perturbed sample belongs to an attack object; after the i2-th perturbed sample is input, the model outputs the probability $p_{i2}$ that the i2-th perturbed sample belongs to an attack object; and so on, until the model outputs the probability $p_{ix}$ that the ix-th perturbed sample belongs to an attack object. Further, the mean of the probabilities $(p_{i1}, p_{i2}, \ldots, p_{ix})$ is computed to obtain the sample uncertainty estimate of a single sample, namely the sample uncertainty estimate corresponding to the i-th sample. In other embodiments, other statistics (e.g., variance) of the probabilities $(p_{i1}, p_{i2}, \ldots, p_{ix})$ may also be used as the sample uncertainty estimate for the i-th sample.
With continued reference to fig. 5, in S530, a sample uncertainty estimate included in the first uncertainty estimate is determined from sample uncertainty estimates corresponding to the N samples, respectively.
Illustratively, the sample uncertainty estimate of the first data domain may be determined by further computing statistics over the sample uncertainty estimates corresponding to the N samples. For example, the mean of the sample uncertainty estimates of the N samples may be taken as the sample uncertainty estimate of the first data domain. In other embodiments, other statistics (e.g., variance) of the sample uncertainty estimates of the N samples may also be used as the sample uncertainty estimate of the first data domain.
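The two-level statistic of S520 and S530 can be sketched as follows. The callable `model(sample, dropout=False)`, which returns the attack probability for one sample, is a hypothetical interface assumed for illustration; the specification does not define a programming interface.

```python
from statistics import fmean


def sample_uncertainty(model, subset, stat=fmean):
    """S520: statistic of the model outputs over one perturbed subset.

    The discarding unit stays off, so the variation comes only from the
    region perturbations applied to the sample.
    """
    probs = [model(x, dropout=False) for x in subset]
    return stat(probs)


def domain_sample_uncertainty(model, subsets, stat=fmean):
    """S530: aggregate the per-sample estimates of all N samples."""
    return stat([sample_uncertainty(model, s, stat) for s in subsets])


# Toy usage: a stand-in model that simply echoes its numeric input.
toy_model = lambda x, dropout=False: x
estimate = domain_sample_uncertainty(toy_model, [[0.2, 0.4], [0.6, 0.8]])
```

Passing `stat=statistics.pvariance` instead of the default mean would give the variance-based variant mentioned above; the same statistic is used at both levels in this sketch.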
In an exemplary embodiment, referring to FIG. 7, an embodiment of a determination method for model uncertainty estimation provided by the figure includes S710-S720.
It can be understood that, in determining the model uncertainty estimate, the i-th sample is not perturbed to obtain a sample subset. However, if the discarding unit (dropout) is turned off, inputting the i-th sample into the target classification model multiple times produces the same single output each time. The discarding unit therefore needs to be turned on to obtain the model uncertainty estimate.
In S710, in the case of starting the discarding unit, the i-th sample is input to the target classification model at least twice, and a model uncertainty estimate corresponding to the i-th sample is determined from the output of the target classification model.
Illustratively, referring to FIG. 8, the i-th sample is input into the target classification model multiple times (y times, y being a positive integer greater than 1), and the probabilities output by the model are $p'_{i1}, p'_{i2}, \ldots, p'_{iy}$, respectively. Further, the mean of the probabilities $(p'_{i1}, p'_{i2}, \ldots, p'_{iy})$ is computed to obtain the model uncertainty estimate of a single sample, namely the model uncertainty estimate corresponding to the i-th sample. In other embodiments, other statistics (e.g., variance) of the probabilities $(p'_{i1}, p'_{i2}, \ldots, p'_{iy})$ may also be used as the model uncertainty estimate for the i-th sample.
In S720, a model uncertainty estimate included in the first uncertainty estimate is determined according to the model uncertainty estimates corresponding to the N samples, respectively.
For example, the model uncertainty estimate of the first data domain may be determined by computing statistics over the model uncertainty estimates corresponding to the N samples. For example, the mean of the model uncertainty estimates of the N samples may be taken as the model uncertainty estimate of the first data domain. In other embodiments, other statistics (e.g., variance) of the model uncertainty estimates of the N samples may also be used as the model uncertainty estimate of the first data domain.
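S710 and S720 can be sketched in the same style: the same unperturbed sample is passed through the model several times with the discarding unit turned on, and statistics are taken over the resulting outputs. As before, `model(sample, dropout=True)` is a hypothetical interface, and the default of 10 passes is an arbitrary illustrative value.

```python
from statistics import fmean


def model_uncertainty(model, sample, passes=10, stat=fmean):
    """S710: repeat inference on one sample with the discarding unit on."""
    if passes < 2:
        raise ValueError("at least two stochastic forward passes are required")
    probs = [model(sample, dropout=True) for _ in range(passes)]
    return stat(probs)


def domain_model_uncertainty(model, samples, passes=10, stat=fmean):
    """S720: aggregate the per-sample estimates of all N samples."""
    return stat([model_uncertainty(model, s, passes, stat) for s in samples])


# Toy usage: a stand-in model whose output changes on every call,
# imitating the randomness introduced by the discarding unit.
outputs = iter([0.1, 0.3, 0.5, 0.7])
toy_model = lambda x, dropout=True: next(outputs)
estimate = domain_model_uncertainty(toy_model, ["a", "b"], passes=2)
```

Here the variation comes from the model itself rather than from the input, which is why the sample is left unperturbed.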
Thus far, embodiments for determining the sample uncertainty estimate and the model uncertainty estimate of the first data domain have been described. It will be appreciated that the same type of statistic, such as the mean, is used when determining the sample uncertainty estimate of a single sample (S520) and when determining the model uncertainty estimate of a single sample (S710). Likewise, the same type of statistic, such as the mean, is used when determining the sample uncertainty estimate over all samples (S530) and when determining the model uncertainty estimate over all samples (S720).
Further, influence coefficients may be set according to the degree to which the model uncertainty estimate and the sample uncertainty estimate each influence the feature offset. For example, if the influence coefficient of the model uncertainty estimate M1 is 0.3 and that of the sample uncertainty estimate M2 is 0.7, the first uncertainty estimate may be expressed as 0.3 × M1 + 0.7 × M2.
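The weighted combination can be written as a one-line function. The defaults 0.3 and 0.7 are the example coefficients given above; the function name and signature are assumptions for illustration, and the sketch does not require the coefficients to sum to 1 because the specification does not state such a constraint.

```python
def combine_uncertainty(model_u, sample_u, w_model=0.3, w_sample=0.7):
    """Weighted sum of the model (M1) and sample (M2) uncertainty estimates."""
    return w_model * model_u + w_sample * sample_u


# Toy usage with M1 = 0.2 and M2 = 0.4.
first_estimate = combine_uncertainty(0.2, 0.4)
```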
It should be noted that the embodiment for obtaining the second uncertainty estimate (the embodiment of S130) is similar to the embodiment for obtaining the first uncertainty estimate (the embodiment of S120); the only difference is that the first target data set comes from the first data domain whereas the second target data set comes from the second data domain. For the second uncertainty estimate, reference may therefore be made to the embodiments of the first uncertainty estimate.
In an exemplary embodiment, referring to FIG. 2, after the first uncertainty estimate and the second uncertainty estimate are determined, the difference between the first uncertainty estimate and the second uncertainty estimate is taken as the feature offset 25, and the feature offset 25 is applied to the image 26 to be detected in the new domain (i.e., the second data domain), so that the class 27 of the image to be detected in the new domain can be predicted using the target classification model 300. In an exemplary embodiment, the feature offset 25 is superimposed on the image features corresponding to the image to be detected in the second data domain before those features enter the classification layer of the target classification model.
For example, referring to FIG. 9, in the case where a ResNet network structure is used as the main network, the image 90 to be detected from the second data domain is input to the target classification model 300 (the dropout unit does not need to be turned on at this time). The image features corresponding to the image 90 to be detected are processed in sequence by the convolution layer 32, the max-pooling layer 34, and multiple groups of residual blocks and attention-based dropout structures, and are then input to the average pooling layer 38. In this embodiment, the features input to the classification layer of the target classification model (i.e., the fully connected layer 310) are determined as the target image features.
In an exemplary embodiment of the present specification, referring to FIG. 9, the feature offset 25 is superimposed on the target image features to obtain offset image features. The detection result for the image 90 to be detected in the second data domain is obtained by classifying the offset image features with the classification layer (i.e., the fully connected layer 310). For example, referring to FIG. 9, let the prediction result for the image 90 to be detected be p. If p is greater than a preset threshold P, the image 90 to be detected may be determined to be an attack object, for example a mobile phone photo, a paper mask, or a silicone mask; if p is not greater than the preset threshold P, the image 90 to be detected may be determined not to be an attack object but to originate from a living body. As can be seen, the embodiments of the present specification enable high-precision living body detection across different data domains (i.e., cross-domain living body detection) using the same living body detection algorithm.
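The offset-and-classify step can be sketched as below. Several details are assumptions, since the specification leaves them open: the sign of the difference (second estimate minus first), the scalar offset being added to every component of the feature vector, a single-output fully connected layer followed by a sigmoid, and the default threshold of 0.5. All names are hypothetical.

```python
import math


def feature_offset(u_first, u_second):
    """Difference of the two domain-level uncertainty estimates.

    Assumption: second minus first; the specification only says "difference".
    """
    return u_second - u_first


def detect(features, offset, fc_weights, fc_bias, threshold=0.5):
    """Superimpose the offset on the pooled features, then classify.

    Returns (p, is_attack), where p is the predicted attack probability.
    """
    shifted = [f + offset for f in features]
    logit = sum(w * f for w, f in zip(fc_weights, shifted)) + fc_bias
    p = 1.0 / (1.0 + math.exp(-logit))
    return p, p > threshold


# Toy usage: two pooled features and a unit-weight classification layer.
offset = feature_offset(0.2, 1.2)
p, is_attack = detect([0.0, 0.0], offset, [1.0, 1.0], 0.0)
```

The model weights are left untouched; only the features entering the classification layer are shifted, which is what allows a single trained model to serve the second data domain.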
In other embodiments, where the ResNet network structure shown in FIG. 9 is used as the main network, the feature offset 25 may instead be added at other layers of the target classification model (e.g., added to the input features of the average pooling layer 38), which is not limited in the present specification.
Because the offset image features incorporate the difference between the two data domains, accurate classification can be achieved by the target classification model suited to the first data domain even when the image to be detected comes from a new domain (namely, the second data domain). Therefore, if there is a target classification model S suited to the office access control scene, then after the feature offset between the second data domain (corresponding to the identity recognition scene at a subway station or similar site) and the first data domain (corresponding to the office access control scene) is determined, the model S can be used in the identity recognition scene at the subway station or similar site, balancing model maintenance cost against cross-domain adaptation performance.
In the solution provided in the embodiments of the present specification, a target classification model is first determined. The model is trained on training samples from a first data domain; that is, the model is suited to detecting images to be detected in the first data domain. Specifically, on the one hand, an attention-based dropout structure is added to the target classification model, so that the model uncertainty estimate can be effectively determined by the target classification model; on the other hand, the sample uncertainty estimate can be effectively determined by means of the region-based data augmentation scheme (region-based augmentation) of the embodiments of the present specification. More specifically, a first uncertainty estimate corresponding to a first target data set from the first data domain is determined based on the model, and a second uncertainty estimate corresponding to a second target data set from a second data domain (another data domain) is determined. Further, a feature offset is determined from the second uncertainty estimate and the first uncertainty estimate; the feature offset characterizes the change in features between the second data domain and the first data domain.
Finally, based on the above feature offset, which incorporates multiple kinds of uncertainty estimates, a detection result for the image to be detected in the second data domain can be determined. Therefore, in the solution provided by the embodiments of the present specification, the target classification model of the first data domain can detect images to be detected in the second data domain without a dedicated model being maintained for each scene; at the same time, compared with cross-domain living body detection based on single-model iteration, the cross-domain classification accuracy of this solution is high. The solution provided by the embodiments of the present specification thus achieves the technical effect of balancing model maintenance cost against cross-domain adaptation performance.
It should be noted that the above-described figures are only schematic illustrations of processes involved in the method according to the exemplary embodiments of the present specification, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
The following are device embodiments of the present specification that may be used to perform method embodiments of the present specification. For details not disclosed in the device embodiments of the present specification, please refer to the method embodiments of the present specification.
FIG. 10 is a schematic diagram showing the structure of an image detection apparatus to which an embodiment of the present specification can be applied. Referring to FIG. 10, the image detection apparatus shown in the figure may be implemented as all or a part of an electronic device by software, hardware, or a combination of both, or may be integrated on a server as a separate module, or may be integrated in the electronic device as a separate module.
The image detection apparatus 1000 described above in the embodiment of the present specification includes a model determination module 1010, an uncertainty estimation determination module 1020, an offset determination module 1030, and a detection module 1040.
The model determination module 1010 is configured to determine a target classification model, where the target classification model is obtained by training on training samples from a first data domain. The uncertainty estimation determination module 1020 is configured to determine, by the target classification model, a first uncertainty estimate corresponding to a first target data set, the first target data set being from the first data domain; it is further configured to determine, by the target classification model, a second uncertainty estimate corresponding to a second target data set, the second target data set being from a second data domain. The offset determination module 1030 is configured to determine a feature offset based on the first uncertainty estimate and the second uncertainty estimate. The detection module 1040 is configured to determine a detection result for an image to be detected in the second data domain based on the target classification model and the feature offset.
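The cooperation of the four modules can be sketched as a small pipeline. The class and field names are hypothetical, and each module is represented by a plain callable so that the sketch stays independent of any particular model implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ImageDetectionApparatus:
    """Hypothetical wiring of modules 1010, 1020, 1030 and 1040."""
    determine_model: Callable[[], Any]                      # module 1010
    estimate_uncertainty: Callable[[Any, Sequence], float]  # module 1020
    determine_offset: Callable[[float, float], float]       # module 1030
    detect: Callable[[Any, float, Any], Any]                # module 1040

    def run(self, first_set, second_set, image):
        model = self.determine_model()
        # Module 1020 is applied twice, once per data domain.
        u_first = self.estimate_uncertainty(model, first_set)
        u_second = self.estimate_uncertainty(model, second_set)
        offset = self.determine_offset(u_first, u_second)
        return self.detect(model, offset, image)


# Toy usage with stand-in callables in place of real modules.
apparatus = ImageDetectionApparatus(
    determine_model=lambda: "model",
    estimate_uncertainty=lambda m, data: sum(data) / len(data),
    determine_offset=lambda u1, u2: u2 - u1,
    detect=lambda m, offset, image: image + offset,
)
result = apparatus.run([1.0, 1.0], [3.0, 3.0], 10.0)
```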
In an exemplary embodiment, fig. 11 schematically shows a structural diagram of an image detection apparatus according to another exemplary embodiment of the present specification. Please refer to fig. 11:
In an exemplary embodiment, the target classification model comprises an attention-based dropout structure comprising a squeeze-and-excitation block and a discarding unit, wherein, during training with the training samples from the first data domain, the probability with which the discarding unit discards each element in a training sample is determined adaptively by the squeeze-and-excitation block.
In an exemplary embodiment, based on the foregoing scheme, the first uncertainty estimate comprises a sample uncertainty estimate, the first target data set comprises N samples, and N is a positive integer;
the uncertainty estimation determination module 1020 includes a perturbation unit 10202, a first prediction unit 10204, and a first determination unit 10206.
The perturbation unit 10202 is configured to perform perturbation processing on at least one local area of an i-th sample respectively, and to take the perturbed samples obtained after the perturbation processing as an i-th sample subset, where i is a positive integer not greater than N. The first prediction unit 10204 is configured to determine, through the target classification model and without starting the discarding unit, a sample uncertainty estimate corresponding to the i-th sample. The first determination unit 10206 is configured to determine the sample uncertainty estimate included in the first uncertainty estimate according to the sample uncertainty estimates respectively corresponding to the N samples.
In an exemplary embodiment, based on the foregoing scheme, the first prediction unit 10204 is specifically configured to input each sample in the i-th sample subset into the target classification model separately without starting the discarding unit, and to compute statistics over the plurality of outputs of the target classification model to obtain the sample uncertainty estimate corresponding to the i-th sample.
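As a non-limiting illustration, the sample uncertainty estimate may be sketched as follows. The sketch assumes that a sample is a flat list of values, that the perturbation is uniform noise added to one local region at a time, that the model returns a single score, that the statistic over the outputs is their population variance, and that the per-sample estimates are averaged over the N samples. None of these choices is fixed by the present specification, and all function and parameter names are hypothetical.

```python
import random
from statistics import pvariance


def perturb_local_regions(sample, region_size, noise, rng):
    """Return one perturbed copy of `sample` per local region (the i-th sample subset)."""
    subset = []
    for start in range(0, len(sample), region_size):
        perturbed = list(sample)
        # Only the current local region is disturbed; the rest is left intact.
        for k in range(start, min(start + region_size, len(sample))):
            perturbed[k] += rng.uniform(-noise, noise)
        subset.append(perturbed)
    return subset


def sample_uncertainty(model, sample, region_size=2, noise=0.1, seed=0):
    """Variance of the deterministic model outputs over the perturbed subset."""
    rng = random.Random(seed)
    subset = perturb_local_regions(sample, region_size, noise, rng)
    # The discarding unit is not started, so `model` is a deterministic callable.
    outputs = [model(x) for x in subset]
    return pvariance(outputs)


def dataset_sample_uncertainty(model, samples, **kwargs):
    """Assumed aggregation: mean of the per-sample estimates over the N samples."""
    return sum(sample_uncertainty(model, s, **kwargs) for s in samples) / len(samples)
```

A model whose output does not react to local perturbations yields an estimate of zero, whereas a model that is sensitive to them yields a positive value.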
In an exemplary embodiment, based on the foregoing scheme, the first uncertainty estimate further comprises a model uncertainty estimate, and the uncertainty estimation determination module 1020 further comprises a second prediction unit 10204' and a second determination unit 10206'.
The second prediction unit 10204' is configured to input the i-th sample into the target classification model at least twice with the discarding unit started, and to determine a model uncertainty estimate corresponding to the i-th sample according to the outputs of the target classification model. The second determination unit 10206' is configured to determine the model uncertainty estimate included in the first uncertainty estimate according to the model uncertainty estimates respectively corresponding to the N samples.
In an exemplary embodiment, based on the foregoing scheme, the second prediction unit 10204' is specifically configured to input the i-th sample into the target classification model at least twice with the discarding unit started, and to compute statistics over the plurality of outputs of the target classification model to obtain the model uncertainty estimate corresponding to the i-th sample.
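As a non-limiting illustration, the model uncertainty estimate may be sketched as repeated stochastic forward passes of the same sample. The sketch assumes that the statistic over the outputs is their population variance; the toy model, the number of passes, and all names are hypothetical and are used only to make the example runnable.

```python
import random
from statistics import pvariance


def model_uncertainty(stochastic_model, sample, passes=10, seed=0):
    """Variance of the outputs of repeated passes with the discarding unit started."""
    if passes < 2:
        raise ValueError("the sample must be input at least twice")
    rng = random.Random(seed)
    # Each pass discards a different random set of elements, so outputs differ.
    outputs = [stochastic_model(sample, rng) for _ in range(passes)]
    return pvariance(outputs)


def toy_model_with_discarding(sample, rng, drop_prob=0.5):
    """Hypothetical model: mean of the inputs after random discarding and rescaling."""
    kept = [x / (1.0 - drop_prob) for x in sample if rng.random() >= drop_prob]
    return sum(kept) / len(sample)
```

Because the input is identical in every pass, any spread in the outputs is attributable to the discarding unit, which is why this quantity is read as uncertainty of the model rather than of the sample.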
In an exemplary embodiment, based on the foregoing scheme, the uncertainty estimation determination module 1020 is further configured to determine the first uncertainty estimation based on a preset influence coefficient, the sample uncertainty estimation, and the model uncertainty estimation.
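As a non-limiting illustration, and because the present specification does not fix how the preset influence coefficient enters the calculation, the combination may be sketched as a convex weighting of the two estimates. The weighting form, the default value of the coefficient, and the function name are assumptions made only for this sketch.

```python
def combine_uncertainty(sample_estimate, model_estimate, influence=0.5):
    """Assumed form: influence * sample estimate + (1 - influence) * model estimate."""
    if not 0.0 <= influence <= 1.0:
        raise ValueError("influence coefficient must lie in [0, 1]")
    return influence * sample_estimate + (1.0 - influence) * model_estimate
```

With this form, a coefficient of 1 makes the first uncertainty estimate depend only on the sample uncertainty estimate, and a coefficient of 0 makes it depend only on the model uncertainty estimate.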
In an exemplary embodiment, based on the foregoing scheme, the detection module 1040 is specifically configured to: input an image to be detected from the second data domain into the target classification model, and determine the feature that is input to the classification layer of the target classification model as a target image feature; superimpose the feature offset onto the target image feature to obtain an offset image feature; and classify the offset image feature through the classification layer to obtain the detection result for the image to be detected in the second data domain.
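As a non-limiting illustration, the detection step may be sketched as follows. The sketch assumes that the classification layer is a linear layer followed by a softmax and that superimposing means element-wise addition; the `backbone` callable, the weights, and all names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect(backbone, weights, bias, image, feature_offset):
    """Shift the feature fed to the classification layer, then classify it."""
    # Target image feature: whatever would be input to the classification layer.
    target_feature = backbone(image)
    if len(target_feature) != len(feature_offset):
        raise ValueError("feature offset must match the feature dimension")
    # Offset image feature: element-wise superposition of the feature offset.
    shifted = [f + o for f, o in zip(target_feature, feature_offset)]
    # Assumed classification layer: linear map plus softmax.
    logits = [sum(w * f for w, f in zip(row, shifted)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs
```

Only the input to the classification layer is shifted in this sketch; the parameters of the target classification model itself are left unchanged.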
It should be noted that, when the image detection apparatus provided in the foregoing embodiment executes the image detection method, the division into the foregoing functional modules is merely used as an example. In practical applications, the foregoing functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
In addition, the image detection apparatus and the image detection method provided in the foregoing embodiments belong to the same concept. For details not disclosed in the apparatus embodiments of the present specification, refer to the embodiments of the image detection method described in the present specification; the details are not repeated here.
Fig. 12 schematically shows a structural diagram of an electronic device in an exemplary embodiment according to the present specification. Referring to fig. 12, an electronic device 1200 includes a processor 1201 and a memory 1202.
In the embodiment of the present specification, the processor 1201 is a control center of a computer system, and may be a processor of a physical machine or a processor of a virtual machine. The processor 1201 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1201 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 1201 may also include a main processor, which is a processor for processing data in a wake-up state, and a coprocessor, which is a low-power processor for processing data in a standby state.
In the embodiment of the present disclosure, the processor 1201 is specifically configured to:
determine a target classification model, where the target classification model is obtained by training on training samples from a first data domain; determine, through the target classification model, a first uncertainty estimate corresponding to a first target data set, the first target data set being from the first data domain; determine, through the target classification model, a second uncertainty estimate corresponding to a second target data set, the second target data set being from a second data domain; determine a feature offset according to the first uncertainty estimate and the second uncertainty estimate; and determine a detection result for an image to be detected in the second data domain according to the target classification model and the feature offset.
Further, the target classification model comprises an attention mechanism-based discarding structure, which comprises a squeeze excitation block and a discarding unit; during training on the training samples from the first data domain, the probability with which the discarding unit discards each element in a training sample is adaptively calculated and determined by the squeeze excitation block.
Further, the first uncertainty estimate comprises a sample uncertainty estimate, the first target data set comprises N samples, and N is a positive integer. Determining, through the target classification model, the first uncertainty estimate corresponding to the first target data set includes: performing perturbation processing on at least one local area of an i-th sample respectively, and taking the perturbed samples obtained after the perturbation processing as an i-th sample subset, where i is a positive integer not greater than N; determining, through the target classification model and without starting the discarding unit, a sample uncertainty estimate corresponding to the i-th sample; and determining the sample uncertainty estimate included in the first uncertainty estimate according to the sample uncertainty estimates respectively corresponding to the N samples.
Further, determining, through the target classification model and without starting the discarding unit, the sample uncertainty estimate corresponding to the i-th sample includes: inputting each sample in the i-th sample subset into the target classification model separately without starting the discarding unit, and computing statistics over the plurality of outputs of the target classification model to obtain the sample uncertainty estimate corresponding to the i-th sample.
Further, the first uncertainty estimate further comprises a model uncertainty estimate. Determining, through the target classification model, the first uncertainty estimate corresponding to the first target data set further includes: inputting the i-th sample into the target classification model at least twice with the discarding unit started, and determining a model uncertainty estimate corresponding to the i-th sample according to the outputs of the target classification model; and determining the model uncertainty estimate included in the first uncertainty estimate according to the model uncertainty estimates respectively corresponding to the N samples.
Further, inputting the i-th sample into the target classification model at least twice with the discarding unit started, and determining the model uncertainty estimate corresponding to the i-th sample according to the outputs of the target classification model, includes: inputting the i-th sample into the target classification model at least twice with the discarding unit started, and computing statistics over the plurality of outputs of the target classification model to obtain the model uncertainty estimate corresponding to the i-th sample.
Further, the processor 1201 is further specifically configured to determine the first uncertainty estimate based on a preset influence coefficient, the sample uncertainty estimate, and the model uncertainty estimate.
Further, determining the detection result for the image to be detected in the second data domain according to the target classification model and the feature offset includes: inputting the image to be detected from the second data domain into the target classification model, and determining the feature that is input to the classification layer of the target classification model as a target image feature; superimposing the feature offset onto the target image feature to obtain an offset image feature; and classifying the offset image feature through the classification layer to obtain the detection result for the image to be detected in the second data domain.
The memory 1202 may include one or more computer-readable storage media, which may be non-transitory. The memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments of the present specification, a non-transitory computer-readable storage medium in the memory 1202 is used to store at least one instruction, the at least one instruction being executed by the processor 1201 to implement the methods in the embodiments of the present specification.
In some embodiments, the electronic device 1200 further includes a peripheral interface 1203 and at least one peripheral. The processor 1201, the memory 1202, and the peripheral interface 1203 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 1203 via a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of a display 1204, a camera 1205, and an audio circuit 1206.
The peripheral interface 1203 may be used to connect at least one input/output (I/O) related peripheral to the processor 1201 and the memory 1202. In some embodiments of the present specification, the processor 1201, the memory 1202, and the peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments of the present specification, any one or two of the processor 1201, the memory 1202, and the peripheral interface 1203 may be implemented on a separate chip or circuit board. The embodiments of the present specification are not particularly limited in this respect.
The display 1204 is used to display a user interface (UI). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1204 is a touch display, the display 1204 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 1201 as a control signal for processing. In this case, the display 1204 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments of the present specification, there may be one display 1204, disposed on the front panel of the electronic device 1200; in other embodiments, there may be at least two displays 1204, disposed on different surfaces of the electronic device 1200 or in a folded design; in still other embodiments, the display 1204 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 1200. The display 1204 may even be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. The display 1204 may be made of materials such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
The camera 1205 is used to capture images or video. Optionally, the camera 1205 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on the rear surface of the electronic device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and virtual reality (VR) shooting functions or other fused shooting functions. In some embodiments of the present specification, the camera 1205 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 1206 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment and convert them into electrical signals that are input to the processor 1201 for processing. For stereo acquisition or noise reduction, there may be multiple microphones, disposed at different locations of the electronic device 1200. The microphone may also be an array microphone or an omnidirectional pickup microphone.
The power supply 1207 is used to power the various components in the electronic device 1200. The power supply 1207 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1207 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
The structural block diagram of the electronic device shown in the embodiments of the present specification does not constitute a limitation of the electronic device 1200, and the electronic device 1200 may include more or fewer components than illustrated, may combine some components, or may employ a different arrangement of components.
In the description of the present specification, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of the terms in this specification will be understood by those of ordinary skill in the art in the light of the specific circumstances. In addition, in the description of the present specification, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The present description also provides a computer-readable storage medium having instructions stored therein, which when executed on a computer or processor, cause the computer or processor to perform one or more steps of the above embodiments. Each of the constituent blocks of the image detection apparatus may be stored in the computer-readable storage medium if implemented in the form of a software functional unit and sold or used as a separate product.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present specification are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, wireless, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
It should be noted that the foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely specific embodiments of the present specification, but the scope of protection of the present specification is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present specification shall be covered by the scope of protection of the present specification. Accordingly, equivalent variations made in accordance with the claims of the present specification still fall within the scope covered by the present specification.