CN109558832A

CN109558832A - Human pose detection method, apparatus, device, and storage medium

Info

Publication number: CN109558832A
Application number: CN201811427578.XA
Authority: CN
Inventors: 项伟; 王毅峰; 黄秋实; 梁柱锦
Original assignee: Guangzhou Baiguoyuan Information Technology Co Ltd
Current assignee: Bigo Technology Pte Ltd
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2019-04-02
Anticipated expiration: 2038-11-27
Also published as: WO2020108362A1; US20220004744A1; CN109558832B; US11908244B2

Abstract

The invention discloses a human pose detection method, apparatus, device, and storage medium. The method comprises: acquiring multiple frames of image data; inputting the current frame of image data into a pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, outputting multiple human pose reference maps, where the human pose detection model is generated by training a convolutional neural network applied to an embedded platform; identifying human pose keypoints in the human pose reference maps; generating human pose confidence maps according to the credibility of the human pose keypoints; determining whether the current frame is the last frame; if not, inputting the human pose confidence maps into the human pose detection model so that they take part in generating the human pose confidence maps of the next frame; if so, ending the operation of generating human pose confidence maps for the multiple frames of image data. Embodiments of the present invention enable human pose detection on embedded platforms.

Description

Human pose detection method, apparatus, device, and storage medium

Technical field

Embodiments of the present invention relate to human pose detection technology, and in particular to a human pose detection method, apparatus, device, and storage medium.

Background

Human pose detection is one of the most challenging research directions in computer vision and is widely used in human-computer interaction, intelligent surveillance, virtual reality, human behavior analysis, and other fields. However, the local image features around each keypoint that makes up a human pose undergo multi-scale affine transformations, and images are easily affected by the target person's clothing, the camera's shooting angle and distance, illumination changes, partial occlusion, and similar factors, so progress in human pose detection has been slow.

In the prior art, human pose detection is generally performed with convolutional neural networks. To obtain high recognition accuracy, a large number of training samples usually has to be collected and the human pose detection model trained with supervised learning for a long time.

In the course of realizing the present invention, the inventors found at least the following problem in the prior art: embedded platforms have no GPU (Graphics Processing Unit) to accelerate the convolution operations, which account for the largest share of computation in a convolutional neural network. As a result, most human pose detection methods based on convolutional neural networks cannot be applied on embedded platforms.

Summary of the invention

Embodiments of the present invention provide a human pose detection method, apparatus, device, and storage medium, so as to implement human pose detection on an embedded platform.

In a first aspect, an embodiment of the present invention provides a human pose detection method, the method comprising:

Acquiring multiple frames of image data;

Inputting the current frame of image data into a pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, outputting multiple human pose reference maps, the human pose detection model being generated by training a convolutional neural network applied to an embedded platform;

Identifying human pose keypoints in the human pose reference maps;

Generating human pose confidence maps according to the credibility of the human pose keypoints;

Determining whether the current frame of image data is the last frame;

If not, inputting the human pose confidence maps into the human pose detection model so that they take part in generating the human pose confidence maps of the next frame;

If so, ending the operation of generating human pose confidence maps for the multiple frames of image data.
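The per-frame loop in the steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch: `model` and `make_confidence_maps` are placeholder callables standing in for the trained detection model and for the keypoint-identification and confidence-map steps, and starting the first frame from an all-zero prior is an assumption (the description introduces the all-zero preset image only for the case where the previous result is not credible).

```python
import numpy as np

NUM_KEYPOINTS = 14  # the 14 keypoints listed later in the description


def detect_sequence(frames, model, make_confidence_maps):
    """Process frames in order, feeding each frame's confidence maps into the next.

    frames: list of H x W x C arrays from one video.
    model(frame, prev_conf) -> N x H x W reference maps (hypothetical callable).
    make_confidence_maps(ref_maps) -> N x H x W confidence maps (hypothetical).
    """
    h, w = frames[0].shape[:2]
    # Assumption: the first frame has no previous frame, so it starts from zeros.
    prev_conf = np.zeros((NUM_KEYPOINTS, h, w))
    all_conf = []
    for i, frame in enumerate(frames):
        ref_maps = model(frame, prev_conf)          # reference maps for this frame
        conf_maps = make_confidence_maps(ref_maps)  # confidence maps for this frame
        all_conf.append(conf_maps)
        if i == len(frames) - 1:                    # last frame: stop generating
            break
        prev_conf = conf_maps                       # not last: feed back to next frame
    return all_conf
```

The explicit last-frame check mirrors the claim wording; in practice the loop would end on its own after the final frame.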

Further, inputting the current frame of image data into the pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, outputting multiple human pose reference maps comprises:

Determining whether the human pose confidence maps of the previous frame are credible;

If so, inputting the current frame of image data and the human pose confidence maps of the previous frame into the pre-trained human pose detection model and outputting multiple human pose reference maps;

If not, inputting the current frame of image data and preset image data into the pre-trained human pose detection model and outputting multiple human pose reference maps.
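A minimal sketch of this choice of second input, assuming the confidence maps are stored as an N x H x W NumPy array and that credibility has already been decided elsewhere (the text does not say how a whole set of confidence maps is judged credible). The preset image is the all-zero (all-black) array that the description later gives as its example of image data without prior knowledge; `select_feedback` is a hypothetical name.

```python
import numpy as np


def select_feedback(prev_conf, prev_credible, shape):
    """Return the second model input for the current frame.

    prev_conf: previous frame's N x H x W confidence maps, or None if absent.
    prev_credible: whether those maps were judged credible (decided elsewhere).
    shape: shape of the preset image to use as a fallback.
    """
    if prev_conf is not None and prev_credible:
        return prev_conf
    # Preset image data with no prior knowledge: an all-zero (all-black) array.
    return np.zeros(shape)
```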

Further, identifying human pose keypoints in the human pose reference maps comprises:

Determining the coordinate position of the maximum probability value in each human pose reference map and taking that coordinate position as a human pose keypoint.
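Locating the maximum in a reference map is a plain argmax over a 2D array. A sketch with NumPy, assuming the reference map is an H x W array of probabilities; reporting `(x, y)` as (column, row) is a convention chosen here rather than one stated in the text, and `locate_keypoint` is a hypothetical name.

```python
import numpy as np


def locate_keypoint(ref_map):
    """Return (x, y, p): position and value of the largest entry in one reference map."""
    row, col = np.unravel_index(np.argmax(ref_map), ref_map.shape)
    return int(col), int(row), float(ref_map[row, col])
```

For example, a 5 x 6 map that is zero except for 0.9 at row 3, column 4 yields `(4, 3, 0.9)`.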

Further, generating human pose confidence maps according to the credibility of the human pose keypoints comprises:

Determining whether a human pose keypoint is credible;

If so, generating a mask image centered on the human pose keypoint as the human pose confidence map;

If not, using the preset image data as the human pose confidence map.

Further, determining whether a human pose keypoint is credible comprises:

Determining whether the probability value corresponding to the human pose keypoint is greater than a preset threshold;

If so, determining that the human pose keypoint is credible;

If not, determining that the human pose keypoint is not credible.
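The credibility test and the two kinds of confidence map can be sketched together as below. The strict `>` comparison follows the wording "greater than a preset threshold"; the threshold value 0.5, the Gaussian shape of the mask, and its width `sigma` are assumptions, since the description only says that a mask image is generated centered on the keypoint.

```python
import numpy as np


def is_credible(prob, threshold=0.5):
    """A keypoint is credible when its probability exceeds the preset threshold."""
    return prob > threshold


def confidence_map(ref_map, threshold=0.5, sigma=2.0):
    """Build one confidence map from one H x W reference map.

    Credible keypoint: a mask centred on the keypoint (a Gaussian here, which is
    an assumption; the text does not fix the mask shape).
    Not credible: the preset all-zero image.
    """
    h, w = ref_map.shape
    row, col = np.unravel_index(np.argmax(ref_map), ref_map.shape)
    if not is_credible(ref_map[row, col], threshold):
        return np.zeros((h, w))
    ys, xs = np.mgrid[0:h, 0:w]  # row and column index grids
    return np.exp(-((xs - col) ** 2 + (ys - row) ** 2) / (2 * sigma ** 2))
```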

Further, the human pose detection model comprises a main path, a first branch, and a second branch; the main path comprises a residual module and an upsampling module, the first branch comprises a refinement network module, and the second branch comprises a feedback module;

Inputting the current frame of image data into the pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, outputting multiple human pose reference maps comprises:

Inputting the current frame of image data into the residual module for processing, and inputting the human pose confidence maps of the previous frame into the feedback module for processing, to obtain a first convolution result;

Inputting the first convolution result output by the residual module into the upsampling module and into the refinement network module respectively for processing, to obtain a second convolution result and a third convolution result;

Adding the second convolution result to the third convolution result to output multiple human pose reference maps.

Further, the residual module comprises a first residual unit, a second residual unit, and a third residual unit;

Inputting the current frame of image data into the residual module for processing, and inputting the human pose confidence maps of the previous frame into the feedback module for processing, to obtain the first convolution result, comprises:

Inputting the current frame of image data into the first residual unit for processing to obtain a first intermediate result;

Inputting the first intermediate result into the second residual unit for processing, and adding the output to the result of processing the human pose confidence maps of the previous frame in the feedback module, to obtain a second intermediate result;

Inputting the second intermediate result into the third residual unit for processing to obtain a third intermediate result, which serves as the first convolution result;

The numbers of channels of the first intermediate result, the second intermediate result, and the third intermediate result increase in that order.
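The data flow through the residual module, including the point where the feedback module's output is added, can be illustrated at the level of array shapes. In this sketch the residual units, the feedback module, and all channel widths are hypothetical stand-ins: random 1 x 1 channel-mixing weights (redrawn on every call, so only the shapes are meaningful) and stride-2 subsampling replace the trained layers, whose internals the text does not give. Only the wiring (first unit, then second unit plus feedback, then third unit) and the increasing channel counts come from the description; confidence maps at input resolution are also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, out_ch):
    """Stand-in for a learned conv: random 1x1 channel mixing of a C x H x W array."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", w, x)


def residual_unit(x, out_ch):
    """Hypothetical residual unit: subsample by 2, then add a projected shortcut to a ReLU branch."""
    x = x[:, ::2, ::2]
    shortcut = conv1x1(x, out_ch)
    return shortcut + np.maximum(conv1x1(x, out_ch), 0.0)


def feedback_module(conf_maps, out_ch, factor):
    """Hypothetical feedback module: subsample the confidence maps and project to out_ch."""
    return conv1x1(conf_maps[:, ::factor, ::factor], out_ch)


def encoder(frame, prev_conf, channels=(16, 32, 64)):
    """frame: 3 x H x W; prev_conf: N x H x W. Channel widths are illustrative."""
    c1, c2, c3 = channels                                            # strictly increasing
    i1 = residual_unit(frame, c1)                                    # first intermediate result
    i2 = residual_unit(i1, c2) + feedback_module(prev_conf, c2, 4)   # second intermediate result
    i3 = residual_unit(i2, c3)                                       # third = first convolution result
    return i1, i2, i3
```

With a 3 x 64 x 64 frame and 14 x 64 x 64 confidence maps, the three intermediate results have shapes 16 x 32 x 32, 32 x 16 x 16, and 64 x 8 x 8.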

Further, the human pose detection model also comprises a third branch;

Inputting the first convolution result output by the residual module into the upsampling module and into the refinement network module respectively for processing, to obtain the second convolution result and the third convolution result, comprises:

Inputting the first intermediate result into the third branch for processing to obtain a fourth intermediate result;

Inputting the second intermediate result into the third branch for processing to obtain a fifth intermediate result;

Inputting the third intermediate result and the fifth intermediate result into the upsampling module for processing to obtain a sixth intermediate result;

Inputting the fourth intermediate result and the sixth intermediate result into the upsampling module for processing to obtain a seventh intermediate result, which serves as the second convolution result;

Inputting the first convolution result output by the residual module into the refinement network module for processing to obtain the third convolution result;

The numbers of channels of the sixth intermediate result and the seventh intermediate result decrease in that order.
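The decoder side can be sketched the same way. Again the layer internals are hypothetical: the third branch is a 1 x 1 projection, the upsampling module doubles resolution by nearest-neighbour repetition and then adds its skip input, and the refinement network is a projection followed by upsampling to the output size. Setting the seventh intermediate result's width equal to the number of keypoints (14) is an assumption made so that the second and third convolution results can be added element-wise; the inputs are assumed to have shapes 16 x 32 x 32, 32 x 16 x 16, and 64 x 8 x 8.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, out_ch):
    """Stand-in for a learned conv: random 1x1 channel mixing (weights redrawn per call)."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", w, x)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a C x H x W array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def third_branch(x, out_ch):
    """Hypothetical skip branch: project an encoder result to the decoder width."""
    return conv1x1(x, out_ch)


def upsampling_module(deep, skip, out_ch):
    """Hypothetical upsampling step: upsample the deeper input, project it, add the skip."""
    return conv1x1(upsample2x(deep), out_ch) + skip


def refinement_network(x, out_ch, factor):
    """Hypothetical refinement branch applied directly to the first convolution result."""
    y = conv1x1(x, out_ch)
    return y.repeat(factor, axis=1).repeat(factor, axis=2)


def decoder(i1, i2, i3, num_keypoints=14, width6=48):
    i4 = third_branch(i1, num_keypoints)                # fourth intermediate result
    i5 = third_branch(i2, width6)                       # fifth intermediate result
    i6 = upsampling_module(i3, i5, width6)              # sixth: 48 channels
    i7 = upsampling_module(i6, i4, num_keypoints)       # seventh = second conv result: 14 channels
    third = refinement_network(i3, num_keypoints, 4)    # third convolution result
    return i7 + third                                   # N reference maps
```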

Further, inputting the current frame of image data into the pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, outputting multiple human pose reference maps also comprises:

Adding the first convolution result and the second convolution result to obtain a second target result;

Adding the multiple human pose reference maps to the second target result and outputting new multiple human pose reference maps;

The second target result is used, when the human pose detection model is trained, to improve the accuracy of the human pose detection model.

In a second aspect, an embodiment of the present invention further provides a human pose detection apparatus, the apparatus comprising:

An image data acquisition module, configured to acquire multiple frames of image data;

A human pose reference map output module, configured to input the current frame of image data into a pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, output multiple human pose reference maps, the human pose detection model being generated by training a convolutional neural network applied to an embedded platform;

A human pose keypoint identification module, configured to identify human pose keypoints in the human pose reference maps;

A human pose confidence map generation module, configured to generate human pose confidence maps according to the credibility of the human pose keypoints;

A judgment module, configured to determine whether the current frame of image data is the last frame;

A first execution module, configured to, if not, input the human pose confidence maps into the human pose detection model so that they take part in generating the human pose confidence maps of the next frame;

A second execution module, configured to, if so, end the operation of generating human pose confidence maps for the multiple frames of image data.

Further, the human pose reference map output module comprises:

A confidence map credibility judging unit, configured to determine whether the human pose confidence maps of the previous frame are credible;

A first human pose reference map output unit, configured to, if so, input the current frame of image data and the human pose confidence maps of the previous frame into the pre-trained human pose detection model and output multiple human pose reference maps;

A second human pose reference map output unit, configured to, if not, input the current frame of image data and preset image data into the pre-trained human pose detection model and output multiple human pose reference maps.

Further, the human pose keypoint identification module comprises:

A human pose keypoint recognition unit, configured to determine the coordinate position of the maximum probability value in each human pose reference map and take that coordinate position as a human pose keypoint.

Further, the human pose confidence map generation module comprises:

A human pose keypoint credibility judging unit, configured to determine whether a human pose keypoint is credible;

A first human pose confidence map generation unit, configured to, if so, generate a mask image centered on the human pose keypoint as the human pose confidence map;

A second human pose confidence map generation unit, configured to, if not, use the preset image data as the human pose confidence map.

Further, the human pose keypoint credibility judging unit is specifically configured to:

Determine whether the probability value corresponding to the human pose keypoint is greater than a preset threshold;

If so, determine that the human pose keypoint is credible;

If not, determine that the human pose keypoint is not credible.

Further, the human pose detection model comprises a main path, a first branch, and a second branch; the main path comprises a residual module and an upsampling module, the first branch comprises a refinement network module, and the second branch comprises a feedback module;

The current frame of image data is input into the first residual unit for processing to obtain a first intermediate result;

The first intermediate result is input into the second residual unit for processing, and the output is added to the result of processing the human pose confidence maps of the previous frame in the feedback module, to obtain a second intermediate result;

In a third aspect, an embodiment of the present invention further provides a device, the device comprising:

One or more processors;

A memory, configured to store one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to the first aspect of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to the first aspect of the embodiments of the present invention.

In embodiments of the present invention, multiple frames of image data are acquired; the current frame is input into a pre-trained human pose detection model, which, with reference to the human pose confidence maps of the previous frame, outputs multiple human pose reference maps, the model being generated by training a convolutional neural network applied to an embedded platform; human pose keypoints are identified in the reference maps; human pose confidence maps are generated according to the credibility of the keypoints; whether the current frame is the last frame is determined; if not, the confidence maps are input into the human pose detection model to take part in generating the confidence maps of the next frame; if so, the operation of generating confidence maps for the multiple frames ends. This enables human pose detection on embedded platforms. In addition, because the output for the previous frame is introduced into the prediction of the output for the current frame, prediction accuracy is further improved.

Brief description of the drawings

Fig. 1 is a flowchart of a human pose detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the application of a convolutional neural network according to an embodiment of the present invention;

Fig. 3 is a flowchart of another human pose detection method according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a human pose detection apparatus according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a device according to an embodiment of the present invention.

Detailed description of the embodiments

In each of the following embodiments, optional features and examples are provided together. The features recorded in the embodiments can be combined to form multiple optional solutions, and each numbered embodiment should not be regarded as only a single technical solution. The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described herein are merely intended to explain the present invention rather than to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment

Computer vision aims to let computers simulate human vision so that, like people, they can understand the objective world through observation. Its main research topic is how to use computer vision techniques to solve human-centered problems, including object recognition, face recognition, human detection and tracking, human pose detection, and human motion analysis. Human pose detection is an important component of human behavior recognition and an important research topic for human behavior recognition systems. Its ultimate goal is to output structural parameters of all or part of a person's limbs, such as the body contour, the position and orientation of the head, and the positions or part categories of human keypoints. It has important applications in many areas, for example athlete motion recognition, animated character production, and content-based image and video retrieval.

For human pose detection, the human body can be regarded as consisting of different parts connected by keypoints, so the human pose can be determined by obtaining the location of each keypoint, where each keypoint's location can be represented by a two-dimensional coordinate in the image plane. Human pose detection usually needs to obtain 14 keypoints in total: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.
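One simple way to hold a detected pose is an ordered list of the 14 keypoint names paired with 2D image coordinates. The ordering below follows the listing in the text; the identifiers `KEYPOINT_NAMES` and `as_pose` are illustrative only.

```python
from typing import Dict, List, Tuple

KEYPOINT_NAMES: List[str] = [
    "head", "neck",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def as_pose(coords: List[Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Pair 14 (x, y) coordinates with the keypoint names, in the order above."""
    if len(coords) != len(KEYPOINT_NAMES):
        raise ValueError(f"expected {len(KEYPOINT_NAMES)} keypoints, got {len(coords)}")
    return dict(zip(KEYPOINT_NAMES, coords))
```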

In conventional technology, human pose detection can be performed with methods based on convolutional neural networks. The key problem a convolutional neural network solves is how to automatically extract and abstract features and then map those features to the task objective to solve a practical problem. A convolutional neural network generally consists of three parts: the first is the input layer; the second is a combination of convolutional layers, activation layers, and pooling (downsampling) layers; and the third is a fully connected multilayer perceptron classifier. Convolutional neural networks share weights: one convolution kernel, through the convolution operation, extracts the same feature at different positions across the whole image. In other words, for the same target appearing at different positions in an image, the local features are essentially the same. Because one convolution kernel captures only one kind of feature, multiple kernels can be set to extract features from image data, with each kernel learning a different feature. In image processing, convolutional layers extract low-level features and aggregate them into high-level features. Low-level features are basic, local features such as textures and edges; high-level features, such as the shapes of faces and objects, better represent the global properties of a sample. This process is how a convolutional neural network builds a hierarchical abstraction of the target object.
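The weight-sharing property described above can be illustrated with a direct (unoptimized) 2D convolution in NumPy: each kernel's weights are reused at every image position, and K kernels produce K feature maps. As in most deep-learning libraries, the operation is implemented as cross-correlation (the kernel is not flipped); `conv2d_valid` and the two edge kernels in the example are illustrative choices.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Slide each kernel over every position of a single-channel image ("valid" padding).

    image: H x W array; kernels: K x kh x kw array.
    The same kernel weights are reused at every location (weight sharing),
    and each kernel yields one feature map, so the result is K x (H-kh+1) x (W-kw+1).
    """
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Dot every kernel with the same patch at once.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out
```

For example, on a 5 x 5 image whose right two columns are 1, a vertical-edge kernel `[[-1, 1], [-1, 1]]` responds with 2 only along the column where the edge lies, while a horizontal-edge kernel `[[-1, -1], [1, 1]]` responds with 0 everywhere.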

It can be understood that, for a human pose detection method based on convolutional neural networks to run on an embedded platform, the network must have a small computational cost, run fast, and achieve prediction accuracy that meets practical requirements.

To solve the problem that human pose detection methods based on convolutional neural networks cannot run on embedded platforms, the convolutional neural network can be improved; specifically, a lightweight convolutional neural network can be used, and the convolutional neural network provided by embodiments of the present invention refers to a lightweight convolutional neural network. Here, a lightweight convolutional neural network means a convolutional neural network applicable to embedded platforms.

The human pose detection method is further described below with reference to specific embodiments.

Fig. 1 is a flowchart of a human pose detection method provided by an embodiment of the present invention. This embodiment is applicable to detecting human poses. The method can be performed by a human pose detection apparatus, which can be implemented in software and/or hardware and configured in a device, typically a computer or a mobile terminal. As shown in Fig. 1, the method specifically comprises the following steps:

Step 110: acquire multiple frames of image data.

In an embodiment of the present invention, a video can be understood as consisting of at least one frame of image data. Therefore, to recognize the human pose in a video, the video can be divided into individual frames and each frame analyzed separately. Here, the multiple frames of image data are frames of the same video; in other words, the video contains multiple frames. The frames can be named in chronological order. For example, if a video contains N frames (N ≥ 1), they can be called, in chronological order, the first frame, the second frame, ..., the (N-1)th frame, and the Nth frame.

It can be understood that after the video is divided into multiple frames, the frames can be processed one by one in chronological order. The frame currently being processed is called the current frame, the frame before it the previous frame, and the frame after it the next frame. If the current frame is the first frame, it has a next frame but no previous frame; if the current frame is the last frame, it has a previous frame but no next frame; if the current frame is neither the first nor the last frame, it has both a previous frame and a next frame.

The frames are processed in chronological order because, for human pose detection, adjacent frames may be correlated: if a keypoint is identified at some position in the previous frame, that keypoint is likely to appear near the same position in the current frame. In other words, if the detection result of the previous frame satisfies a preset condition, the current frame can be processed with reference to that result.

Step 120: input the current frame of image data into the pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, output multiple human pose reference maps; the human pose detection model is generated by training a convolutional neural network applied to an embedded platform.

In an embodiment of the present invention, a human pose confidence map can refer to an image containing a human pose keypoint; alternatively, it can be understood as an image generated centered on a human pose keypoint. The human pose keypoints here can refer to the 14 keypoints described above: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.

A human pose reference map can contain two kinds of content: the location of each point that could be a human pose keypoint, and the probability value corresponding to that location. Points that could be human pose keypoints are called candidate points; accordingly, a human pose reference map can contain the location of each candidate point and the corresponding probability value, i.e., the probability value of each candidate point, where a location can be expressed as coordinates. Which candidate point is taken as the human pose keypoint can then be determined from the candidate points' probability values, for example by selecting the candidate point with the highest probability value. Suppose a human pose reference map contains candidate point A with location (x_A, y_A) and probability P_A, candidate point B with location (x_B, y_B) and probability P_B, and candidate point C with location (x_C, y_C) and probability P_C, where P_A < P_B < P_C. Candidate point C is then taken as the human pose keypoint.

It should be noted that each human pose confidence map corresponds to one human pose keypoint, and each human pose reference map contains multiple candidate points for one particular keypoint. For example, one reference map contains multiple candidate points for the left elbow, and another contains multiple candidate points for the left knee. It follows that if N keypoints are to be determined in a frame of image data, there are correspondingly N human pose reference maps and N human pose confidence maps.
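Because each frame has N reference maps, one per keypoint, the maximum-probability selection can be applied to all N maps at once. A NumPy sketch, assuming the maps are stacked into an N x H x W array and reporting (x, y, p) per keypoint with x as the column index; `decode_frame` is a hypothetical name. In the A, B, C example above, this selection would pick C because P_C is the largest.

```python
import numpy as np


def decode_frame(ref_maps):
    """Decode N reference maps (N x H x W) into an N x 3 array of (x, y, p).

    Each map holds the candidates for one keypoint; the candidate with the
    highest probability is kept.
    """
    n, h, w = ref_maps.shape
    flat = ref_maps.reshape(n, -1)
    idx = flat.argmax(axis=1)                 # best candidate per keypoint
    ys, xs = np.unravel_index(idx, (h, w))    # flat index -> (row, column)
    probs = flat[np.arange(n), idx]
    return np.stack([xs, ys, probs], axis=1)
```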

The pre-trained human pose detection model can be generated by training, with a set number of training samples, a convolutional neural network applicable to embedded platforms, i.e., a lightweight convolutional neural network. The human pose detection model may comprise a main path, a first branch, a second branch, and a third branch; the main path may comprise a residual module and an upsampling module, the first branch may comprise a refinement network module, and the second branch may comprise a feedback module; the residual module may comprise a first residual unit, a second residual unit, and a third residual unit. The components of the human pose detection model are described in detail below.

Inputting the current frame of image data into the pre-trained human pose detection model and, with reference to the human pose confidence maps of the previous frame, outputting multiple human pose reference maps can be divided into the following two situations:

Situation one: the current frame of image data alone is input, as the input variable, into the pre-trained human pose model to obtain multiple first human pose reference maps, and multiple human pose reference maps are then output according to the multiple human pose confidence maps obtained from the previous frame. Each first human pose reference map refers to one of the previous frame's confidence maps to produce the current frame's human pose reference map, and the correspondence is determined by whether the keypoints are the same. For example, if a first human pose reference map of the current frame is for the left elbow, it refers to the previous frame's confidence map whose keypoint is also the left elbow.

It can be understood that in situation one, the human pose confidence maps of the previous frame are not an input variable fed into the pre-trained model together with the current frame. Instead, after the current frame is input into the pre-trained model and multiple first human pose reference maps are obtained, each first reference map is checked in turn, against the previous frame's confidence maps, for credibility. If it is credible, that first reference map is used as this frame's human pose reference map; if it is not credible, the previous frame's confidence map for the same keypoint is used as this frame's human pose reference map.
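Situation one can be sketched as a per-keypoint choice between the current frame's first reference map and the previous frame's confidence map for the same keypoint, matched by index. The text does not say exactly how credibility is judged against the previous frame's maps, so the peak-above-threshold test and the 0.5 threshold below are assumptions, and `merge_with_previous` is a hypothetical name.

```python
import numpy as np


def merge_with_previous(first_maps, prev_conf, threshold=0.5):
    """Situation one: per keypoint, keep the current first reference map if credible,
    otherwise fall back to the previous frame's confidence map for that keypoint.

    first_maps, prev_conf: N x H x W arrays with keypoints in the same order.
    """
    out = np.empty_like(first_maps)
    for k in range(first_maps.shape[0]):
        credible = first_maps[k].max() > threshold   # assumed credibility test
        out[k] = first_maps[k] if credible else prev_conf[k]
    return out
```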

Case two: the current frame image data and the human pose confidence maps of the previous frame image data are input together, as input variables, into the pre-trained human pose detection model, which outputs multiple human pose reference maps.

It can be understood that, in case two, the human pose confidence maps of the previous frame image data also serve as an input variable and are input into the pre-trained model together with the current frame image data. The benefit of this arrangement is as follows: in a video, adjacent frames are correlated to some degree. Feeding the result for the previous frame back into the pre-trained model as feedback information, so that it takes part in predicting the output for the current frame, can therefore further improve the prediction accuracy of the human pose detection model.

It should be noted that, for case two, the following approach can further improve the prediction accuracy of the human pose detection model. First, determine whether the human pose confidence maps of the previous frame image data are credible. If they are credible, the current frame image data and the previous frame's human pose confidence maps are input into the pre-trained model, which outputs multiple human pose reference maps. If they are not credible, either the current frame image data and preset image data, or the current frame image data alone, are input into the pre-trained model, which outputs multiple human pose reference maps. Here, preset image data means image data that contain no prior knowledge, such as an all-black image, which in matrix form is an all-zero matrix. For the output of the current frame, the previous frame's human pose confidence maps are image data containing prior knowledge; likewise, for the output of the next frame, the current frame's human pose confidence maps are image data containing prior knowledge.

This approach improves prediction accuracy for the following reason. If the previous frame's human pose confidence maps are not credible, they are unreliable. Feeding them into the pre-trained model as an input variable anyway would not improve the model's prediction accuracy and might even reduce it. Any previous-frame confidence map fed into the pre-trained model as an input variable must therefore be credible. Accordingly, before deciding whether to refer to the previous frame's confidence maps, their credibility is checked first: if they are credible, they are input into the pre-trained model as an input variable; otherwise, they are not. The credibility of the previous frame's confidence maps may be judged as follows. Identify the human pose keypoint in the previous frame's human pose reference map. If the probability value of the keypoint is greater than a preset threshold, a mask map centered on the keypoint is generated as the previous frame's human pose confidence map, and that confidence map is determined to be credible. If the probability value of the keypoint is less than or equal to the preset threshold, the preset image data are used as the human pose confidence map, and the previous frame's confidence map is determined to be not credible.

It should also be noted that the multiple human pose reference maps described above are the output for the current frame image data; that is, the current frame image data correspond to multiple human pose reference maps. More specifically, if N keypoints need to be determined from the current frame image data, N human pose reference maps are output. Likewise, the previous frame's human pose confidence maps used as a reference also number N.

It should further be noted that judging whether the previous frame's human pose confidence maps are credible, as described above, means judging the credibility of each of the previous frame's confidence maps separately. Moreover, because a human pose confidence map is an image containing a keypoint and different keypoints correspond to different confidence maps, the credibility condition may be the same or different for different keypoints. It can be determined according to the actual situation and is not specifically limited here.

In addition, if the current frame image data are the first frame image data, i.e., there is no previous frame image data, either the current frame image data alone or the current frame image data together with preset image data may be input into the pre-trained human pose detection model.

Optionally, on the basis of the above technical solution, inputting the current frame image data into the pre-trained human pose detection model to output multiple human pose reference maps with reference to the human pose confidence maps of the previous frame image data may specifically include: judging whether the previous frame's human pose confidence maps are credible. If so, the current frame image data and the previous frame's human pose confidence maps are input into the pre-trained model, which outputs multiple human pose reference maps. If not, the current frame image data and preset image data are input into the pre-trained model, which outputs multiple human pose reference maps.

In an embodiment of the present invention, to further improve the prediction accuracy of the human pose detection model, the following approach may be adopted: judge whether the previous frame's human pose confidence maps are credible. If they are credible, the current frame image data and the previous frame's confidence maps are input into the pre-trained model, which outputs multiple human pose reference maps; if they are not credible, the current frame image data and preset image data are input into the pre-trained model, which outputs multiple human pose reference maps.

These operations ensure that the previous frame's human pose confidence maps fed into the pre-trained model as an input variable are credible, so that the prior knowledge they provide improves the model's prediction accuracy for the output of the current frame image data.

For example, suppose the previous frame image data have N human pose confidence maps and each of the N maps is checked for credibility. If x of them are credible and (N - x) are not, the x credible confidence maps, (N - x) copies of the preset image data, and the current frame image data may be input into the human pose detection model, which outputs multiple human pose reference maps.
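The feedback input described above (credible maps kept, non-credible maps replaced by all-zero preset image data, and all-zero data for the first frame) can be sketched as follows. The function name and the choice to return the frame and the feedback maps as a separate pair (rather than, say, concatenating them) are assumptions made for illustration.

```python
import numpy as np

def build_feedback_input(frame, prev_conf_maps, credible, n_keypoints):
    """Return (frame, feedback) for the current frame (case two).

    frame: current frame, shape (..., H, W).
    prev_conf_maps: previous frame's N confidence maps, shape (N, H, W),
        or None when the current frame is the first frame.
    credible: N booleans, one credibility verdict per previous-frame map.
    Non-credible maps are replaced by preset image data (an all-zero map).
    """
    frame = np.asarray(frame, dtype=float)
    h, w = frame.shape[-2:]
    if prev_conf_maps is None:
        # First frame: no previous frame exists, so use preset all-zero data.
        return frame, np.zeros((n_keypoints, h, w))
    prev = np.asarray(prev_conf_maps, dtype=float)
    keep = np.asarray(credible, dtype=bool)[:, None, None]
    # x credible maps are kept unchanged; the other N - x become all-zero maps.
    return frame, np.where(keep, prev, 0.0)
```

For the first frame, the sketch follows the second option mentioned earlier (current frame plus preset image data), which keeps the number of model inputs constant across frames.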

Optionally, on the basis of the above technical solution, before the current frame image data are input into the pre-trained human pose detection model to output multiple human pose reference maps with reference to the previous frame's human pose confidence maps, the method may further include: preprocessing each frame of image data separately to obtain processed image data.

In an embodiment of the present invention, preprocessing may include normalization and whitening. Normalization refers to a series of standard transformations. Using the invariant moments of the image, a set of parameters is found that eliminates the effect of other transformation functions on the image, and the original image to be processed is converted into a corresponding unique standard form. This standard-form image is invariant to affine transformations such as translation, rotation and scaling. Normalization usually includes the following steps: coordinate centering, x-shearing normalization, scaling normalization and rotation normalization. Because the human pose detection model is generated by training a neural network, normalizing the image data before the current frame image data are input into the pre-trained model serves to make the statistical distribution of the samples uniform, which speeds up network learning and ensures that small values in the output are not swamped numerically.
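The first normalization step, coordinate centering, can be sketched with raw image moments: the intensity centroid is (m10 / m00, m01 / m00), and the image is shifted so that the centroid lands on the image centre. This is a simplified sketch: it uses integer circular shifts via `np.roll` and omits the shearing, scaling and rotation steps.

```python
import numpy as np

def center_by_centroid(img):
    """Coordinate centering using raw moments m00, m10, m01.

    Shifts a 2-D image (circularly, by whole pixels) so that its intensity
    centroid moves to the image centre. Later normalization steps
    (x-shearing, scaling, rotation) are not included in this sketch.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]          # row (y) and column (x) coordinates
    m00 = img.sum()
    if m00 == 0:
        return img.copy()                # empty image: nothing to center
    cx = (xs * img).sum() / m00          # m10 / m00
    cy = (ys * img).sum() / m00          # m01 / m00
    dy = int(round((h - 1) / 2 - cy))
    dx = int(round((w - 1) / 2 - cx))
    return np.roll(img, shift=(dy, dx), axis=(0, 1))
```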

Because adjacent pixels in image data are strongly correlated, the image data are redundant when used as the input variable. Whitening reduces this redundancy. More precisely, after whitening, the input variable has the following properties: the features are only weakly correlated with one another, and all features have the same variance, which in image processing is usually set to unit variance.
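One common way to obtain exactly the two properties listed above (decorrelated features, unit variance) is ZCA whitening. The source does not name a whitening method, so ZCA here is an assumed example. The small `eps` term guards against division by near-zero eigenvalues.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening of samples X with shape (num_samples, num_features).

    After whitening, the feature covariance is approximately the identity:
    features are decorrelated and each has (roughly) unit variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # zero-mean each feature
    cov = Xc.T @ Xc / Xc.shape[0]                # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W
```

For image data, each row of `X` would typically be a flattened image or patch; with strongly correlated neighbouring pixels, the whitened covariance is close to the identity matrix.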

It can be understood that, after preprocessing, the current frame image data input as the input variable into the pre-trained human pose detection model are the processed image data. Likewise, the previous frame image data are also processed image data.

Step 130: identify the human pose keypoints in the human pose reference maps.

In an embodiment of the present invention, as described previously, a human pose reference map may contain two kinds of content: the position information of each point that may be a human pose keypoint, and the probability value corresponding to that position information. A human pose keypoint is a point that has been determined to be a keypoint, whereas a point that may become a human pose keypoint is called a candidate point.

Based on the above, a human pose reference map includes the position information of multiple candidate points and the probability values corresponding to that position information, so the probability values of the candidate points determine which candidate point is taken as the human pose keypoint. For example, the candidate point with the maximum probability value is selected as the human pose keypoint.

Optionally, on the basis of the above technical solution, identifying the human pose keypoints in the human pose reference maps may specifically include: determining the coordinate position of the maximum probability value in each human pose reference map, and taking that coordinate position as the human pose keypoint.

In an embodiment of the present invention, because a human pose reference map contains the position information of each point that may be a human pose keypoint and the corresponding probability values, the point to be taken as the keypoint can be determined from these probability values. Specifically, the coordinate position with the maximum probability value in the reference map is determined and taken as the human pose keypoint.

It should be noted that each human pose reference map has only one human pose keypoint. Determining the keypoint by probability value in the manner above can run into the following problem: the reference map may contain two or more equal probability values that are all greater than every other probability value. In that case, which of these coordinate positions is taken as the keypoint can be decided according to the actual situation, for example by whether the resulting joint connection is reasonable. Suppose a reference map has two equal probability values, both greater than all others, at coordinate positions A and B. A and B are each tried as the keypoint and the joint connection is checked: with A as the keypoint the joint connection is unreasonable, and with B it is reasonable. B is therefore determined to be the human pose keypoint.
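The maximum-probability rule and the tie-break can be sketched as below. The joint-connection check is not specified in the source, so it is represented by a hypothetical `is_plausible` callback supplied by the caller; without it, the sketch simply returns the first tied position in row-major order.

```python
import numpy as np

def locate_keypoint(ref_map, is_plausible=None):
    """Return (row, col, prob) of the maximum-probability position in a reference map.

    If several positions share the maximum, the HYPOTHETICAL callback
    `is_plausible((row, col))` stands in for the joint-connection check and
    picks the first tied position it accepts.
    """
    ref_map = np.asarray(ref_map, dtype=float)
    peak = ref_map.max()
    ties = np.argwhere(ref_map == peak)          # all positions with the maximum value
    if is_plausible is not None and len(ties) > 1:
        for r, c in ties:
            if is_plausible((int(r), int(c))):
                return int(r), int(c), float(peak)
    r, c = ties[0]                               # default: first maximum (row-major)
    return int(r), int(c), float(peak)
```

In the A/B example above, `is_plausible` would return `False` for A and `True` for B, so B is chosen.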

Step 140: generate the human pose confidence maps according to the credibility of the human pose keypoints.

In an embodiment of the present invention, a keypoint is either credible or not credible. The criterion may be whether the probability value corresponding to the human pose keypoint is greater than a preset threshold: if it is greater than the preset threshold, the keypoint is credible; if it is less than or equal to the preset threshold, the keypoint is not credible.

On this basis, if the keypoint is credible, a mask map centered on the keypoint is generated and used as the human pose confidence map; if the keypoint is not credible, the preset image data may be used as the human pose confidence map. The preset image data here are the same as those described earlier: they may be an all-black image, which in matrix form is an all-zero matrix. As stated above, a keypoint is judged credible when its probability value is greater than the preset threshold, and not credible when its probability value is less than or equal to the preset threshold.

It should be noted that if a human pose keypoint is determined to be not credible, the corresponding keypoint in the previous frame image data is used as the keypoint of the current frame. However, the confidence map for a non-credible keypoint is not generated from the corresponding keypoint of the previous frame; the preset image data are used as that confidence map.

Optionally, on the basis of the above technical solution, generating the human pose confidence maps according to the credibility of the human pose keypoints may specifically include: judging whether the human pose keypoint is credible. If so, a mask map centered on the keypoint is generated as the human pose confidence map. If not, the preset image data are used as the human pose confidence map.

In an embodiment of the present invention, a mask map is the image obtained by applying image masking to an image. Image masking means occluding the image to be processed (in whole or in part) with a selected image, graphic or object in order to control the region or the course of image processing. The specific image or object used for the occlusion is called the mask or template. In digital image processing, a mask may be a two-dimensional matrix array or a multi-valued image. Image masks are mainly used for the following purposes. First, extracting a region of interest: the image to be processed is multiplied by a pre-made region-of-interest mask to obtain the region-of-interest image, in which image values inside the region remain unchanged and values outside the region are zero. Second, shielding: the mask shields certain regions of the image to be processed so that they do not take part in processing or in the calculation of processing parameters, or so that only the shielded regions are processed or counted. Third, structural feature extraction: structural features similar to the mask are detected and extracted from the image to be processed using similarity measures or image matching methods. Fourth, producing images of special shapes.
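The first use listed above, region-of-interest extraction, amounts to an element-wise multiplication of the image by a 0/1 mask. A minimal illustration with a made-up 4 × 4 image and a 2 × 2 region of interest:

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)   # image to be processed (toy values)
roi_mask = np.zeros((4, 4))                       # pre-made region-of-interest mask
roi_mask[1:3, 1:3] = 1.0                          # 1 inside the region, 0 elsewhere
roi = img * roi_mask                              # values kept inside, zero outside
```

Here `roi` keeps the values 5, 6, 9 and 10 of the central block and is zero everywhere else.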

In generating the human pose confidence map according to the credibility of the keypoint, the step "if the keypoint is credible, generate a mask map centered on the keypoint as the human pose confidence map" may include: if the keypoint is credible, generating the mask map centered on the keypoint using a Gaussian kernel, and using it as the human pose confidence map. It should be noted that the region affected by the mask map can be determined by setting the parameters of the Gaussian kernel, which include the width and height of the filter window; the Gaussian kernel may be a two-dimensional Gaussian kernel. For example, if a two-dimensional Gaussian kernel has a filter window of width 7 and height 7, the region affected by the mask map is a 7 × 7 square region.
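A sketch of the confidence-map rule: a 7 × 7 two-dimensional Gaussian kernel pasted at a credible keypoint, or an all-zero map (preset image data) otherwise. The source gives only the window size, so the standard deviation (`size / 6`) and the peak value of 1 are assumptions. The kernel is clipped where the window extends past the image border.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=None):
    """2-D Gaussian kernel of shape (size, size) with value 1 at the centre."""
    if sigma is None:
        sigma = size / 6.0                       # ASSUMPTION: sigma is not given in the source
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def confidence_map(shape, keypoint, prob, threshold, size=7):
    """Gaussian mask centred on a credible keypoint, else an all-zero map."""
    h, w = shape
    out = np.zeros((h, w))
    if prob <= threshold:
        return out                               # not credible: preset all-zero data
    r, c = keypoint
    k = gaussian_kernel(size)
    half = size // 2
    # Clip the size x size window to the image; kernel index = image index - (centre - half).
    r0, r1 = max(r - half, 0), min(r + half + 1, h)
    c0, c1 = max(c - half, 0), min(c + half + 1, w)
    out[r0:r1, c0:c1] = k[r0 - (r - half):r1 - (r - half), c0 - (c - half):c1 - (c - half)]
    return out
```

With `size=7`, the non-zero region of a map whose keypoint lies away from the border is exactly the 7 × 7 square described above.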

It should be noted that if the keypoint is not credible, the preset image data can be used as the human pose confidence map; the preset image data can also be regarded as a kind of mask map. The preset image data here are the same as those described earlier: they may be an all-black image, which in matrix form is an all-zero matrix.

Optionally, on the basis of the above technical solution, judging whether the human pose keypoint is credible may specifically include: judging whether the probability value corresponding to the keypoint is greater than the preset threshold. If so, the keypoint is determined to be credible; if not, it is determined to be not credible.

In an embodiment of the present invention, the threshold can be set according to the actual situation and is not specifically limited here. Different human pose keypoints may have the same or different thresholds, which can also be determined according to the actual situation: for example, a larger threshold may be set for an important keypoint and a smaller threshold for a less important one. For instance, the threshold may be 0.9 when the keypoint is the top of the head and 0.5 when it is the left knee.
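Per-keypoint thresholds can be kept in a simple lookup table. The two entries below are the example values from the text; the keypoint names and the fallback threshold for unlisted keypoints are hypothetical.

```python
# HYPOTHETICAL per-keypoint thresholds (values from the example above);
# keypoints not listed fall back to DEFAULT_THRESHOLD (also an assumption).
THRESHOLDS = {"head_top": 0.9, "left_knee": 0.5}
DEFAULT_THRESHOLD = 0.5

def is_credible(name, prob, thresholds=THRESHOLDS, default=DEFAULT_THRESHOLD):
    """A keypoint is credible only if its probability is strictly above its threshold."""
    return prob > thresholds.get(name, default)
```

A head-top keypoint with probability 0.8 is therefore not credible, while a left-knee keypoint with the same probability is.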

Step 150: judge whether the current frame image data are the last frame image data. If not, execute step 160; if so, execute step 170.

Step 160: input the human pose confidence maps into the human pose detection model, where they take part in generating the human pose confidence maps of the next frame image data.

Step 170: end the operation of generating the human pose confidence maps of the multiple frames of image data.

In an embodiment of the present invention, it is judged whether the current frame image data are the last frame image data. If not, the current frame's human pose confidence maps can be input into the human pose detection model as a reference for the output of the next frame, improving the accuracy of that output. The next frame image data are input into the pre-trained model, which outputs multiple human pose reference maps for the next frame with reference to the current frame's confidence maps; the human pose keypoints are identified in those reference maps; and human pose confidence maps are generated according to the credibility of the keypoints.

It should be noted that if the current frame image data are the last frame image data, the operation of generating the confidence maps of the multiple frames ends, and the resulting confidence maps no longer need to be input into the human pose detection model. On this basis, for the last frame only step 120, step 130 and the keypoint credibility judgment need to be performed; if a keypoint is not credible, the corresponding keypoint of the previous frame is used in its place. In general, performing step 120, step 130 and the credibility judgment, and substituting the previous frame's corresponding keypoint for any keypoint that is not credible, yields the human pose keypoints of the current frame image data.

It should be noted that steps 120 to 150 constitute the processing of the current frame image data. Accordingly, the human pose reference maps in steps 120 and 130 are those of the current frame, the human pose keypoints in steps 130 and 140 are those of the current frame, and the human pose confidence maps in steps 140 and 150 are those of the current frame.

The current frame image data denote whichever frame is currently being processed: when the first frame is being processed, it is the current frame; when the second frame is being processed, the second frame is the current frame; and so on. In other words, the current frame image data can be the first, second, third, ..., (N-1)-th or N-th frame image data.

Suppose a video includes N frames of image data, where N ≥ 1. If the current frame is determined not to be the N-th frame, steps 120 to 140 are repeated, so that the first through (N-1)-th frames are processed. If the current frame is determined to be the N-th frame, steps 120 and 130 are executed, and any keypoint that is not credible is replaced with the corresponding keypoint of the previous frame.
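The per-frame loop of steps 120 to 170 can be sketched as below. `model(frame, feedback)` is a hypothetical callable standing in for the trained detection model and returning N reference maps of shape (N, H, W). For brevity, a credible keypoint produces a single-pixel mask rather than a Gaussian one, a simplification relative to the embodiment.

```python
import numpy as np

def track_video(frames, model, n_keypoints, threshold=0.5):
    """Run steps 120-170 over a list of frames; return N (row, col) keypoints per frame.

    `model(frame, feedback)` is a HYPOTHETICAL stand-in for the trained model.
    A keypoint that is not credible is replaced by the previous frame's keypoint
    (None when there is no earlier credible keypoint).
    """
    h, w = frames[0].shape[-2:]
    feedback = np.zeros((n_keypoints, h, w))          # first frame: preset all-zero data
    prev_points = [None] * n_keypoints
    results = []
    for i, frame in enumerate(frames):
        ref_maps = model(frame, feedback)                          # step 120
        points, conf_maps = [], np.zeros_like(feedback)
        for k in range(n_keypoints):
            r, c = np.unravel_index(np.argmax(ref_maps[k]), (h, w))  # step 130
            if ref_maps[k][r, c] > threshold:                        # step 140: credible
                points.append((int(r), int(c)))
                conf_maps[k][r, c] = 1.0   # simplified single-pixel mask
            else:                          # not credible: reuse the previous keypoint;
                points.append(prev_points[k])  # its confidence map stays all-zero
        results.append(points)
        prev_points = points
        if i < len(frames) - 1:            # step 150: not the last frame
            feedback = conf_maps           # step 160: feed back for the next frame
    return results                         # step 170: all frames processed
```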

In the technical solution of this embodiment, multiple frames of image data are acquired. The current frame image data are input into the pre-trained human pose detection model, which outputs multiple human pose reference maps with reference to the previous frame's human pose confidence maps; the model is generated by training a convolutional neural network for embedded platforms. Human pose keypoints are identified in the reference maps, and human pose confidence maps are generated according to the credibility of the keypoints. It is then judged whether the current frame is the last frame: if not, the confidence maps are input into the model to take part in generating the next frame's confidence maps; if so, the operation of generating the confidence maps of the multiple frames ends. Human pose detection is thus performed on an embedded platform, and introducing the previous frame's output into the prediction of the current frame's output further improves prediction accuracy.

Optionally, on the basis of the above technical solution, the human pose detection model includes a main path, a first branch and a second branch. The main path includes a residual module and an up-sampling module, the first branch includes a refinement network module, and the second branch includes a feedback module.

Inputting the current frame image data into the pre-trained human pose detection model to output multiple human pose reference maps with reference to the previous frame's human pose confidence maps may specifically include: inputting the current frame image data into the residual module for processing and, as a reference, inputting the previous frame's human pose confidence maps into the feedback module for processing, to obtain a first convolution result; inputting the first convolution result output by the residual module into the up-sampling module and into the refinement network module separately, to obtain a second convolution result and a third convolution result respectively; and adding the second and third convolution results to output multiple human pose reference maps.

In an embodiment of the present invention, the residual module can be used to extract features of the image data such as edges and contours, and the up-sampling module can be used to extract contextual information from the image data. The refinement network module processes the first convolution result output by the residual module. The first convolution result can be regarded as intermediate-layer information of the network, so the refinement network module uses this intermediate-layer information to strengthen the back-propagated gradient, thereby improving the prediction accuracy of the convolutional neural network. The feedback module introduces the previous frame's human pose confidence maps into the convolutional neural network, improving the accuracy of the output for the current frame image data.

Obtaining the first convolution result by inputting the current frame image data into the residual module and, as a reference, the previous frame's human pose confidence maps into the feedback module can be understood as follows: the output of the residual module for the current frame image data is added to the output of the feedback module for the previous frame's human pose confidence maps, and the sum is the first convolution result.

The first convolution result output by the residual module is input into the up-sampling module and into the refinement network module separately, giving the second and third convolution results, which are then added to output multiple human pose reference maps. The up-sampling module may use nearest-neighbor interpolation; other up-sampling methods may also be used, chosen according to the actual situation, and are not specifically limited here.
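Nearest-neighbor up-sampling of a (C, H, W) array by an integer factor simply repeats each value along both spatial axes, and the second and third convolution results can then be added element-wise. The sketch assumes, since the two results are summed, that the refinement branch produces maps at the up-sampled resolution; the arrays are toy stand-ins, not outputs of a real network.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbor up-sampling of a (C, H, W) array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)

# Toy stand-ins: the second result comes from up-sampling, the third from the
# refinement branch (ASSUMED here to be a constant map at the up-sampled size).
second = upsample_nearest(np.arange(4.0).reshape(1, 2, 2))   # second convolution result
third = np.ones((1, 4, 4))                                    # third convolution result
reference_maps = second + third                               # element-wise sum
```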

The refinement network module uses the network's intermediate-layer information to strengthen the back-propagated gradient, improving the prediction accuracy of the convolutional neural network. The feedback module introduces the previous frame's human pose confidence maps into the convolutional neural network so that they take part in the model's prediction for the current frame image data, which also improves the network's prediction accuracy.

Optionally, on the basis of the above technical solution, the residual module includes a first residual unit, a second residual unit and a third residual unit.

Obtaining the first convolution result by inputting the current frame image data into the residual module and, as a reference, the previous frame's human pose confidence maps into the feedback module may specifically include: inputting the current frame image data into the first residual unit to obtain a first intermediate result; adding the output of the second residual unit for the first intermediate result to the output of the feedback module for the previous frame's human pose confidence maps, to obtain a second intermediate result; and inputting the second intermediate result into the third residual unit to obtain a third intermediate result, which serves as the first convolution result. The numbers of channels of the first, second and third intermediate results increase in turn.
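The data flow through the three residual units, including where the feedback output is added, can be traced with toy stand-ins. The `toy_unit` function below is hypothetical: a 2 × average-pool down-sampling followed by a random 1 × 1 channel projection, not a ShuffleNet unit. The channel counts (24, 48, 96) and the two-stage feedback stand-in are also assumptions, chosen only so that the shapes line up for the addition.

```python
import numpy as np

def toy_unit(x, out_channels, rng):
    """HYPOTHETICAL stand-in for a residual unit: 2x average-pool down-sampling
    followed by a random 1x1 channel projection (not a ShuffleNet unit)."""
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    weights = rng.standard_normal((out_channels, c))
    return np.einsum("oc,chw->ohw", weights, pooled)

def residual_module(frame, prev_conf_maps, seed=0):
    """Trace shapes: unit 1 -> (unit 2 + feedback) -> unit 3."""
    rng = np.random.default_rng(seed)
    r1 = toy_unit(frame, 24, rng)                                # first intermediate result
    fb = toy_unit(toy_unit(prev_conf_maps, 24, rng), 48, rng)    # feedback module stand-in
    r2 = toy_unit(r1, 48, rng) + fb                              # second intermediate result
    r3 = toy_unit(r2, 96, rng)                                   # third = first convolution result
    return r1, r2, r3
```

For a 3-channel 32 × 32 frame, the intermediate results have shapes (24, 16, 16), (48, 8, 8) and (96, 4, 4): spatial size shrinks while the channel count grows.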

In an embodiment of the present invention, residual error module can specifically include the first residual unit, the second residual unit and Three residual units, wherein each residual unit is made of ShuffleNet subelement and ShuffleNet down-sampling subelement, Wherein, ShuffleNet subelement may be implemented to operate the image data of arbitrary dimension, by two state modulators, divide Depth and output depth Wei not inputted, wherein input depth representing is the number of plies for inputting network intermediate features layer, exports depth Refer to the number of plies of the exported intermediate features layer of the subelement, the number of plies is corresponding with port number, and ShuffleNet subelement is extracted The feature of higher level, while the information of original level is remained, the size for not changing image data may be implemented, only change The depth for becoming network intermediate features layer can be regarded as advanced " convolutional layer " for keeping size constant.Wherein, exist In convolutional neural networks, port number refers to the number of convolution kernel in each convolutional layer.In addition, it should be noted that, each residual error Unit can only include a ShuffleNet subelement, include three ShuffleNet compared to original each residual unit For subelement, network structure is simplified, correspondingly, also just reducing calculation amount, improves treatment effeciency.

Pass through ShuffleNet down-sampling subelement in the first residual unit, the second residual unit and third residual unit Successively handle, so that the size of the first intermediate result, the second intermediate result and third intermediate result successively becomes smaller, meanwhile, in order to The constant of network size is kept, the port number of the first intermediate result, the port number of the second intermediate result and third intermediate result are made Port number successively increase.In addition, the corresponding characteristic pattern in each channel.

It should be noted that intermediate result can be indicated with W × H × K, wherein W indicates that the width of intermediate result, H indicate The length of intermediate result, K indicate that port number, W × H are the size for indicating intermediate result.It, can for input image data To be expressed as W × H × D, wherein W and H is identical as aforementioned meaning, and D indicates depth, illustratively, if input image data is RGB image, then D=3, if input image data is gray level image, D=1.

For example, in the W × H × K notation above, the first intermediate result is 64 × 32 × 32, the second intermediate result is 32 × 16 × 64, and the third intermediate result is 16 × 8 × 128. Their sizes are therefore 64 × 32, 32 × 16 and 16 × 8, so the size decreases successively, while their channel counts are 32, 64 and 128, so the number of channels increases successively.
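The size and channel progression in this example can be reproduced with a small shape tracer. It is only a sketch under stated assumptions: each ShuffleNet down-sampling subunit is assumed to halve width and height, the channel count of each unit's output is taken from the example rather than derived, and `residual_unit_shape` and `encoder_shapes` are hypothetical helper names.

```python
def residual_unit_shape(shape, out_channels):
    """Output shape (W, H, K) of one residual unit.

    Assumptions: the ShuffleNet down-sampling subunit halves W and H, and the
    unit as a whole sets the channel count to out_channels (which subunit
    changes the depth is not modeled here).
    """
    w, h, _ = shape
    w, h = w // 2, h // 2          # down-sampling subunit: size halves
    return (w, h, out_channels)    # depth set to the requested channel count


def encoder_shapes(first, channels=(64, 128)):
    """Trace the first, second and third intermediate results (W, H, K)."""
    shapes = [first]
    for k in channels:
        shapes.append(residual_unit_shape(shapes[-1], k))
    return shapes


for s in encoder_shapes((64, 32, 32)):
    print("%d x %d x %d" % s)
# 64 x 32 x 32
# 32 x 16 x 64
# 16 x 8 x 128
```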

Optionally, on the basis of the above technical solution, the human pose detection model may further include a third branch.

Feeding the first convolution result output by the residual module into the up-sampling module and the refinement network module to obtain the second and third convolution results, respectively, may specifically include: inputting the first intermediate result into the third branch to obtain a fourth intermediate result; inputting the second intermediate result into the third branch to obtain a fifth intermediate result; inputting the third and fifth intermediate results into the up-sampling module to obtain a sixth intermediate result; inputting the fourth and sixth intermediate results into the up-sampling module to obtain a seventh intermediate result, which serves as the second convolution result; and inputting the first convolution result output by the residual module into the refinement network module to obtain the third convolution result. The channel counts of the sixth and seventh intermediate results decrease successively.

In this embodiment, the human pose detection model may further include a third branch. Its role is to move the convolution operation of the skip connections onto the main path, which further improves the prediction accuracy of the model. The third branch may include a 1 × 1 convolution kernel module, a batch normalization module and a linear activation function module. The 1 × 1 convolution kernel serves the following purposes:

Case one: for a single channel and a single convolution kernel, the 1 × 1 convolution scales the input image data. Because a 1 × 1 kernel has only one parameter, sliding it over the input is equivalent to multiplying the input by a coefficient;

Case two: for multiple channels and multiple convolution kernels, the 1 × 1 convolution has three effects. First, it enables cross-channel interaction and information integration. Second, it can reduce or increase dimensionality and thereby reduce the number of network parameters; here, reducing dimensionality means reducing the number of channels, and increasing dimensionality means increasing it. Third, it substantially increases non-linearity without losing resolution.
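The two cases can be checked with a plain 1 × 1 convolution. In this minimal sketch (a hypothetical `conv1x1`, data stored as `feature[channel][y][x]`, one weight per input/output channel pair), every output pixel is a weighted sum of the input channels at the same position, so the spatial resolution is untouched and the number of output channels equals the number of kernels.

```python
def conv1x1(feature, weights, bias=None):
    """1 x 1 convolution on a feature map stored as feature[c][y][x].

    weights[o][c] is the single weight linking input channel c to output
    channel o, so each output pixel mixes the input channels at the same
    (y, x); height and width are unchanged.
    """
    c_in = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    bias = bias or [0.0] * len(weights)
    out = []
    for o, row in enumerate(weights):
        assert len(row) == c_in, "one weight per input channel"
        out.append([[bias[o] + sum(row[c] * feature[c][y][x] for c in range(c_in))
                     for x in range(w)] for y in range(h)])
    return out


# Case one: single channel, single kernel -> plain scaling by the kernel weight.
single = [[[1.0, 2.0], [3.0, 4.0]]]
print(conv1x1(single, [[0.5]]))
# [[[0.5, 1.0], [1.5, 2.0]]]

# Case two: 3 input channels -> 2 output channels (dimension reduction).
rgb = [[[1.0]], [[2.0]], [[3.0]]]
print(conv1x1(rgb, [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]))
# [[[6.0]], [[-2.0]]]
```

The convolution itself is linear; the added non-linearity in the third effect comes from the activation function applied after it, as in the batch normalization and activation modules of the third branch.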

The batch normalization module performs batch normalization, which counters the slower convergence and the vanishing or exploding gradients that arise as a neural network becomes deeper. Batch normalization normalizes the inputs of some or all layers, fixing the mean and variance of each layer's input signal so that the input to every layer has a stable distribution. More specifically, it is usually applied before the activation function and normalizes x = Wu + b so that the output has mean 0 and variance 1, where W is the weight matrix, u is the layer input and b is the bias. In a convolutional neural network the weight matrix is the convolution kernel, i.e. W denotes the convolution kernel.
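The normalization step can be written out for a single channel over a batch. This is a sketch of standard batch normalization, not code from the patent: the small constant `eps` and the learnable scale `gamma` and shift `beta` come from the usual formulation and are not mentioned in the text above. With the defaults shown, the output has mean 0 and variance close to 1.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize one channel's pre-activations x = W*u + b over a batch.

    Values are shifted to zero mean and scaled to unit variance, then the
    learnable scale (gamma) and shift (beta) are applied.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n   # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


ys = batch_norm([2.0, 4.0, 6.0, 8.0])
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(mean, var)   # mean is ~0, variance is just under 1 because of eps
```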

Because the seventh intermediate result is obtained by inputting the sixth and fourth intermediate results into the up-sampling module, its size is larger than that of the sixth intermediate result. At the same time, to keep the network scale constant, the channel counts of the sixth and seventh intermediate results decrease successively.

Implementing the third branch and moving the convolution operation of the skip connections onto the main path further improves the prediction accuracy of the human pose detection model. The first, second and third intermediate results can be regarded as the encoding part, and the sixth and seventh intermediate results as the decoding part. To keep the network scale constant, the encoding part increases the channel count of the intermediate results as their size decreases, and the decoding part reduces the channel count as their size increases. The convolutional neural network provided by this embodiment is therefore an asymmetric encoder-decoder structure.
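Using the shapes of the Fig. 2 example, the decoder half of this asymmetric encoder-decoder structure can be traced at shape level. This is a hedged sketch: `project` stands for the third branch (1 × 1 convolution, batch normalization and activation), `upsample` is assumed to double width and height, the 32 decoder channels are taken from the fourth to seventh intermediate results given for Fig. 2, and all function names are hypothetical.

```python
def upsample(shape, scale=2):
    """Up-sampling module: width and height grow, channels unchanged (assumed x2)."""
    w, h, k = shape
    return (w * scale, h * scale, k)


def add(a, b):
    """Element-wise addition: both operands must have identical W x H x K."""
    assert a == b, f"shape mismatch {a} vs {b}"
    return a


def project(shape, k):
    """Third branch (1x1 conv + BN + activation): size kept, channels set to k."""
    w, h, _ = shape
    return (w, h, k)


def decoder_shapes(first, second, third, k=32):
    fourth = project(first, k)                       # skip from the first result
    fifth = project(second, k)                       # skip from the second result
    sixth = add(upsample(project(third, k)), fifth)  # merge at 32 x 16
    seventh = add(upsample(sixth), fourth)           # merge at 64 x 32
    return fourth, fifth, sixth, seventh             # seventh = second conv result


print(decoder_shapes((64, 32, 32), (32, 16, 64), (16, 8, 128)))
# ((64, 32, 32), (32, 16, 32), (32, 16, 32), (64, 32, 32))
```

The `add` check makes the constraint explicit: each skip result from the third branch must have the same W × H × K as the up-sampled main-path result it is added to. In the Fig. 2 numbers the decoder keeps 32 channels at both scales, down from 128 at the end of the encoder.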

Optionally, on the basis of the above technical solution, inputting the current frame image data into the pre-trained human pose detection model and, with reference to the human pose confidence map of the previous frame, outputting multiple human pose reference maps may further include: adding the first convolution result and the second convolution result to obtain a second objective result; and adding the multiple human pose reference maps to the second objective result to output multiple new human pose reference maps. The second objective result is used to improve the accuracy of the human pose detection model while the model is being trained.

In this embodiment, to improve the accuracy of the human pose detection model in the training stage, intermediate supervision may be added. Intermediate supervision means computing a loss on the output of each stage, which ensures that the parameters of the lower layers are updated properly.

The first and second convolution results are added to obtain the second objective result, and the second objective result is then added to the multiple human pose reference maps to obtain multiple new human pose reference maps. The second objective result thus serves as intermediate supervision, i.e. it also takes part in computing the loss.

It should be noted that in forecast period, can not execute the first convolution results and the second convolution results added Operation exports result and only includes multiple human body attitudes with reference to figure that is, in forecast period.

It should also be noted that, after the multiple frames of image data are acquired, the technical solution of this embodiment does not need to check whether a frame contains a face, locate the face in the frame if one exists, and then extract it. These operations are omitted because they take a long time and their detection errors are relatively large; omitting them greatly improves data-processing efficiency.

It should further be noted that, because the second and third residual units each consist of a ShuffleNet subunit and a ShuffleNet down-sampling subunit, full-resolution information is retained on the main path before every down-sampling: the first intermediate result is passed to the second residual unit before that unit's ShuffleNet down-sampling subunit down-samples it, and the second intermediate result is passed to the third residual unit before that unit's ShuffleNet down-sampling subunit down-samples it. Between two successive down-sampling operations, features are extracted by a single ShuffleNet subunit: between the first and second residual units this is the ShuffleNet subunit of the first residual unit, and between the second and third residual units it is the ShuffleNet subunit of the second residual unit.

The convolutional neural network of this embodiment introduces the refinement network module and the feedback module and moves the convolution operation of the skip connections onto the main path, all of which improve its prediction accuracy. In addition, the asymmetric encoder-decoder structure keeps the network scale essentially unchanged, and because each residual unit contains only one ShuffleNet subunit instead of the original three, the network structure is simplified, the amount of computation is reduced and processing efficiency is improved. As a result, the CNN-based human pose detection method can be applied on embedded platforms such as smartphones, running in real time with prediction accuracy that meets requirements.

To make the convolutional neural network of this embodiment easier to understand, a specific example is described below.

Fig. 2 is a schematic diagram of an application of the convolutional neural network. The network may include a main path, a first branch, a second branch and a third branch. The main path includes a first convolution module 21, a first residual unit 22, a second residual unit 23, a third residual unit 24, a second convolution module 25, an up-sampling module 26, an addition module 27 (element-wise addition without carry) and a third convolution module 28. The first residual unit 22, the second residual unit 23 and the third residual unit 24 each include a ShuffleNet down-sampling subunit 221 and a ShuffleNet subunit 222. The first branch includes the refinement network module 29, which includes ShuffleNet subunits 222 and up-sampling modules 26; the second branch includes the feedback module 30; and the third branch includes the second convolution module 25.

In Fig. 2, the W × H × K label on a module, unit or subunit denotes the result obtained after processing by that module, unit or subunit, where W is the width of the result, H its height and K its number of channels.

The first convolution module 21 performs the following operations: first, a convolution with a 3 × 3 kernel; second, batch normalization; third, a linear activation function. The second convolution module 25 performs: first, a convolution with a 1 × 1 kernel; second, batch normalization; third, a linear activation function. The third convolution module 28 performs: first, a convolution with a 1 × 1 kernel; second, batch normalization; third, a linear activation function; fourth, a convolution with a 3 × 3 kernel.

Suppose the current frame image data is a 256 × 128 × 3 RGB image. It is input into the convolutional neural network and passes through the first convolution module 21 and the first residual unit 22, giving the first intermediate result, 64 × 32 × 32. The first intermediate result is processed by the second residual unit 23, and the output is added to the result of passing the human pose confidence map of the previous frame through the feedback module 30, giving the second intermediate result, 32 × 16 × 64. The second intermediate result is processed by the third residual unit 24, giving the third intermediate result, 16 × 8 × 128, which serves as the first convolution result. The feedback module 30 may include a 1 × 1 convolution kernel that raises the dimensionality: the previous frame's confidence map is 64 × 32 × 14 while the first intermediate result is 64 × 32 × 32, so the channel count must be raised to make the two consistent.
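The channel-lifting role of the feedback module's 1 × 1 convolution can be checked at shape level. The sketch below is hypothetical (`conv1x1_shape` is not from the text) and assumes the convolution has one bias per output channel. It confirms that a 64 × 32 × 14 confidence map becomes 64 × 32 × 32, matching the first intermediate result, and counts the parameters this costs.

```python
def conv1x1_shape(shape, out_channels):
    """Output shape and parameter count of a 1 x 1 convolution.

    Assumption: one bias per output channel in addition to the
    k_in * out_channels kernel weights.
    """
    w, h, k_in = shape
    params = k_in * out_channels + out_channels
    return (w, h, out_channels), params


prev_conf = (64, 32, 14)     # previous frame's confidence map (14 channels)
first_result = (64, 32, 32)  # first intermediate result of the current frame

lifted, n_params = conv1x1_shape(prev_conf, first_result[2])
assert lifted == first_result   # channel counts now agree, so the maps can be added
print(lifted, n_params)
# (64, 32, 32) 480
```

The cost is small (a few hundred parameters), which is consistent with the stated goal of feeding the previous frame back without burdening an embedded platform.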

The first intermediate result is processed by the second convolution module 25 of the third branch, giving the fourth intermediate result, 64 × 32 × 32.

The second intermediate result is processed by the second convolution module 25 of the third branch, giving the fifth intermediate result, 32 × 16 × 32.

The third intermediate result is processed by the second convolution module 25 and the up-sampling module 26 on the main path, and the output is fed together with the fifth intermediate result into the addition module 27 on the main path, giving the sixth intermediate result. The sixth intermediate result is processed by the up-sampling module 26 on the main path, and the output is fed together with the fourth intermediate result into the addition module 27 on the main path, giving the seventh intermediate result, 64 × 32 × 32, which serves as the second convolution result.

The output of the second convolution module 25 on the main path for the third intermediate result is also fed into the ShuffleNet subunit 222 on the first branch, giving the eighth intermediate result. The eighth intermediate result is processed by the up-sampling module 26 on the first branch, giving the ninth intermediate result; the ninth is processed by a ShuffleNet subunit 222 on the first branch, giving the tenth intermediate result; and the tenth is processed by an up-sampling module 26 on the first branch, giving the eleventh intermediate result. The sixth intermediate result is processed by a ShuffleNet subunit 222 on the first branch, giving the twelfth intermediate result, which is processed by an up-sampling module 26 on the first branch, giving the thirteenth intermediate result. The eleventh and thirteenth intermediate results are added to obtain the third convolution result, 64 × 32 × 32.
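The first branch (refinement network module 29) can be traced the same way. This is a shape-level sketch under assumptions: the ShuffleNet subunits 222 are assumed to keep the channel count, up-sampling is assumed to double width and height, and the 16 × 8 × 32 input is inferred from the fact that, after up-sampling, it must match the 32 × 16 × 32 fifth intermediate result. The helper names are hypothetical.

```python
def subunit(shape):
    """ShuffleNet subunit 222: size kept; channel count assumed unchanged."""
    return shape


def upsample(shape):
    """Up-sampling module 26: width and height assumed to double."""
    w, h, k = shape
    return (w * 2, h * 2, k)


def refine_shapes(third_projected, sixth):
    """Shapes along the first branch; returns the third convolution result."""
    eighth = subunit(third_projected)
    ninth = upsample(eighth)
    tenth = subunit(ninth)
    eleventh = upsample(tenth)
    twelfth = subunit(sixth)
    thirteenth = upsample(twelfth)
    assert eleventh == thirteenth, "the two paths must meet at the same shape"
    return eleventh                     # 11th + 13th, element-wise


third_conv = refine_shapes((16, 8, 32), (32, 16, 32))
second_conv = (64, 32, 32)              # seventh intermediate result from Fig. 2
assert third_conv == second_conv        # so they can be added (14th result)
print(third_conv)
# (64, 32, 32)
```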

The second and third convolution results are fed into the addition module 27 on the main path, giving the fourteenth intermediate result. The fourteenth intermediate result is processed by the ShuffleNet subunit 222 on the main path, giving the fifteenth intermediate result, 64 × 32 × 32. The fifteenth intermediate result is input into the third convolution module 28 on the main path, which outputs the multiple human pose reference maps.

The first and second convolution results are added to obtain the second objective result, 64 × 32 × 14. The multiple human pose reference maps are added to the second objective result to output multiple new human pose reference maps. The second objective result is used to improve the accuracy of the human pose detection model while the model is being trained.

It should be noted that the human pose confidence map of the previous frame is not fed into the convolutional neural network at the input together with the current frame image data; instead, it enters at an intermediate layer of the network, alongside the first intermediate result. This reduces the amount of data processing.

Fig. 3 is a flowchart of another human pose detection method provided by an embodiment of the present invention. This embodiment is applicable to detecting human poses. The method may be performed by a human pose detection apparatus, which may be implemented in software and/or hardware and configured in a device, typically a computer or a mobile terminal. As shown in Fig. 3, the method includes the following steps:

Step 301: acquire multiple frames of image data.

Step 302: determine whether the human pose confidence map of the previous frame is credible; if so, go to step 303; if not, go to step 304.

Step 303: input the current frame image data and the human pose confidence map of the previous frame into the pre-trained human pose detection model, output multiple human pose reference maps, and go to step 305.

Step 304: input the current frame image data and the preset image data into the pre-trained human pose detection model, output multiple human pose reference maps, and go to step 305.

Step 305: in each human pose reference map, determine the coordinate position of the maximum probability value, and take that coordinate position as a human pose keypoint.
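Step 305 amounts to an argmax over each reference map (one map per keypoint). A minimal sketch, assuming each map is a list of rows indexed `heatmap[y][x]`; `keypoint_from_heatmap` is a hypothetical name, and on ties the first maximum found in row-major order is kept.

```python
def keypoint_from_heatmap(heatmap):
    """Return ((x, y), p): the position of the maximum value in one reference
    map (heatmap[y][x]) and that value, which step 306 compares to a threshold."""
    best = (0, 0)
    best_p = heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, p in enumerate(row):
            if p > best_p:          # strict '>' keeps the first maximum on ties
                best, best_p = (x, y), p
    return best, best_p


heatmap = [[0.0, 0.1, 0.0],
           [0.2, 0.9, 0.3],
           [0.0, 0.4, 0.1]]
print(keypoint_from_heatmap(heatmap))
# ((1, 1), 0.9)
```

In the Fig. 2 example the model outputs 14 such maps of 64 × 32, so the function would be called once per map to obtain 14 keypoints in reference-map coordinates.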

Step 306: determine whether the probability value of the human pose keypoint is greater than a preset threshold; if so, go to step 307; if not, go to step 308.

Step 307: generate a mask map centered on the human pose keypoint as the human pose confidence map, and go to step 309.

Step 308: use the preset image data as the human pose confidence map, and go to step 309.
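Steps 306 to 308 can be sketched as one function. The text does not specify the mask shape, the threshold value or the contents of the preset image data, so this sketch assumes a Gaussian mask with spread `sigma` centered on the keypoint, a threshold of 0.5 and an all-zero preset map; `confidence_map` is a hypothetical name.

```python
import math


def confidence_map(keypoint, prob, width, height, threshold=0.5, sigma=2.0):
    """Steps 306-308 for one keypoint.

    If prob exceeds the threshold (step 306), return a mask centered on the
    keypoint (step 307; here an assumed Gaussian, peak value 1.0). Otherwise
    return the preset image data (step 308; here an assumed all-zero map).
    """
    if prob <= threshold:
        return [[0.0] * width for _ in range(height)]
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


m = confidence_map((3, 2), 0.9, width=8, height=6)
print(m[2][3])   # 1.0 at the keypoint, decaying with distance
```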

Step 309: determine whether the current frame image data is the last frame; if not, go to step 310; if so, go to step 311.

Step 310: input the human pose confidence map into the human pose detection model, where it takes part in generating the human pose confidence map of the next frame.

Step 311: end the operation of generating human pose confidence maps for the multiple frames of image data.
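The per-frame loop of steps 301 to 311 can be summarized as below. This is a structural sketch only: `model`, `decode`, `make_conf_map` and `preset` are hypothetical stand-ins passed in by the caller, and the first frame is assumed to use the preset data because no previous confidence map exists yet.

```python
def detect_sequence(frames, model, decode, make_conf_map, preset):
    """Run the Fig. 3 loop over a sequence of frames.

    model(frame, conf) -> list of reference maps       (steps 303/304)
    decode(ref_map)    -> one keypoint                  (step 305)
    make_conf_map(kps) -> confidence map, or the preset
                          data if keypoints are not
                          credible                      (steps 306-308)
    All four arguments are hypothetical stand-ins.
    """
    prev_conf = None
    results = []
    for frame in frames:                           # step 301: acquired frames
        # Step 302: no confidence map yet (first frame) -> use the preset data.
        conf_in = preset if prev_conf is None else prev_conf
        ref_maps = model(frame, conf_in)           # steps 303/304
        keypoints = [decode(m) for m in ref_maps]  # step 305
        prev_conf = make_conf_map(keypoints)       # steps 306-308, fed back (310)
        results.append(keypoints)
    return results                                 # step 311: after the last frame


# Toy stand-ins: the "model" adds the fed-back value to the frame, which makes
# the frame-to-frame feedback visible in the output.
out = detect_sequence([1, 2, 3],
                      model=lambda f, c: [f + c],
                      decode=lambda m: m,
                      make_conf_map=lambda kps: kps[0],
                      preset=0)
print(out)  # [[1], [3], [6]]
```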

In this embodiment, it should be noted that the human pose detection model is generated by training a convolutional neural network applied to an embedded platform.

Fig. 4 is a schematic structural diagram of a human pose detection apparatus provided by an embodiment of the present invention. This embodiment is applicable to detecting human poses. The apparatus may be implemented in software and/or hardware and configured in a device, typically a computer or a mobile terminal. As shown in Fig. 4, the apparatus includes:

An image data acquisition module 410, configured to acquire multiple frames of image data.

A human pose reference map output module 420, configured to input the current frame image data into the pre-trained human pose detection model and, with reference to the human pose confidence map of the previous frame, output multiple human pose reference maps, where the human pose detection model is generated by training a convolutional neural network applied to an embedded platform.

A human pose keypoint identification module 430, configured to identify human pose keypoints in the human pose reference maps.

A human pose confidence map generation module 440, configured to generate the human pose confidence map according to the credibility of the human pose keypoints.

A judgment module 450, configured to determine whether the current frame image data is the last frame.

A first execution module 460, configured to, if not, input the human pose confidence map into the human pose detection model, where it takes part in generating the human pose confidence map of the next frame.

A second execution module 470, configured to, if so, end the operation of generating human pose confidence maps for the multiple frames of image data.

Optionally, on the basis of the above technical solution, the human pose reference map output module 420 may specifically include:

A confidence map credibility judgment unit, configured to determine whether the human pose confidence map of the previous frame is credible.

A first human pose reference map output unit, configured to, if so, input the current frame image data and the human pose confidence map of the previous frame into the pre-trained human pose detection model and output multiple human pose reference maps.

A second human pose reference map output unit, configured to, if not, input the current frame image data and the preset image data into the pre-trained human pose detection model and output multiple human pose reference maps.

Optionally, on the basis of the above technical solution, the human pose keypoint identification module 430 may specifically include:

A human pose keypoint identification unit, configured to determine the coordinate position of the maximum probability value in each human pose reference map and take that coordinate position as a human pose keypoint.

Optionally, on the basis of the above technical solution, the human pose confidence map generation module 440 may specifically include:

A human pose keypoint credibility judgment unit, configured to determine whether a human pose keypoint is credible.

A first human pose confidence map generation unit, configured to, if so, generate a mask map centered on the human pose keypoint as the human pose confidence map;

A second human pose confidence map generation unit, configured to, if not, use the preset image data as the human pose confidence map.

Optionally, on the basis of the above technical solution, the human pose keypoint credibility judgment unit may specifically be configured to:

Determine whether the probability value of the human pose keypoint is greater than a preset threshold.

If so, determine that the human pose keypoint is credible.

If not, determine that the human pose keypoint is not credible.

Inputting the current frame image data into the pre-trained human pose detection model and, with reference to the human pose confidence map of the previous frame, outputting multiple human pose reference maps may specifically include:

Inputting the current frame image data into the residual module for processing, and inputting the human pose confidence map of the previous frame into the feedback module for processing, to obtain a first convolution result.

Feeding the first convolution result output by the residual module into the up-sampling module and the refinement network module to obtain a second convolution result and a third convolution result, respectively.

Adding the second convolution result to the third convolution result to output multiple human pose reference maps.

Inputting the current frame image data into the residual module for processing, and inputting the human pose confidence map of the previous frame into the feedback module for processing, to obtain the first convolution result may specifically include:

Inputting the current frame image data into the first residual unit to obtain a first intermediate result.

Adding the output of the second residual unit, which processes the first intermediate result, to the output of the feedback module, which processes the human pose confidence map of the previous frame, to obtain a second intermediate result.

Inputting the second intermediate result into the third residual unit to obtain a third intermediate result, which serves as the first convolution result.

The numbers of channels of the first, second and third intermediate results increase successively.

Feeding the first convolution result output by the residual module into the up-sampling module and the refinement network module to obtain the second and third convolution results, respectively, may specifically include:

Inputting the first intermediate result into the third branch to obtain a fourth intermediate result.

Inputting the second intermediate result into the third branch to obtain a fifth intermediate result.

Inputting the third and fifth intermediate results into the up-sampling module to obtain a sixth intermediate result.

Inputting the fourth and sixth intermediate results into the up-sampling module to obtain a seventh intermediate result, which serves as the second convolution result.

Inputting the first convolution result output by the residual module into the refinement network module to obtain the third convolution result.

The numbers of channels of the sixth and seventh intermediate results decrease successively.

Optionally, on the basis of the above technical solution, inputting the current frame image data into the pre-trained human pose detection model and, with reference to the human pose confidence map of the previous frame, outputting multiple human pose reference maps may further include:

Adding the first convolution result and the second convolution result to obtain a second objective result.

Adding the multiple human pose reference maps to the second objective result to output multiple new human pose reference maps.

The second objective result is used to improve the accuracy of the human pose detection model while the model is being trained.

The human pose detection apparatus provided by this embodiment can perform the human pose detection method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Fig. 5 is a kind of structural schematic diagram of equipment provided in an embodiment of the present invention.Fig. 5, which is shown, to be suitable for being used to realizing this hair The block diagram of the example devices 512 of bright embodiment.The equipment 512 that Fig. 5 is shown is only an example, should not be to of the invention real The function and use scope for applying example bring any restrictions.

As shown in figure 5, equipment 512 is showed in the form of common apparatus.The component of equipment 512 can include but is not limited to: One or more processor 516, system storage 528 are connected to different system components (including system storage 528 and place Manage device 516) bus 518.

Bus 518 indicates one of a few class bus structures or a variety of, including memory bus or Memory Controller, Peripheral bus, graphics acceleration port, processor or the local bus using any bus structures in a variety of bus structures.It lifts For example, these architectures include but is not limited to industry standard architecture (ISA) bus, microchannel architecture (MAC) Bus, enhanced isa bus, Video Electronics Standards Association (VESA) local bus and peripheral component interconnection (PCI) bus.

Equipment 512 typically comprises a variety of computer system readable media.These media can be it is any can be by equipment The usable medium of 512 access, including volatile and non-volatile media, moveable and immovable medium.

The system memory 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 530 and/or cache memory 532. The device 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 534 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 518 through one or more data media interfaces. The memory 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 540 having a set of (at least one) program modules 542 may be stored, for example, in the memory 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a networking environment. The program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.

The device 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device, a display 524, etc.), with one or more devices that enable a user to interact with the device 512, and/or with any device (such as a network card, a modem, etc.) that enables the device 512 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 522. In addition, the device 512 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 520. As shown in the figure, the network adapter 520 communicates with the other modules of the device 512 through the bus 518. It should be understood that, although not shown in Fig. 5, other hardware and/or software modules may be used in conjunction with the device 512, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 516 runs the programs stored in the system memory 528 to execute various functional applications and data processing, for example to implement the human pose detection method provided by the embodiments of the present invention, which includes:

Acquiring multiple frames of image data.

Inputting the current frame of image data into a pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps, where the human pose detection model is generated by training a convolutional neural network applied to an embedded platform.

Identifying human pose keypoints in the human pose reference maps.

Generating a human pose confidence map according to the credibility of the human pose keypoints.

Judging whether the current frame of image data is the last frame of image data.

If not, inputting the human pose confidence map into the human pose detection model, for use in generating the human pose confidence map of the next frame of image data.

If so, ending the operation of generating the human pose confidence maps of the multiple frames of image data.
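The per-frame procedure amounts to a loop with a feedback edge: the confidence map produced for one frame becomes an extra model input for the next frame, and the loop stops after the last frame. A minimal sketch follows. The model, the keypoint identifier, and the confidence-map builder are passed in as callables because the source does not fix their implementations; all names are hypothetical. For the first frame, where no previous confidence map exists, the sketch assumes a caller-supplied preset map, in the spirit of the "preset image data" of claim 2.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Map = List[List[float]]     # one 2-D map (rows x cols)
Keypoint = Tuple[int, int]  # (row, col); the coordinate convention is an assumption


def detect_sequence(
    frames: Sequence[Frame],
    model: Callable[[Frame, Map], List[Map]],
    identify_keypoints: Callable[[List[Map]], List[Keypoint]],
    build_confidence_map: Callable[[List[Map], List[Keypoint]], Map],
    preset_map: Map,
) -> List[Map]:
    """Hypothetical driver: one confidence map per frame, each fed into the next frame."""
    confidence_maps: List[Map] = []
    feedback = preset_map  # first frame: no previous confidence map exists yet
    for index, frame in enumerate(frames):
        reference_maps = model(frame, feedback)         # model sees frame + feedback
        keypoints = identify_keypoints(reference_maps)  # keypoints from reference maps
        confidence = build_confidence_map(reference_maps, keypoints)
        confidence_maps.append(confidence)
        if index == len(frames) - 1:
            break                                       # last frame: stop
        feedback = confidence                           # otherwise feed it forward
    return confidence_maps


# Toy stand-ins: the "model" adds the frame value to the fed-back map value.
maps = detect_sequence(
    frames=[1, 2, 3],
    model=lambda frame, fb: [[[frame + fb[0][0]]]],
    identify_keypoints=lambda refs: [(0, 0)],
    build_confidence_map=lambda refs, kps: refs[0],
    preset_map=[[0]],
)
print(maps)  # [[[1]], [[3]], [[6]]]
```

Keeping the three stages as injected callables separates the temporal feedback logic from the network itself, which is the part the method steps actually specify.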

Of course, those skilled in the art will understand that the processor can also implement the technical solution of the human pose detection method applied to a device provided by any embodiment of the present invention. For the hardware structure and functions of the device, refer to the description of the embodiments.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the human pose detection method provided by the embodiments of the present invention, the method comprising:

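Acquiring multiple frames of image data.

Inputting the current frame of image data into a pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps, where the human pose detection model is generated by training a convolutional neural network applied to an embedded platform.

Identifying human pose keypoints in the human pose reference maps.

Generating a human pose confidence map according to the credibility of the human pose keypoints.

Judging whether the current frame of image data is the last frame of image data.

If not, inputting the human pose confidence map into the human pose detection model, for use in generating the human pose confidence map of the next frame of image data.

If so, ending the operation of generating the human pose confidence maps of the multiple frames of image data.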

The computer storage medium of the embodiments of the present invention may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, electrical wire, optical cable, RF, or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

Of course, for the computer-readable storage medium provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above and may also perform related operations in the human pose detection method applied to a device provided by any embodiment of the present invention. For the storage medium, refer to the description in the embodiments.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A human pose detection method, characterized by comprising:

acquiring multiple frames of image data;

inputting the current frame of image data into a pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps, wherein the human pose detection model is generated by training a convolutional neural network applied to an embedded platform;

identifying human pose keypoints in the human pose reference maps;

generating a human pose confidence map according to the credibility of the human pose keypoints;

judging whether the current frame of image data is the last frame of image data;

if not, inputting the human pose confidence map into the human pose detection model, for use in generating the human pose confidence map of the next frame of image data;

if so, ending the operation of generating the human pose confidence maps of the multiple frames of image data.

2. The method according to claim 1, wherein inputting the current frame of image data into the pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps comprises:

if so, inputting the current frame of image data and the human pose confidence map of the previous frame of image data into the pre-trained human pose detection model, and outputting a plurality of human pose reference maps;

if not, inputting the current frame of image data and preset image data into the pre-trained human pose detection model, and outputting a plurality of human pose reference maps.

3. The method according to claim 1, wherein identifying the human pose keypoints in the human pose reference maps comprises:

determining, in the human pose reference map, the coordinate position of the maximum probability value, and taking that coordinate position as a human pose keypoint.
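Claim 3 describes a per-map argmax: the keypoint is the coordinate of the largest probability value. The sketch below assumes one reference map per body keypoint, (row, col) coordinates, and that ties resolve to the first maximum encountered; none of these conventions is fixed by the source, and the function names are hypothetical. The peak value is returned alongside the position because a credibility judgment such as the one in claims 4 and 5 could plausibly use it.

```python
from typing import List, Tuple

Map = List[List[float]]


def keypoint_from_map(reference_map: Map) -> Tuple[Tuple[int, int], float]:
    """Return ((row, col), value) of the maximum entry of one reference map."""
    if not reference_map or not reference_map[0]:
        raise ValueError("reference map must be non-empty")
    best_pos, best_val = (0, 0), reference_map[0][0]
    for r, row in enumerate(reference_map):
        for c, value in enumerate(row):
            if value > best_val:  # strict '>' keeps the first maximum on ties
                best_pos, best_val = (r, c), value
    return best_pos, best_val


def identify_keypoints(reference_maps: List[Map]) -> List[Tuple[Tuple[int, int], float]]:
    """One keypoint per reference map, assuming one map per body keypoint."""
    return [keypoint_from_map(m) for m in reference_maps]


print(identify_keypoints([[[0.1, 0.9], [0.3, 0.2]], [[0.4, 0.0], [0.0, 0.0]]]))
# [((0, 1), 0.9), ((0, 0), 0.4)]
```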

4. The method according to claim 2, wherein generating the human pose confidence map according to the credibility of the human pose keypoints comprises:

judging whether the human pose keypoint is credible;

5. The method according to claim 4, wherein judging whether the human pose keypoint is credible comprises:

if so, determining that the human pose keypoint is credible;

if not, determining that the human pose keypoint is not credible.
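Claims 4 and 5 state that each keypoint is judged credible or not credible before the confidence map is built, but the criterion itself does not appear in this text. The sketch below assumes, purely for illustration, that a keypoint is credible when its peak value reaches a threshold. It then renders a Gaussian blob at each credible keypoint and an all-zero channel otherwise. The threshold of 0.5, the Gaussian rendering, `sigma`, and every function name are hypothetical choices, not the patent's specification.

```python
import math
from typing import List, Tuple

Map = List[List[float]]

CREDIBILITY_THRESHOLD = 0.5  # hypothetical value; the source gives no criterion


def is_credible(peak_value: float, threshold: float = CREDIBILITY_THRESHOLD) -> bool:
    """Assumed criterion: credible when the keypoint's peak score reaches the threshold."""
    return peak_value >= threshold


def confidence_channel(keypoint: Tuple[int, int], peak_value: float,
                       height: int, width: int, sigma: float = 1.0) -> Map:
    """Gaussian blob centred on a credible keypoint; all zeros if not credible."""
    if not is_credible(peak_value):
        return [[0.0] * width for _ in range(height)]
    kr, kc = keypoint
    return [[math.exp(-((r - kr) ** 2 + (c - kc) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)]


def build_confidence_map(keypoints: List[Tuple[Tuple[int, int], float]],
                         height: int, width: int) -> List[Map]:
    """One channel per keypoint, in the same order as the reference maps."""
    return [confidence_channel(pos, val, height, width) for pos, val in keypoints]
```

Zeroing non-credible channels is one way to keep an unreliable detection from being fed back into the next frame; other suppression schemes would fit the claim language equally well.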

6. The method according to any one of claims 1-5, wherein the human pose detection model comprises a main path, a first branch, and a second branch; the main path comprises a residual module and an upsampling module, the first branch comprises a refinement network module, and the second branch comprises a feedback module;

inputting the current frame of image data into the pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps comprises:

inputting the current frame of image data into the residual module for processing, with reference to the result of inputting the human pose confidence map of the previous frame of image data into the feedback module for processing, to obtain a first convolution result;

inputting the first convolution result output by the residual module into the upsampling module and into the refinement network module respectively for processing, to obtain a second convolution result and a third convolution result respectively;

7. The method according to claim 6, wherein the residual module comprises a first residual unit, a second residual unit, and a third residual unit;

inputting the current frame of image data into the residual module for processing, with reference to the result of inputting the human pose confidence map of the previous frame of image data into the feedback module for processing, to obtain the first convolution result comprises:

inputting the first intermediate result into the second residual unit for processing, and adding the output to the result of inputting the human pose confidence map of the previous frame of image data into the feedback module for processing, to obtain a second intermediate result;

inputting the second intermediate result into the third residual unit for processing to obtain a third intermediate result, which serves as the first convolution result;

wherein the numbers of channels of the first intermediate result, the second intermediate result, and the third intermediate result increase successively.
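Claim 7 adds the feedback module's output to the second residual unit's output, which is only defined if the two tensors have identical shapes. The shape-level sketch below tracks (channels, height, width) through the three residual units and checks that constraint. The channel counts 32, 64, and 128, the stride-2 units, the input sizes, and the feedback stride are all hypothetical; only the increasing channel order and the addition point come from the claim.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_shape(shape: Shape, out_channels: int, stride: int) -> Shape:
    """Output shape of a 'same'-padded convolution stage (ceil division by stride)."""
    _, h, w = shape
    return (out_channels, -(-h // stride), -(-w // stride))


def residual_path_shapes(frame: Shape, prev_confidence: Shape,
                         feedback_stride: int = 4) -> Tuple[Shape, Shape, Shape]:
    """Track shapes through three residual units plus the feedback addition (claim 7)."""
    first = conv_shape(frame, out_channels=32, stride=2)    # first residual unit
    second = conv_shape(first, out_channels=64, stride=2)   # second residual unit
    feedback = conv_shape(prev_confidence, out_channels=64, stride=feedback_stride)
    if feedback != second:  # element-wise addition needs identical shapes
        raise ValueError(f"feedback shape {feedback} does not match {second}")
    third = conv_shape(second, out_channels=128, stride=2)  # third residual unit
    return first, second, third  # channels 32 < 64 < 128, increasing as claimed


print(residual_path_shapes((3, 256, 256), (14, 256, 256)))
# ((32, 128, 128), (64, 64, 64), (128, 32, 32))
```

In this toy configuration the feedback module must downsample the previous confidence map by 4 to meet the second residual unit; a confidence map stored at a different resolution would need a different feedback stride.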

8. The method according to claim 7, wherein the human pose detection model further comprises a third branch;

inputting the first convolution result output by the residual module into the upsampling module and into the refinement network module respectively for processing, to obtain the second convolution result and the third convolution result respectively, comprises:

inputting the third intermediate result and the fifth intermediate result into the upsampling module for processing to obtain a sixth intermediate result;

inputting the fourth intermediate result and the sixth intermediate result into the upsampling module for processing to obtain a seventh intermediate result, which serves as the second convolution result;

inputting the first convolution result output by the residual module into the refinement network module for processing to obtain the third convolution result;

wherein the numbers of channels of the sixth intermediate result and the seventh intermediate result decrease successively.

9. The method according to claim 6, wherein inputting the current frame of image data into the pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps further comprises:

adding the plurality of human pose reference maps to the second target result, and outputting a plurality of new human pose reference maps;

wherein the second target result is used during training of the human pose detection model to improve the accuracy of the human pose detection model.

10. A human pose detection device, characterized by comprising:

an image data acquisition module, configured to acquire multiple frames of image data;

a human pose reference map output module, configured to input the current frame of image data into a pre-trained human pose detection model, with reference to the human pose confidence map of the previous frame of image data, to output a plurality of human pose reference maps, wherein the human pose detection model is generated by training a convolutional neural network applied to an embedded platform;

a human pose confidence map generation module, configured to generate a human pose confidence map according to the credibility of the human pose keypoints;

a first execution module, configured to, if not, input the human pose confidence map into the human pose detection model, for use in generating the human pose confidence map of the next frame of image data;

a second execution module, configured to, if so, end the operation of generating the human pose confidence maps of the multiple frames of image data.

11. A device, characterized by comprising:

one or more processors;

a memory, configured to store one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-9.

12. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-9.