CN110634155A

CN110634155A - A method and device for target detection based on deep learning

Info

Publication number: CN110634155A
Application number: CN201810641829.8A
Authority: CN
Inventors: 张立成; 鞠策
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingbangda Trade Co Ltd; Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2018-06-21
Filing date: 2018-06-21
Publication date: 2019-12-31

Abstract

The invention discloses a deep learning-based target detection method and device, and relates to the field of computer technology. In one specific implementation, the method includes: using a deep learning network to determine the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously captured multi-frame images form one cycle and the first frame image of each cycle is the start frame image; for each frame image in the cycle other than the start frame image, performing target tracking with a dynamic tracking algorithm, using the features of the detection target in the previous frame image, to determine the detection target in each of those frame images; and outputting the determined detection target each time the detection target in a frame image has been determined. This implementation can greatly reduce the amount of computation without lowering recognition accuracy, which meets the needs of autonomous driving and robotics well, and it is easy to deploy on hardware.

Description

A method and device for target detection based on deep learning

Technical field

The present invention relates to the field of computer technology, and in particular to a deep learning-based target detection method and device.

Background

Deep learning-based target detection methods are now widely used in time-critical video analysis scenarios such as robot navigation and autonomous driving. These methods detect the vehicles and pedestrians in every frame of the image stream, which is a basic computer vision technique applied in robot navigation and autonomous driving.

Existing target detection algorithms all rely on non-maximum suppression (Non-max Suppression) to output the single bounding box with the highest probability. Non-max Suppression first computes the IoU (Intersection-over-Union) of each detected bounding box against a reference bounding box, and then selects the box with the largest IoU as the bounding box output for the frame.
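The IoU comparison described above can be illustrated with a minimal sketch. It assumes axis-aligned bounding boxes written as (x1, y1, x2, y2); the function names `iou` and `best_box` are hypothetical and are not taken from any detector's API.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def best_box(candidates, reference):
    """Return the candidate box with the largest IoU against the reference box."""
    return max(candidates, key=lambda box: iou(box, reference))
```

Two 2x2 boxes offset by one pixel in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.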

In the course of implementing the present invention, the inventors found at least the following problem in the prior art:

Existing target detection methods run detection on every frame, so the amount of computation is huge, and they therefore cannot meet the needs of autonomous driving or robotics well.

Summary of the invention

In view of this, embodiments of the present invention provide a deep learning-based target detection method and device that can greatly reduce the amount of computation without lowering recognition accuracy, thereby meeting the needs of autonomous driving and robotics well, and that are easy to deploy on hardware.

To achieve the above object, according to one aspect of the embodiments of the present invention, a deep learning-based target detection method is provided.

A deep learning-based target detection method includes: using a deep learning network to determine the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously captured multi-frame images form one cycle and the first frame image of each cycle is the start frame image; for each frame image in the cycle other than the start frame image, performing target tracking with a dynamic tracking algorithm, using the features of the detection target in the previous frame image, to determine the detection target in each frame image of the cycle other than the start frame image; and, each time the detection target in a frame image has been determined, outputting the determined detection target.

Optionally, the dynamic tracking algorithm is a particle filter algorithm.

Optionally, the step of performing target tracking with a dynamic tracking algorithm for each frame image in the cycle other than the start frame image, using the features of the detection target in the previous frame image, to determine the detection target in each of those frame images, includes: computing the feature histogram of the detection target in the start frame image of the cycle; deploying particles on the image plane of the second frame of the cycle according to a preset rule, and computing the feature histogram at the position of each particle; determining the detection target in the second frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target; and, starting from the third frame image of the cycle, for each frame: computing the feature histogram of the detection target in the previous frame image, deploying particles on the image plane of the current frame according to the position of the detection target in the previous frame image, computing the feature histogram at the position of each particle on the current image plane, and determining the detection target in the current frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target in the previous frame.

Optionally, the deep learning network is a Faster R-CNN (Faster Region-based Convolutional Neural Network) or a YOLO object detection deep network.

According to another aspect of the embodiments of the present invention, a deep learning-based target detection device is provided.

A deep learning-based target detection device includes: a detection module, configured to use a deep learning network to determine the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously captured multi-frame images form one cycle and the first frame image of each cycle is the start frame image; a tracking module, configured to, for each frame image in the cycle other than the start frame image, perform target tracking with a dynamic tracking algorithm, using the features of the detection target in the previous frame image, to determine the detection target in each frame image of the cycle other than the start frame image; and an output module, configured to output the determined detection target each time the detection target in a frame image has been determined.

Optionally, the tracking module is further configured to: compute the feature histogram of the detection target in the start frame image of the cycle; deploy particles on the image plane of the second frame of the cycle according to a preset rule, and compute the feature histogram at the position of each particle; determine the detection target in the second frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target; and, starting from the third frame image of the cycle, for each frame: compute the feature histogram of the detection target in the previous frame image, deploy particles on the image plane of the current frame according to the position of the detection target in the previous frame image, compute the feature histogram at the position of each particle on the current image plane, and determine the detection target in the current frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target in the previous frame.

According to yet another aspect of the embodiments of the present invention, an electronic device is provided.

An electronic device includes: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the deep learning-based target detection method provided by the present invention.

According to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided.

A computer-readable medium has a computer program stored on it, and the program, when executed by a processor, implements the deep learning-based target detection method provided by the present invention.

An embodiment of the above invention has the following advantages or beneficial effects: a deep learning network is used to determine the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously captured multi-frame images form one cycle and the first frame image of each cycle is the start frame image; for each frame image in the cycle other than the start frame image, target tracking is performed with a dynamic tracking algorithm, using the features of the detection target in the previous frame image, to determine the detection target in each of those frame images. This greatly reduces the amount of computation without lowering recognition accuracy, which meets the needs of autonomous driving and robotics well, and it is easy to deploy on hardware.

Further effects of the optional implementations above are described below in conjunction with specific embodiments.

Description of the drawings

The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on it. In the drawings:

Fig. 1 is a schematic diagram of the main steps of a deep learning-based target detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the main modules of a deep learning-based target detection device according to an embodiment of the present invention;

Fig. 3 is an exemplary system architecture diagram to which embodiments of the present invention can be applied;

Fig. 4 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

Fig. 1 is a schematic diagram of the main steps of a deep learning-based target detection method according to an embodiment of the present invention.

As shown in Fig. 1, the deep learning-based target detection method of the embodiment mainly includes steps S101 to S103.

Step S101: Use a deep learning network to determine the detection target in the start frame image of each cycle.

Here, every several consecutive frames of the continuously captured multi-frame images form one cycle, and the first frame image of each cycle is the start frame image.
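The division of a frame stream into cycles can be sketched as follows. This is only an illustration, assuming 0-based frame indices and a fixed cycle length; `cycle_position` and `is_start_frame` are hypothetical helper names.

```python
def cycle_position(frame_index, cycle_length):
    """Map a 0-based frame index to (cycle number, offset within the cycle)."""
    if cycle_length < 1:
        raise ValueError("cycle_length must be at least 1")
    return divmod(frame_index, cycle_length)


def is_start_frame(frame_index, cycle_length):
    """True when the frame is the first frame of its cycle."""
    return cycle_position(frame_index, cycle_length)[1] == 0
```

With a cycle length of 20, frames 0, 20 and 40 are start frames and would go to the deep learning network; every other frame would go to the tracker.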

The deep learning network may be a Faster R-CNN (Faster Region-based Convolutional Neural Network) or a YOLO object detection deep network; YOLO stands for You Only Look Once.

Step S102: For each frame image in the cycle other than the start frame image, perform target tracking with a dynamic tracking algorithm, using the features of the detection target in the previous frame image, to determine the detection target in each frame image of the cycle other than the start frame image.

Specifically, the dynamic tracking algorithm may be a particle filter algorithm.

The feature of the detection target may be any one of the various pixel features within the bounding box corresponding to the detection target, for example a color feature.
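One way to turn a color feature into a feature histogram is a hue histogram over the pixels inside the bounding box. The sketch below is an assumption-laden illustration: the image is a plain list of rows of (r, g, b) tuples with values 0 to 255, the bin count of 8 is arbitrary, and `hue_histogram` is a hypothetical name. It uses only the standard library `colorsys` module.

```python
import colorsys


def hue_histogram(image, box, bins=8):
    """Normalized hue histogram of the pixels inside a bounding box.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    box:   (x1, y1, x2, y2) in pixel coordinates, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = box
    counts = [0] * bins
    for row in image[y1:y2]:
        for r, g, b in row[x1:x2]:
            # rgb_to_hsv expects floats in 0..1 and returns hue in 0..1.
            hue, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            counts[min(int(hue * bins), bins - 1)] += 1
    total = sum(counts)
    # An empty box yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * bins
```

Normalizing by the pixel count makes histograms from boxes of different sizes comparable.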

Step S102 may specifically include: computing the feature histogram of the detection target in the start frame image of a cycle; deploying particles on the image plane of the second frame of the cycle according to a preset rule, and computing the feature histogram at the position of each particle; determining the detection target in the second frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target in the start frame image; and, starting from the third frame image of the cycle, for each frame: computing the feature histogram of the detection target in the previous frame image, deploying particles on the image plane of the current frame according to the position of the detection target in the previous frame image, computing the feature histogram at the position of each particle on the current image plane, and determining the detection target in the current frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target in the previous frame.

When particles are deployed on the image plane of the second frame of the cycle according to a preset rule, the preset rule may be to deploy particles uniformly over all positions of the second frame's image plane. Alternatively, the rule may be to deploy more particles on the second frame's image plane near the position corresponding to the detection target in the start frame image, and fewer particles far from that position. The criterion for how near or far a particle position is from the position corresponding to the detection target in the start frame image, and the criterion for how many particles count as more or fewer, can both be set as required.

Here, if the position of the detection target in the start frame image is denoted position I, the position on the second frame's image plane corresponding to the detection target in the start frame image is the same position as position I on the second frame's image plane.
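The two deployment rules can be sketched as follows. The text leaves the exact distribution open, so the Gaussian spread used for the "more near, fewer far" rule is an assumption, as are the names `deploy_uniform` and `deploy_near` and the parameter `sigma`.

```python
import random


def deploy_uniform(width, height, n, rng):
    """Scatter n particles uniformly over the image plane."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def deploy_near(center, width, height, n, sigma, rng):
    """Scatter n particles so that density falls off with distance from center."""
    cx, cy = center
    particles = []
    for _ in range(n):
        # Gaussian offsets put most particles close to the center and few far
        # away; positions are clamped so they stay on the image plane.
        x = min(max(rng.gauss(cx, sigma), 0.0), width - 1.0)
        y = min(max(rng.gauss(cy, sigma), 0.0), height - 1.0)
        particles.append((x, y))
    return particles
```

Here `center` plays the role of position I carried over to the second frame, and a larger `sigma` spreads the particles further from it.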

Starting from the third frame image of the cycle, particles are deployed on each frame's image plane according to the position of the detection target in the previous frame image. Specifically, the similarities corresponding to the particles on the previous frame's image plane are all normalized, and the preset number of particles with the largest similarities are selected from the previous frame's particles as the selected particles. When particles are deployed on the current frame's image plane, more particles are deployed near the positions corresponding to the selected particles and fewer particles are deployed far from those positions. Here, if the position of a selected particle on the previous frame's image plane is denoted position II, the position corresponding to that selected particle on the current frame's image plane is the same position as position II on the current frame's image plane.
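Redeploying particles around the selected particles might look like the sketch below. The text only requires more particles near the selected positions and fewer far away; drawing each new particle around a selected position chosen in proportion to its normalized similarity, with a Gaussian offset, is one assumed way to do that. `redeploy` and `sigma` are hypothetical names.

```python
import random


def redeploy(selected, n, sigma, rng):
    """Place n particles for the current frame around the selected positions.

    selected: non-empty list of ((x, y), weight) pairs from the previous frame,
              where weight is the particle's normalized similarity (> 0 for
              at least one entry).
    """
    centers = [position for position, _ in selected]
    weights = [weight for _, weight in selected]
    new_particles = []
    # Higher-weight positions are picked more often, so they attract more
    # particles; the Gaussian offset keeps each new particle close to its pick.
    for cx, cy in rng.choices(centers, weights=weights, k=n):
        new_particles.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return new_particles
```

A real implementation would also clamp the new positions to the image bounds.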

Here, the similarity corresponding to a particle on the image plane of a given frame is the similarity between the feature histogram at that particle's position on that frame's image plane and the feature histogram of the detection target in the previous frame.
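The text does not name a particular similarity measure. As an assumed example only, the Bhattacharyya coefficient is a common choice for comparing two normalized histograms: it is 1 for identical histograms and 0 for histograms with no overlapping bins. The function name `bhattacharyya` is hypothetical.

```python
import math


def bhattacharyya(hist_p, hist_q):
    """Bhattacharyya coefficient of two normalized histograms of equal length."""
    if len(hist_p) != len(hist_q):
        raise ValueError("histograms must have the same number of bins")
    # Sum over bins of the geometric mean of the two bin values.
    return sum(math.sqrt(p * q) for p, q in zip(hist_p, hist_q))
```

Other measures, such as histogram intersection or correlation, could be substituted without changing the rest of the tracking procedure.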

The detection target in each frame image is determined according to the similarity between the feature histogram at each particle position in that frame and the feature histogram of the detection target in the previous frame. Specifically, these similarities are normalized, and the detection target in the frame is then determined from the pixels at the positions of the preset number of particles with the largest similarities (that is, the selected particles described above). Each determined detection target corresponds to one bounding box whose sides are parallel to the borders of the captured image; the bounding box may also be called a detection box or a rectangular box. The pixels at the positions of the preset number of particles with the largest similarities define the extent of the bounding box corresponding to the detection target. The pixels at the positions of the particles with the largest similarities are also the pixels with the largest posterior probability, where the posterior probability is the probability that a pixel in the current frame image is a detection target pixel given that the corresponding pixel in the previous frame is known to be a detection target pixel.
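The normalization and selection step can be sketched as follows. This is an illustration under stated assumptions: similarities are normalized to sum to 1, the preset number is passed as `k`, and the bounding box is taken as the smallest axis-aligned rectangle enclosing the selected particles. `normalize` and `box_from_particles` are hypothetical names.

```python
def normalize(similarities):
    """Scale similarities so they sum to 1 (uniform if they are all zero)."""
    if not similarities:
        return []
    total = sum(similarities)
    if total <= 0:
        return [1.0 / len(similarities)] * len(similarities)
    return [s / total for s in similarities]


def box_from_particles(particles, similarities, k):
    """Axis-aligned box (x1, y1, x2, y2) around the k most similar particles."""
    if not particles or k < 1:
        raise ValueError("need at least one particle and k >= 1")
    weights = normalize(similarities)
    # Indices of the k particles with the largest normalized similarity.
    top = sorted(range(len(particles)), key=lambda i: weights[i], reverse=True)[:k]
    xs = [particles[i][0] for i in top]
    ys = [particles[i][1] for i in top]
    return (min(xs), min(ys), max(xs), max(ys))
```

Because the box sides follow the x and y axes, the result is parallel to the image borders, as the text requires.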

Step S103: Each time the detection target in a frame image has been determined, output the determined detection target.

It should be noted that the detection target determined for each frame image is output immediately. Taking a specific application scenario such as autonomous driving as an example, the detection target in a frame image is output as soon as it has been determined. For example, once step S101 has determined the detection target in the start frame image of a cycle with the deep learning network, that detection target is output; and in step S102, each time the detection target in a frame image has been determined, the determined detection target is output.

Existing processing methods run detection on every frame. Although such methods achieve high accuracy, they require a very large amount of computation, which is currently unacceptable in autonomous driving and robotics. The embodiment of the present invention combines an image recognition (also called image detection) algorithm based on a deep learning network with a dynamic tracking method, in particular particle filtering, to improve on existing methods that perform target detection with a deep learning network target detection algorithm. After the deep learning network has produced the detection result for the first frame of a cycle, a particle filter performs tracking detection, and the particle filter results take the place of detection results for the subsequent frames of the cycle. In the first frame of a new cycle the deep learning network is used for detection again, and the subsequent frames are again tracked with the particle filter. Repeating this process significantly reduces the amount of computation needed for target detection. The target detection process of the embodiment is described in detail below, taking vehicles and pedestrians as the detection targets.
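The alternation between detection and tracking can be sketched as a generator that hands back each frame's result as soon as it is available. The `detector` and `tracker` arguments are hypothetical callables standing in for the deep learning network and the particle filter; their signatures are assumptions made for this illustration.

```python
def detect_and_track(frames, detector, tracker, cycle_length):
    """Yield (frame_index, targets) for every frame, one frame at a time.

    detector(frame) -> targets                          (expensive, start frames)
    tracker(prev_frame, prev_targets, frame) -> targets (cheap, other frames)
    """
    prev_frame = None
    prev_targets = None
    for index, frame in enumerate(frames):
        if index % cycle_length == 0:
            # Start frame of a cycle: run the deep learning network.
            targets = detector(frame)
        else:
            # Any other frame: track from the previous frame's targets.
            targets = tracker(prev_frame, prev_targets, frame)
        yield index, targets
        prev_frame, prev_targets = frame, targets
```

Over 45 frames with a cycle length of 20, the detector would run on frames 0, 20 and 40 only, and the tracker on the other 42 frames.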

First, the neural network is trained. Commonly used neural network training methods such as backpropagation and gradient descent can be used to obtain the parameters for recognizing vehicles or pedestrians, mainly the weights W and biases b of each layer, both of which are parameters of the neural network. The neural network may be a Faster R-CNN or YOLO deep learning network.

The continuously captured multi-frame images are received, and the trained neural network runs detection on the first frame image, so that the vehicles and pedestrians in that frame are detected with high accuracy. Specifically, the detection result is the bounding box corresponding to each vehicle and pedestrian.

Starting from the second frame image, a dynamic tracking algorithm, specifically particle filtering, is used to track the detection targets in the second frame image and in subsequent frame images. Specifically, a hue histogram is computed from the color features within the bounding box of each vehicle and pedestrian in the first frame image, giving the feature histogram of each vehicle and pedestrian. Particles are then deployed uniformly over all positions of the second frame's image plane; alternatively, more particles are deployed on the second frame's image plane near the positions corresponding to the vehicles and pedestrians in the first frame image, and fewer particles are deployed far from those positions. The criterion for how near or far a particle position is from those positions, and the criterion for how many particles count as more or fewer, can both be set as required.

For each particle deployed on the second frame's image plane, the feature histogram at the particle's position is computed from the color features at that position. The similarity between the feature histogram at each particle position in the second frame and the feature histogram of each vehicle and pedestrian in the first frame is then computed, giving the similarity corresponding to each particle on the second frame's image plane. The detection targets in the second frame image are determined from these similarities.

Each of the feature histograms above may specifically be a hue histogram. After the similarity corresponding to each particle in the second frame image has been normalized, the preset number of particles with the largest similarities are selected as the selected particles, and the pixels at the positions of these selected particles make up the bounding box corresponding to the detection target. The feature histogram of each vehicle and pedestrian in the second frame image is then computed. On the third frame's image plane, more particles are deployed near the positions corresponding to the selected particles and fewer particles are deployed far from those positions. As before, the criterion for how near or far a particle position on the third frame's image plane is from the positions corresponding to the selected particles of the second frame image, and the criterion for how many particles count as more or fewer, can both be set as required. This process is the resampling process.

The detection targets in the third frame image are determined in a way similar to that used for the second frame image. From the feature histogram at each particle position on the third frame's image plane and the feature histogram of each vehicle and pedestrian in the second frame image, the similarity between the two is computed, giving the similarity corresponding to each particle on the third frame's image plane, and the detection targets in the third frame image are then determined from these similarities. Several consecutive frames form one cycle, for example 20 frames. Frames 4 to 20 are processed in the same way as the third frame image, so the details are not repeated.

Each cycle repeats the target detection and tracking process of the previous cycle: the start frame of each cycle uses the deep learning network to determine the detection targets, and the other frames of the cycle use the particle filter algorithm for target tracking. In the example above, the process of determining detection targets with the deep learning network and then tracking them with the particle filter algorithm is thus repeated every 20 frames. The deep learning-based image detection method can accurately recognize the vehicles and pedestrians in an image, while the dynamic tracking method can track targets with less computation. A cycle of 20 to 30 frames may be used. Since there are typically about 50 image frames per second, the deep learning-based target detection method of the embodiment can detect a vehicle or pedestrian newly entering the camera's view within about half a second, so a high recognition accuracy can be achieved.
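The relationship between cycle length, frame rate and detector workload can be checked with simple arithmetic. The sketch below assumes that the detector runs exactly once per cycle and that a newly appearing object is first picked up at the next start frame; `detector_share` and `max_wait_seconds` are hypothetical helper names.

```python
def detector_share(cycle_length):
    """Fraction of frames on which the deep learning network runs."""
    return 1.0 / cycle_length


def max_wait_seconds(cycle_length, fps):
    """Time between two consecutive detector runs, in seconds."""
    return cycle_length / fps


for n in (20, 30):
    print(n, detector_share(n), max_wait_seconds(n, 50))
```

Under these assumptions, at 50 frames per second a 20-frame cycle runs the detector on 5% of frames with 0.4 s between detector runs, and a 30-frame cycle on about 3.3% of frames with 0.6 s between runs.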

The deep learning-based target detection method of the embodiment tightly combines deep learning-based image detection with a dynamic tracking method, which both achieves high recognition accuracy and reduces the amount of computation, and it is easy to deploy on hardware.

Fig. 2 is a schematic diagram of the main modules of a deep learning-based target detection device according to an embodiment of the present invention.

The deep learning-based target detection device 200 of the embodiment mainly includes a detection module 201, a tracking module 202, and an output module 203.

The detection module 201 is configured to determine, using a deep learning network, the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously acquired multi-frame images form one cycle and the first frame image of each cycle is the start frame image.

The deep learning network may be a faster region-based convolutional neural network (Faster R-CNN) or a YOLO object detection deep network.

The tracking module 202 is configured to, for each frame image in a cycle other than the start frame image, perform target tracking according to a dynamic tracking algorithm using the features of the detection target in the previous frame image, so as to determine the detection target in each frame image of the cycle other than the start frame image.

The dynamic tracking algorithm may be a particle filter algorithm.

The tracking module 202 may specifically be configured to: calculate the feature histogram of the detection target in the start frame image of a cycle; deploy particles on the image plane of the second frame of that cycle according to a preset rule, and calculate the feature histogram at the position of each particle; determine the detection target in the second frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target in the start frame image; and, starting from the third frame image of the cycle, for each frame, calculate the feature histogram of the detection target in the preceding frame image, deploy particles on the image plane of the current frame according to the position of the detection target in the preceding frame image, calculate the feature histogram at each particle position on the current image plane, and determine the detection target in the current frame image according to the similarity between the feature histogram at each particle position and the feature histogram of the detection target in the preceding frame.
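One tracking step of this kind can be sketched as below. The text does not specify the feature, the particle deployment rule, or the similarity measure, so the sketch makes its own assumptions: a grayscale intensity histogram as the feature, a Gaussian scatter around the previous position as the "preset rule", the Bhattacharyya coefficient as the similarity, and the single best-scoring particle as the new target position. All function names are hypothetical.

```python
import numpy as np


def patch_histogram(image, center, half_size, bins=16):
    """Normalized intensity histogram of the square patch around (row, col)."""
    r, c = center
    r0, r1 = max(r - half_size, 0), min(r + half_size + 1, image.shape[0])
    c0, c1 = max(c - half_size, 0), min(c + half_size + 1, image.shape[1])
    hist, _ = np.histogram(image[r0:r1, c0:c1], bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)


def bhattacharyya(p, q):
    """Similarity in [0, 1] between two normalized histograms (assumed measure)."""
    return float(np.sum(np.sqrt(p * q)))


def track_step(image, reference_hist, prev_center, half_size, rng,
               n_particles=300, spread=4.0):
    """Locate the target in `image` from the previous frame's target histogram."""
    # Assumed deployment rule: Gaussian scatter around the previous position.
    offsets = rng.normal(0.0, spread, size=(n_particles, 2))
    particles = np.rint(np.asarray(prev_center) + offsets).astype(int)
    particles[:, 0] = np.clip(particles[:, 0], 0, image.shape[0] - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, image.shape[1] - 1)
    # Score each particle by how closely its patch histogram matches the reference.
    scores = [bhattacharyya(reference_hist, patch_histogram(image, tuple(p), half_size))
              for p in particles]
    best = particles[int(np.argmax(scores))]
    return int(best[0]), int(best[1])


def make_frame(center, size=64, half_size=5):
    """Synthetic test frame: dark background with one bright square target."""
    frame = np.full((size, size), 20, dtype=np.uint8)
    r, c = center
    frame[r - half_size:r + half_size + 1, c - half_size:c + half_size + 1] = 200
    return frame


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_centers = [(30, 30), (32, 33), (34, 36)]
    frames = [make_frame(c) for c in true_centers]
    center = true_centers[0]  # stands in for the detector output on the start frame
    for prev_frame, frame in zip(frames, frames[1:]):
        # The reference histogram is recomputed from the preceding frame each time.
        reference = patch_histogram(prev_frame, center, half_size=5)
        center = track_step(frame, reference, center, half_size=5, rng=rng)
        print(center)
```

Selecting the single best particle keeps the sketch short; a full particle filter would typically also weight and resample the particles between frames.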

The output module 203 is configured to output the determined detection target each time the detection target in a frame image has been determined.
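The division of work among the three modules can be sketched as a small class that wires them together. This is only an illustrative sketch: the class name `DetectionDevice`, the callable signatures, and the default cycle length of 20 are assumptions made here, and the detector and tracker are passed in as plain functions so that the sketch does not depend on any particular network or tracking library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

Frame = Any          # placeholder type for one image
Targets = List[Any]  # placeholder type for the detected targets of one frame


@dataclass
class DetectionDevice:
    detect: Callable[[Frame], Targets]                 # role of detection module 201
    track: Callable[[Frame, Frame, Targets], Targets]  # role of tracking module 202
    output: Callable[[int, Targets], None]             # role of output module 203
    cycle_length: int = 20                             # assumed frames per cycle

    def run(self, frames: Iterable[Frame]) -> None:
        prev_frame, targets = None, []
        for index, frame in enumerate(frames):
            if index % self.cycle_length == 0:
                # Start frame of a cycle: run the deep learning detector.
                targets = self.detect(frame)
            else:
                # Any other frame: track from the previous frame's targets.
                targets = self.track(prev_frame, frame, targets)
            # Output as soon as the targets of this frame are determined.
            self.output(index, targets)
            prev_frame = frame


if __name__ == "__main__":
    device = DetectionDevice(
        detect=lambda frame: [f"target seen in frame {frame}"],
        track=lambda prev_frame, frame, targets: targets,
        output=lambda index, targets: print(index, targets),
        cycle_length=3,
    )
    device.run(range(7))
```

Passing the modules in as callables mirrors the statement that the modules may be implemented in software or in hardware: the loop only depends on what each module does, not on how it is built.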

In addition, the specific implementation of the deep learning-based target detection device of the embodiment of the present invention has already been described in detail in the deep learning-based target detection method above, so the repeated content is not described again here.

FIG. 3 shows an exemplary system architecture 300 to which the deep learning-based target detection method or the deep learning-based target detection device of the embodiment of the present invention can be applied.

As shown in FIG. 3, the system architecture 300 may include terminal devices 301, 302, and 303, a network 304, and a server 305. The network 304 is the medium that provides communication links between the terminal devices 301, 302, 303 and the server 305. The network 304 may include various connection types, such as wired or wireless communication links or fiber optic cables.

Users can use the terminal devices 301, 302, 303 to interact with the server 305 through the network 304 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 301, 302, 303, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 301, 302, 303 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 305 may be a server that provides various services, for example a background management server that supports the shopping websites browsed by users with the terminal devices 301, 302, 303. The background management server can analyze and otherwise process received data such as product information query requests, and feed the processing results (for example, target push information or product information) back to the terminal devices.

It should be noted that the deep learning-based target detection method provided by the embodiment of the present invention may be executed by the server 305 or by the terminal devices 301, 302, 303. Correspondingly, the deep learning-based target detection device may be provided in the server 305 or in the terminal devices 301, 302, 303.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 3 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.

Referring now to FIG. 4, it shows a schematic structural diagram of a computer system 400 suitable for implementing a terminal device or a server of an embodiment of the present application. The terminal device or server shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random-access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the system 400. The CPU 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.

In particular, according to the embodiments disclosed in the present invention, the process described above with reference to the schematic diagram of main steps may be implemented as a computer software program. For example, the embodiments disclosed in the present invention include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the schematic diagram of main steps. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

The schematic diagrams of main steps and the block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a schematic diagram of main steps or a block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or schematic diagram of main steps, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor includes a detection module 201, a tracking module 202, and an output module 203. In some cases the names of these modules do not limit the modules themselves. For example, the detection module 201 may also be described as "a module for determining, using a deep learning network, the detection target in the start frame image of each cycle".

As another aspect, the present invention also provides a computer-readable medium. The computer-readable medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by such a device, cause the device to: determine, using a deep learning network, the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously acquired multi-frame images form one cycle and the first frame image of each cycle is the start frame image; for each frame image in a cycle other than the start frame image, perform target tracking according to a dynamic tracking algorithm using the features of the detection target in the previous frame image, so as to determine the detection target in each frame image of the cycle other than the start frame image; and output the determined detection target each time the detection target in a frame image has been determined.

According to the technical solution of the embodiment of the present invention, a deep learning network is used to determine the detection target in the start frame image of each cycle, where every several consecutive frames of the continuously acquired multi-frame images form one cycle and the first frame image of each cycle is the start frame image; and, for each frame image in a cycle other than the start frame image, target tracking is performed according to a dynamic tracking algorithm using the features of the detection target in the previous frame image, so as to determine the detection target in each frame image of the cycle other than the start frame image. This greatly reduces the amount of computation without reducing recognition accuracy, which meets the needs of the unmanned driving and robotics fields well, and it is easy to deploy on hardware.

The specific embodiments above do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. A target detection method based on deep learning is characterized by comprising the following steps:

determining a detection target in an initial frame image in each period by utilizing a deep learning network, wherein every several consecutive frames in a plurality of continuously acquired frames of images are taken as a period, and a first frame image in each period is taken as an initial frame image;

for each frame image except the initial frame image in the period, utilizing the characteristics of the detected target in the previous frame image to track the target according to a dynamic tracking algorithm so as to determine the detected target in each frame image except the initial frame image in the period;

after each determination of a detection target in one frame image, the determined detection target is output.

2. The method of claim 1, wherein the dynamic tracking algorithm is a particle filtering algorithm.

3. The method according to claim 2, wherein the step of determining the detection target in each frame image except the initial frame image in the period by performing target tracking according to a dynamic tracking algorithm using the feature of the detection target in the previous frame image for each frame image except the initial frame image in the period comprises:

calculating a characteristic histogram of a detection target in the initial frame image in the period;

deploying particles on a second frame of image plane in the period according to a preset rule, and calculating a characteristic histogram of the position of each particle;

determining the detection target in the second frame of image according to the similarity between the characteristic histogram of the position of each particle and the characteristic histogram of the detection target;

starting from the third frame image in the period, calculating a feature histogram of a detection target in the previous frame image of each frame, deploying particles on each frame image plane according to the position of the detection target in the previous frame image of each frame, then calculating the feature histogram of the position of each particle on each frame image plane, and determining the detection target in each frame image according to the similarity between the feature histogram of the position of each particle of each frame and the feature histogram of the detection target in the previous frame.

4. The method of claim 1, wherein the deep learning network is a faster region-based convolutional neural network (Faster R-CNN) or a YOLO object detection deep network.

5. An object detection device based on deep learning, characterized by comprising:

the detection module is used for determining a detection target in an initial frame image in each period by utilizing a deep learning network, wherein every several consecutive frames in a plurality of continuously acquired frames of images are taken as one period, and a first frame image in each period is taken as an initial frame image;

the tracking module is used for tracking the target of each frame image except the initial frame image in the period by utilizing the characteristics of the detected target in the previous frame image according to a dynamic tracking algorithm so as to determine the detected target in each frame image except the initial frame image in the period;

and the output module is used for outputting the determined detection target after each determination of a detection target in one frame image.

6. The apparatus of claim 5, wherein the dynamic tracking algorithm is a particle filtering algorithm.

7. The apparatus of claim 6, wherein the tracking module is further configured to:

8. The apparatus of claim 5, wherein the deep learning network is a faster region-based convolutional neural network (Faster R-CNN) or a YOLO object detection deep network.

9. An electronic device, comprising:

one or more processors;

a memory for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-4.

10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.