CN111428729A - Target detection method and device

Info

Publication number: CN111428729A
Application number: CN201910019056.4A
Authority: CN
Inventors: 危磊
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingbangda Trade Co Ltd; Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2019-01-09
Filing date: 2019-01-09
Publication date: 2020-07-17

Abstract

The invention discloses a target detection method and device, and relates to the field of computer technology. One embodiment of the method comprises: collecting a 2D image and a 3D depth image of a detection target; using the 3D depth image as masking data to process a first feature map generated from the 2D image, thereby obtaining a second feature map; and inputting the second feature map into a target detection network to obtain the category information and the position information corresponding to the detection target. The embodiment can improve recognition of the detection target and of each of its planes, improve detection accuracy, and reduce detection cost without adding extra computational burden.

Description

A target detection method and device

Technical Field

The present invention relates to the field of computer technology, and in particular to a target detection method and device.

Background

With advances in technology and the development of mechanical automation, more and more warehouses use robotic manipulators to pick and sort goods in order to save labor. Although manipulator control systems are now relatively mature, detecting and locating the object to be grasped remains difficult.

Existing target detection schemes include:

SIFT (scale-invariant feature transform) feature extraction combined with template matching, which requires a template to be built in advance for each detection target. Taking warehouse goods as an example, a warehouse stocks many thousands of goods, so building a template for every type is very costly; in addition, SIFT features are hard to extract from goods that lack texture, which makes such goods hard to recognize;

Edge detection, which is very sensitive to illumination changes and to texture interference on the detection target (for example, a goods packaging box), and which distinguishes the individual planes of the detection target poorly.

In the process of realizing the present invention, the inventor found at least the following problems in the prior art:

Existing schemes recognize the detection target and its individual planes poorly, have low detection accuracy, and incur high detection cost.

Summary of the Invention

In view of this, embodiments of the present invention provide a target detection method and device that can improve recognition of the detection target and of its individual planes, improve detection accuracy, add no extra computational burden, and reduce detection cost.

To achieve the above object, according to one aspect of the embodiments of the present invention, a target detection method is provided.

A target detection method, comprising: collecting a 2D (two-dimensional) image and a 3D (three-dimensional) depth image of a detection target; using the 3D depth image as masking data to process a first feature map generated from the 2D image, obtaining a second feature map; and inputting the second feature map into a target detection network to obtain category information and position information corresponding to the detection target.

Optionally, the step of using the 3D depth image as masking data to process the first feature map generated from the 2D image, obtaining the second feature map, includes: performing N levels of convolution and pooling on the 2D image to obtain N first feature maps; performing N levels of pooling on the 3D depth image to obtain N pooled 3D depth images, the pooled 3D depth images corresponding one-to-one with the first feature maps, N being a positive integer; and, for each first feature map, using the corresponding pooled 3D depth image as masking data and applying a preset processing to it together with the features in that first feature map, to obtain the second feature map.

Optionally, the preset processing is one of dot multiplication (element-wise multiplication), addition, and concatenation.

According to another aspect of the embodiments of the present invention, a target detection device is provided.

A target detection device, comprising: an image acquisition module for collecting a 2D image and a 3D depth image of a detection target; a feature map processing module for using the 3D depth image as masking data to process a first feature map generated from the 2D image, obtaining a second feature map; and a target detection module for inputting the second feature map into a target detection network to obtain category information and position information corresponding to the detection target.

Optionally, the feature map processing module is further configured to: perform N levels of convolution and pooling on the 2D image to obtain N first feature maps; perform N levels of pooling on the 3D depth image to obtain N pooled 3D depth images, the pooled 3D depth images corresponding one-to-one with the first feature maps, N being a positive integer; and, for each first feature map, use the corresponding pooled 3D depth image as masking data and apply a preset processing to it together with the features in that first feature map, to obtain the second feature map.

According to yet another aspect of the embodiments of the present invention, an electronic device is provided.

An electronic device, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the target detection method provided by the present invention.

According to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided.

A computer-readable medium storing a computer program which, when executed by a processor, implements the target detection method provided by the present invention.

One embodiment of the above invention has the following advantages or beneficial effects: a 2D image and a 3D depth image of the detection target are collected; the 3D depth image is used as masking data to process the first feature map generated from the 2D image, obtaining a second feature map; and the second feature map is input into the target detection network to obtain the category information and position information corresponding to the detection target. This improves recognition of the detection target and of its individual planes, improves detection accuracy, adds no extra computational burden, and reduces detection cost.

Further effects of the non-conventional optional implementations mentioned above are described below in connection with specific embodiments.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of the main steps of a target detection method according to a first embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a residual block according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a target detection model according to a second embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a target detection model according to a third embodiment of the present invention;

FIG. 5 is a schematic diagram of a commodity detection process according to a fourth embodiment of the present invention;

FIG. 6 is a schematic diagram of the main modules of a target detection device according to a fifth embodiment of the present invention;

FIG. 7 is an exemplary system architecture diagram to which an embodiment of the present invention may be applied;

FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. The description includes various details of the embodiments to aid understanding, and these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

As those skilled in the art will appreciate, embodiments of the present invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

FIG. 1 is a schematic diagram of the main steps of a target detection method according to an embodiment of the present invention.

As shown in FIG. 1, the target detection method of this embodiment mainly includes the following steps S101 to S103.

Step S101: Collect a 2D image and a 3D depth image of the detection target.

The detection target in the embodiments of the present invention may be any target to be detected and recognized in a variety of application scenarios; for example, in a warehouse picking scenario, the detection target is a commodity to be grasped by a robotic arm.

The 2D image of the detection target is a 2D color image (RGB image) of the detection target. A 2D color camera and a 3D depth camera (the two may be integrated into a single camera) can be used to collect the 2D image and the 3D depth image respectively.

After step S101, the collected 2D image and 3D depth image may be aligned, and regions of the 3D depth image with missing values may be filled with black. The black regions on the 3D depth image do not affect recognition of the detection target region.
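
The blackening of missing depth values can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `blacken_missing` is hypothetical, and it assumes that a missing reading is encoded as NaN, an infinity, or a non-positive value, which the source does not specify. Alignment of the two images is not shown.

```python
import numpy as np

def blacken_missing(depth):
    """Return a float32 copy of `depth` with missing readings set to 0 (black).

    Assumption: a missing reading is NaN, +/-inf, or a non-positive value.
    """
    depth = np.asarray(depth, dtype=np.float32)
    # Map NaN and infinities to 0 first so the comparison below is well defined.
    finite = np.nan_to_num(depth, nan=0.0, posinf=0.0, neginf=0.0)
    # Any remaining non-positive value is also treated as missing.
    return np.where(finite > 0, finite, np.float32(0.0))

raw = np.array([[1.5, np.nan], [-1.0, np.inf]])
clean = blacken_missing(raw)
```

With a depth image used as a multiplicative mask, a zero entry suppresses the corresponding feature, which is one way to read the statement that black regions do not disturb the target region.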

Step S102: Use the 3D depth image of the detection target as masking data to process the first feature map generated from the 2D image of the detection target, obtaining a second feature map.

Step S102 may specifically include:

passing the 2D image of the detection target through N levels of residual blocks and pooling layers for convolution and pooling, to obtain N first feature maps;

performing N levels of pooling on the 3D depth image of the detection target to obtain N pooled 3D depth images, the pooled 3D depth images corresponding one-to-one with the first feature maps;

for each first feature map, using the corresponding pooled 3D depth image as masking data and dot-multiplying it with the features in that first feature map, to obtain a second feature map.

Here N is a positive integer.
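
The sub-steps of step S102 can be sketched in NumPy as below. The sketch rests on several assumptions that the source does not state: pooling is 2x2 max pooling with stride 2, each level halves the spatial size, and the first feature maps are supplied by the caller (random stand-ins here, in place of real residual blocks). The names `max_pool_2x2` and `mask_feature_maps` are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, over the last two axes (both must be even)."""
    *lead, h, w = x.shape
    return x.reshape(*lead, h // 2, 2, w // 2, 2).max(axis=(-3, -1))

def mask_feature_maps(first_feature_maps, depth):
    """Mask N first feature maps with N successively pooled depth images.

    first_feature_maps: list of N arrays, level i shaped (C_i, H / 2**i, W / 2**i)
    depth: array shaped (H, W)
    """
    second_feature_maps = []
    pooled_depth = depth
    for feature in first_feature_maps:
        pooled_depth = max_pool_2x2(pooled_depth)  # one more pooling level
        if pooled_depth.shape != feature.shape[1:]:
            raise ValueError("pooled depth and feature map sizes differ")
        # Broadcast the single-channel mask over every feature channel.
        second_feature_maps.append(feature * pooled_depth[None, :, :])
    return second_feature_maps

# Stand-in first feature maps for N = 2 (a real model would compute these
# with convolutional blocks followed by pooling).
rng = np.random.default_rng(0)
features = [rng.random((4, 8, 8)), rng.random((8, 4, 4))]
depth = rng.random((16, 16))
second = mask_feature_maps(features, depth)
```

The depth branch contains no learned parameters in this sketch: it only pools, which matches the source's point that the depth image is brought to the feature scale at little extra cost.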

The residual block (resblock) may be structured as shown in FIG. 2. The structure includes a convolutional layer with 64 kernels of size 1*1, a convolutional layer with 64 kernels of size 1*3, and a convolutional layer with 256 kernels of size 1*1. In the figure, relu denotes the activation function.
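
A forward pass through a block with these three layers might look like the sketch below. Only the kernel counts and sizes come from the text; everything else is an assumption made for illustration: the input has 256 channels so that it can be added back as a skip connection, the 1*3 kernel is read as 1 high by 3 wide with zero padding, relu follows the first two layers and the final sum, and biases are omitted. All function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    """x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def conv1x3(x, w):
    """x: (C_in, H, W), w: (C_out, C_in, 3); zero padding keeps the width."""
    width = x.shape[2]
    padded = np.pad(x, ((0, 0), (0, 0), (1, 1)))
    # Sum the contributions of the three horizontal kernel taps.
    return sum(
        np.einsum("oc,chw->ohw", w[:, :, k], padded[:, :, k:k + width])
        for k in range(3)
    )

def residual_block(x, w1, w2, w3):
    """1*1 (64 kernels) -> 1*3 (64 kernels) -> 1*1 (256 kernels), plus skip."""
    y = relu(conv1x1(x, w1))
    y = relu(conv1x3(y, w2))
    y = conv1x1(y, w3)
    return relu(y + x)  # assumed identity skip connection

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4, 5))
w1 = rng.standard_normal((64, 256)) * 0.01
w2 = rng.standard_normal((64, 64, 3)) * 0.01
w3 = rng.standard_normal((256, 64)) * 0.01
out = residual_block(x, w1, w2, w3)
```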

As an alternative embodiment, the residual block described above may be replaced with a general block (Block) structure.

As an alternative embodiment, the dot multiplication described above may be replaced with addition or with concatenation (concatenate).
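
The three alternatives differ only in how the pooled depth image is combined with a first feature map. The following is a hedged sketch; the function `fuse` and its `mode` strings are hypothetical names, and concatenation is assumed to happen along the channel axis.

```python
import numpy as np

def fuse(feature, pooled_depth, mode="multiply"):
    """Combine a (C, H, W) feature map with an (H, W) pooled depth image."""
    mask = pooled_depth[None, :, :]  # add a channel axis of length 1
    if mode == "multiply":
        return feature * mask        # depth weights every channel
    if mode == "add":
        return feature + mask        # depth shifts every channel
    if mode == "concatenate":
        return np.concatenate([feature, mask], axis=0)  # depth as extra channel
    raise ValueError(f"unknown mode: {mode}")

feature = np.ones((3, 2, 2))
pooled_depth = np.array([[0.5, 0.25], [1.0, 2.0]])
multiplied = fuse(feature, pooled_depth, "multiply")
added = fuse(feature, pooled_depth, "add")
stacked = fuse(feature, pooled_depth, "concatenate")
```

Multiplication and addition keep the channel count unchanged, while concatenation adds one channel, so a detection layer fed by concatenated maps would need to accept C + 1 input channels.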

Step S103: Input the second feature map into the target detection network to obtain category information and position information corresponding to the detection target.

The target detection network may be a convolutional network composed of convolutional layers, or it may reuse the detection layer or detection module of an existing target detection network, such as the final detection layer of SSD (Single Shot MultiBox Detector) or the RPN (Region Proposal Network) detection module at the back end of Faster R-CNN (faster region-based convolutional neural network).

FIG. 3 is a schematic structural diagram of a target detection model according to a second embodiment of the present invention. In the second embodiment, after the 2D image and the 3D depth image of the detection target are collected, they are input into the target detection model shown in FIG. 3. The target detection model includes: blocks (Block), first pooling layers, second pooling layers, and a detection layer. A block includes multiple convolutional layers; the block in this embodiment may adopt the resblock (residual block) structure, whose details are given in the description of FIG. 2 and are not repeated here. The blocks and first pooling layers (each first pooling layer is connected after its corresponding block) perform multi-level convolution and pooling on the 2D image, and the multi-level second pooling layers perform pooling on the 3D depth image. FIG. 3 shows four levels of blocks and pooling layers only as an example; the target detection model of the embodiments of the present invention is not limited to four levels.

The first pooling layer and the second pooling layer at the same level may be implemented by one and the same pooling layer, that is, a single pooling layer can pool both the 2D image features and the 3D depth image.

The four levels of blocks and first pooling layers in FIG. 3 output four first feature maps (one per level), and the second pooling layers output four pooled 3D depth images (one per level). The first feature map generated at each level corresponds to the pooled 3D depth image generated at the same level. For each first feature map, the corresponding pooled 3D depth image is used as masking data and dot-multiplied with the features in that first feature map to obtain a second feature map. For example, in FIG. 3, the pooled 3D depth image generated by second pooling layer 1 is used as masking data and dot-multiplied with the features of the first feature map generated by block + first pooling layer 1, giving one second feature map. The other three levels are handled in the same way, finally yielding four second feature maps. These four second feature maps are input into the detection layer for target detection, producing the detection result, namely the category information and position information (detection box information) corresponding to the detection target.

In FIG. 3, the operator symbol (shown in the figure) denotes dot multiplication. In this structure, the 3D depth image is treated as a mask (that is, masking data) and dot-multiplied with the features extracted from the 2D color image, so that pixel values on different planes receive different weights, which can greatly improve recognition of the detection target and of its individual planes. Moreover, the embodiment of the present invention performs only a pooling (pool) operation on the 3D depth image, in order to keep it consistent with the scale of the features, so there is essentially no additional computational burden and detection cost is reduced.

The detection layer in the embodiments of the present invention may be a convolutional network composed of convolutional layers, or it may reuse the detection layer or detection module of an existing target detection network. The detection layer convolves each second feature map and extracts the category information and the detection box information at the same time.
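
One way to picture a convolutional detection layer that yields both outputs from a single second feature map is sketched below. It is an illustration under assumptions rather than the detection layer of the patent, SSD, or Faster R-CNN: it uses 1x1 convolutions, predicts one box (four numbers) and one class distribution per spatial location, and uses no anchor boxes. The name `detection_head` and the weight shapes are hypothetical.

```python
import numpy as np

def detection_head(second_feature_map, w_cls, w_box):
    """second_feature_map: (C, H, W); w_cls: (num_classes, C); w_box: (4, C)."""
    # Two 1x1 convolutions applied to the same input.
    logits = np.einsum("kc,chw->khw", w_cls, second_feature_map)
    boxes = np.einsum("bc,chw->bhw", w_box, second_feature_map)
    # Softmax over the class axis, shifted for numerical stability.
    shifted = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    return probs, boxes

rng = np.random.default_rng(0)
second_feature_map = rng.standard_normal((8, 4, 4))
w_cls = rng.standard_normal((3, 8))  # 3 hypothetical categories
w_box = rng.standard_normal((4, 8))  # 4 box coordinates per location
probs, boxes = detection_head(second_feature_map, w_cls, w_box)
```

With several second feature maps, the same kind of head would be applied to each level and the per-level predictions collected afterwards.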

FIG. 4 shows the structure of the target detection model of the third embodiment of the present invention. The structure is similar to that of the target detection model of the second embodiment. The difference is that, for each first feature map, the corresponding pooled 3D depth image is used as masking data and added to the features in that first feature map (the operator symbol shown in FIG. 4 denotes addition) to obtain the second feature map. Likewise, the target detection model of this embodiment is not limited to four levels of blocks and pooling layers. For other implementation details, refer to the description of the second embodiment.

The target detection model in the embodiments of the present invention is a trained model. In the training stage, a 2D color camera and a 3D depth camera can be used to collect real data of detection targets (2D images and 3D depth images) as training data, and the training data is annotated manually; for a goods detection scenario, the position and category of each box (goods packaging box) are annotated. The parameters of each layer of the model are then trained, giving the trained target detection model. Training may use any of the usual training methods for target detection models, such as stochastic gradient descent with backpropagation. The embodiments of the present invention train the target detection model on both 3D depth images and 2D color images, which overcomes the limited information extracted by existing target detection models. The model treats depth as masking data (mask) that modifies the color image features before detection, which improves detection accuracy. Deploying the target detection model of the embodiments of the present invention in the warehouse picking stage can greatly improve the detection rate of goods.

The fourth embodiment of the present invention takes commodity detection as an example. As shown in FIG. 5, the commodity detection process of the fourth embodiment includes steps S501 to S504.

Step S501: Collect 2D images and 3D depth images of commodities, and obtain training samples through manual annotation.

Step S502: Build a commodity detection model that includes residual blocks, pooling layers, and a detection layer.

Step S503: Use the training samples to train the commodity detection model, which is based on depth images and color images.

Step S504: Input the 2D image and 3D depth image of the commodity to be detected into the trained commodity detection model to obtain the category and detection box information of the commodity.
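
A manually annotated training sample from step S501 could be held in a structure like the one below. The layout is purely illustrative: the class `LabeledSample`, its field names, the pixel-coordinate box format `(x_min, y_min, x_max, y_max)`, and the example label are all assumptions, since the source only says that box positions and categories are annotated.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class LabeledSample:
    rgb: np.ndarray    # aligned 2D color image, shaped (H, W, 3)
    depth: np.ndarray  # 3D depth image, shaped (H, W)
    boxes: list        # one (x_min, y_min, x_max, y_max) tuple per commodity
    labels: list       # one category name per box

    def validate(self):
        """Check that images are aligned and every box has a label and fits."""
        height, width = self.depth.shape
        if self.rgb.shape != (height, width, 3):
            raise ValueError("2D image and depth image are not aligned")
        if len(self.boxes) != len(self.labels):
            raise ValueError("each box needs exactly one label")
        for x_min, y_min, x_max, y_max in self.boxes:
            if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
                raise ValueError("box lies outside the image")
        return self

sample = LabeledSample(
    rgb=np.zeros((4, 6, 3), dtype=np.uint8),
    depth=np.zeros((4, 6), dtype=np.float32),
    boxes=[(1, 0, 5, 3)],
    labels=["carton"],
).validate()
```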

This embodiment describes the target detection steps using commodities as an example. For the detailed implementation of each step and the structure of the commodity detection model, refer to the descriptions of the other embodiments above.

FIG. 6 is a schematic diagram of the main modules of a target detection device according to a fifth embodiment of the present invention.

The target detection device 600 of the fifth embodiment mainly includes: an image acquisition module 601, a feature map processing module 602, and a target detection module 603.

The image acquisition module 601 is configured to collect a 2D image and a 3D depth image of the detection target.

The feature map processing module 602 is configured to use the 3D depth image as masking data to process the first feature map generated from the 2D image, obtaining a second feature map.

The feature map processing module 602 is specifically configured to:

pass the 2D image of the detection target through N levels of blocks and pooling layers for convolution and pooling, to obtain N first feature maps;

perform N levels of pooling on the 3D depth image of the detection target to obtain N pooled 3D depth images, the pooled 3D depth images corresponding one-to-one with the first feature maps, N being a positive integer;

for each first feature map, use the corresponding pooled 3D depth image as masking data and apply a preset processing to it together with the features in that first feature map, to obtain a second feature map.

As an alternative embodiment, the block may adopt the structure of a residual block.

The preset processing is one of dot multiplication, addition, and concatenation.

The target detection module 603 is configured to input the second feature map into the target detection network to obtain category information and position information corresponding to the detection target.

The implementation details of the target detection device have already been described in detail in the target detection method above and are therefore not repeated here.

FIG. 7 shows an exemplary system architecture 700 to which the target detection method or target detection device of an embodiment of the present invention may be applied.

As shown in FIG. 7, the system architecture 700 may include terminal devices 701, 702, and 703, a network 704, and a server 705. The network 704 is the medium that provides communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

A user can use the terminal devices 701, 702, 703 to interact with the server 705 through the network 704, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 701, 702, 703, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software (examples only).

The terminal devices 701, 702, 703 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 705 may be a server that provides various services, for example a back-end management server that supports shopping websites browsed by users on the terminal devices 701, 702, 703 (example only). The back-end management server can analyze and otherwise process received data such as product information query requests, and feed the processing result (for example, product information, given only as an example) back to the terminal device.

It should be noted that the target detection method provided by the embodiments of the present invention is generally executed by the server 705; accordingly, the target detection device is generally located in the server 705.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 7 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.

Referring now to FIG. 8, it shows a schematic structural diagram of a computer system 800 suitable for implementing the server of the embodiments of the present application. The server shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.

In particular, according to the disclosed embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including an image acquisition module, a feature map processing module, and a target detection module. In some cases the names of these modules do not limit the modules themselves; for example, the image acquisition module may also be described as "a module for acquiring a 2D image and a 3D depth image of a detection target".
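The three-module decomposition described above can be illustrated with a minimal sketch. The class and field names (`TargetDetector`, `Detection`, `acquire`, `process_features`, `detect`) are hypothetical and chosen only for illustration; the specification names the modules but does not prescribe any particular programming interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Detection:
    """Category and position information for one detection target."""
    category: str
    box: Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


@dataclass
class TargetDetector:
    """Wires the three modules together; each module is a plain callable."""
    # Image acquisition module: returns (2D image, 3D depth image).
    acquire: Callable[[], Tuple[Any, Any]]
    # Feature map processing module: returns the second feature map.
    process_features: Callable[[Any, Any], Any]
    # Target detection module: returns category and position information.
    detect: Callable[[Any], Detection]

    def run(self) -> Detection:
        image_2d, depth_3d = self.acquire()
        second_feature_map = self.process_features(image_2d, depth_3d)
        return self.detect(second_feature_map)


# Usage with stand-in stubs; real modules would wrap a camera and a network.
detector = TargetDetector(
    acquire=lambda: ("rgb", "depth"),
    process_features=lambda image, depth: (image, depth),
    detect=lambda fmap: Detection("box", (0.0, 0.0, 1.0, 1.0)),
)
result = detector.run()
```

Because each module is only a callable, a software implementation and a hardware-backed one could in principle be swapped without changing `run`, which matches the statement that the modules may be realized in either form.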

In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: acquire a 2D image and a 3D depth image of a detection target; use the 3D depth image as masking data to process a first feature map generated from the 2D image, obtaining a second feature map; and input the second feature map into a target detection network to obtain category information and position information corresponding to the detection target.

According to the technical solution of the embodiments of the present invention, a 2D image and a 3D depth image of a detection target are acquired; the 3D depth image is used as masking data to process a first feature map generated from the 2D image, obtaining a second feature map; and the second feature map is input into a target detection network to obtain category information and position information corresponding to the detection target. This improves the discrimination of the detection target and of each of its planes, improves detection accuracy, adds no extra computational burden, and can reduce detection cost.
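The masking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the function names are hypothetical, 2x2 max pooling is assumed as the pooling operation, "dot multiplication" is read as element-wise multiplication, "combination" is read as appending the depth plane as an extra channel, and feature maps are plain nested lists with even height and width.

```python
import operator
from typing import List

Plane = List[List[float]]  # one H x W plane


def max_pool_2x2(plane: Plane) -> Plane:
    """Halve H and W by taking the max of each 2x2 block (H and W must be even)."""
    return [
        [
            max(plane[r][c], plane[r][c + 1], plane[r + 1][c], plane[r + 1][c + 1])
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


def depth_pyramid(depth: Plane, levels: int) -> List[Plane]:
    """Pool the depth image `levels` times, keeping the result of every level."""
    pyramid, current = [], depth
    for _ in range(levels):
        current = max_pool_2x2(current)
        pyramid.append(current)
    return pyramid


def apply_depth_mask(feature_map: List[Plane], depth: Plane, mode: str = "multiply") -> List[Plane]:
    """Combine a C x H x W first feature map with an H x W pooled depth plane."""
    if len(depth) != len(feature_map[0]) or len(depth[0]) != len(feature_map[0][0]):
        raise ValueError("depth plane and feature map must share the same H x W")
    if mode == "concat":
        # Assumed reading of "combination": depth becomes an additional channel.
        return [[row[:] for row in ch] for ch in feature_map] + [[row[:] for row in depth]]
    ops = {"multiply": operator.mul, "add": operator.add}
    if mode not in ops:
        raise ValueError(f"unknown mode: {mode}")
    op = ops[mode]
    return [
        [[op(f, d) for f, d in zip(f_row, d_row)] for f_row, d_row in zip(ch, depth)]
        for ch in feature_map
    ]


# Usage: a 4x4 depth image pooled to match a 1-channel 2x2 first feature map.
depth = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
pooled = depth_pyramid(depth, levels=2)
first_feature_map = [[[2.0, 3.0], [4.0, 5.0]]]
second_feature_map = apply_depth_mask(first_feature_map, pooled[0], "multiply")
```

In multiply mode, features at positions where the pooled depth is zero are suppressed, while features on nearer surfaces are kept or scaled. The depth branch involves only pooling and one element-wise pass, with no learned weights, so its cost is small relative to the convolutional layers.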

The above specific embodiments do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A target detection method, comprising:

collecting a 2D image and a 3D depth image of a detection target;

using the 3D depth image as masking data to process a first feature map generated from the 2D image, thereby obtaining a second feature map; and

inputting the second feature map into a target detection network to obtain category information and position information corresponding to the detection target.

2. The method of claim 1, wherein the step of using the 3D depth image as masking data to process the first feature map generated from the 2D image, thereby obtaining the second feature map, comprises:

performing N-level convolution and pooling on the 2D image to obtain N first feature maps;

performing N-level pooling on the 3D depth image to obtain N pooled 3D depth images, wherein the pooled 3D depth images correspond one-to-one to the first feature maps, and N is a positive integer; and

for each first feature map, using the corresponding pooled 3D depth image as masking data and performing preset processing on the features in the first feature map to obtain a second feature map.

3. The method of claim 1, wherein the preset processing comprises one of dot multiplication, addition, and combination.

4. A target detection device, comprising:

an image acquisition module configured to acquire a 2D image and a 3D depth image of a detection target;

a feature map processing module configured to use the 3D depth image as masking data to process a first feature map generated from the 2D image, thereby obtaining a second feature map; and

a target detection module configured to input the second feature map into a target detection network to obtain category information and position information corresponding to the detection target.

5. The device of claim 4, wherein the feature map processing module is further configured to:

6. The device of claim 4, wherein the preset processing comprises one of dot multiplication, addition, and combination.

7. An electronic device, comprising:

one or more processors; and

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-3.

8. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of any one of claims 1-3.