CN110632608A

CN110632608A - A target detection method and device based on laser point cloud

Info

Publication number: CN110632608A
Application number: CN201810642417.6A
Authority: CN
Inventors: 张立成
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingbangda Trade Co Ltd; Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2018-06-21
Filing date: 2018-06-21
Publication date: 2019-12-31
Anticipated expiration: 2038-06-21
Also published as: CN110632608B

Abstract

The invention discloses a laser point cloud-based target detection method and device, and relates to the field of computer technology. A specific implementation of the method includes: rasterizing the collected laser point cloud data and extracting features from each grid to obtain three-dimensional lattice data; performing three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data to obtain a three-dimensional feature map; generating, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height, and selecting candidate three-dimensional detection frames from them; performing ROI downsampling in the length and width dimensions on the three-dimensional feature map corresponding to each candidate three-dimensional detection frame to obtain same-size feature maps corresponding to the three-dimensional detection frames; and performing classification and regression on those same-size feature maps to determine the category and location information of the detection target. This implementation does not rely on calibration between the lidar and the camera, and the accuracy of the detection result is high.

Description

A target detection method and device based on laser point cloud

Technical field

The present invention relates to the field of computer technology, and in particular to a laser point cloud-based target detection method and device.

Background

Target detection technology can be used to determine, in a laser point cloud in three-dimensional space, the smallest three-dimensional cuboid frame that envelops a detection target. Taking vehicle detection in autonomous driving as an example, each vehicle corresponds to one three-dimensional cuboid frame. At present, image-based target detection gives the better detection results, but it is difficult to obtain accurate position information from images alone. The lidar and the camera therefore have to be calibrated, after which the targets detected in the image are mapped onto the laser point cloud and decisions are made based on the position information in the point cloud. If the calibration is inaccurate, the positions of the targets mapped onto the laser point cloud are inaccurate, which degrades the accuracy of the detection results.

In the course of implementing the present invention, the inventor found at least the following problem in the prior art:

Existing methods rely on the calibration of the lidar and the camera, and the accuracy of their detection results is poor.

Summary of the invention

In view of this, embodiments of the present invention provide a laser point cloud-based target detection method and device that do not rely on calibration between the lidar and the camera and yield highly accurate detection results.

To achieve the above object, according to one aspect of the embodiments of the present invention, a laser point cloud-based target detection method and device are provided.

A laser point cloud-based target detection method comprises: rasterizing the collected laser point cloud data and extracting features from each grid to obtain three-dimensional lattice data; performing three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data to obtain a three-dimensional feature map; for each position of the three-dimensional feature map, generating multiple three-dimensional detection frames with the same height, and selecting candidate three-dimensional detection frames from the three-dimensional detection frames; for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, performing ROI (region of interest) downsampling in the length and width dimensions to obtain same-size feature maps corresponding to the three-dimensional detection frames; and performing classification and regression on the same-size feature maps corresponding to the three-dimensional detection frames to determine the category and location information of the detection target.

Optionally, the step of generating, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height and selecting candidate three-dimensional detection frames from them comprises: generating, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height, and determining the probability that each three-dimensional detection frame belongs to the foreground; and deduplicating the three-dimensional detection frames with a non-maximum suppression algorithm, then selecting, from the deduplicated three-dimensional detection frames, a preset number of frames with the highest probability of belonging to the foreground as the candidate three-dimensional detection frames.

Optionally, the probability that a three-dimensional detection frame belongs to the foreground is determined as follows: mapping the three-dimensional detection frame onto a two-dimensional plane to obtain a first two-dimensional detection frame corresponding to the three-dimensional detection frame; mapping a preset cuboid onto the two-dimensional plane to obtain a rectangular frame corresponding to the preset cuboid, the preset cuboid being a pre-labeled detection target sample; and determining the probability that the three-dimensional detection frame belongs to the foreground according to the intersection over union of the first two-dimensional detection frame and the rectangular frame.

Optionally, the step of performing ROI downsampling in the length and width dimensions on the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, to obtain same-size feature maps corresponding to the three-dimensional detection frames, comprises: for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, obtaining, based on the length and width dimensions, the feature maps corresponding to four second two-dimensional detection frames; performing ROI downsampling on the feature map corresponding to each second two-dimensional detection frame to obtain a same-size feature map for each second two-dimensional detection frame; and combining the same-size feature maps of the second two-dimensional detection frames, according to the candidate three-dimensional detection frame they belong to, into the same-size feature maps corresponding to the three-dimensional detection frames.

According to another aspect of the embodiments of the present invention, a laser point cloud-based target detection device is provided.

A laser point cloud-based target detection device comprises: a point cloud data processing module, configured to rasterize the collected laser point cloud data and extract features from each grid to obtain three-dimensional lattice data; a feature map generation module, configured to perform three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data to obtain a three-dimensional feature map; a candidate frame generation module, configured to generate, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height, and to select candidate three-dimensional detection frames from the three-dimensional detection frames; an ROI downsampling module, configured to perform ROI downsampling in the length and width dimensions on the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, to obtain same-size feature maps corresponding to the three-dimensional detection frames; and a detection module, configured to perform classification and regression on the same-size feature maps corresponding to the three-dimensional detection frames to determine the category and location information of the detection target.

Optionally, the candidate frame generation module is further configured to: generate, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height, and determine the probability that each three-dimensional detection frame belongs to the foreground; and deduplicate the three-dimensional detection frames with a non-maximum suppression algorithm, then select, from the deduplicated three-dimensional detection frames, a preset number of frames with the highest probability of belonging to the foreground as the candidate three-dimensional detection frames.

Optionally, the candidate frame generation module includes a foreground determination submodule, configured to: map the three-dimensional detection frame onto a two-dimensional plane to obtain a first two-dimensional detection frame corresponding to the three-dimensional detection frame; map a preset cuboid onto the two-dimensional plane to obtain a rectangular frame corresponding to the preset cuboid, the preset cuboid being a pre-labeled detection target sample; and determine the probability that the three-dimensional detection frame belongs to the foreground according to the intersection over union of the first two-dimensional detection frame and the rectangular frame.

Optionally, the detection module is further configured to: for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, obtain, based on the length and width dimensions, the feature maps corresponding to four second two-dimensional detection frames; perform ROI downsampling on the feature map corresponding to each second two-dimensional detection frame to obtain a same-size feature map for each second two-dimensional detection frame; and combine the same-size feature maps of the second two-dimensional detection frames, according to the candidate three-dimensional detection frame they belong to, into the same-size feature maps corresponding to the three-dimensional detection frames.

Optionally, the laser point cloud-based target detection device further includes a training module, configured to train the feature map generation module, the candidate frame generation module, the ROI downsampling module and the detection module by the OHEM (online hard example mining) training method.

According to yet another aspect of the embodiments of the present invention, an electronic device is provided.

An electronic device comprises: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the laser point cloud-based target detection method provided by the present invention.

According to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided.

A computer-readable medium stores a computer program which, when executed by a processor, implements the laser point cloud-based target detection method provided by the present invention.

An embodiment of the above invention has the following advantages or beneficial effects: the collected laser point cloud data is rasterized and features are extracted from each grid to obtain three-dimensional lattice data; three-dimensional convolution and three-dimensional downsampling are performed on the three-dimensional lattice data to obtain a three-dimensional feature map; for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height are generated, and candidate three-dimensional detection frames are selected from them; for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, ROI downsampling is performed in the length and width dimensions to obtain same-size feature maps corresponding to the three-dimensional detection frames; and classification and regression are performed on those same-size feature maps to determine the category and location information of the detection target. By processing the collected three-dimensional laser point cloud data directly, the present invention does not rely on calibration between the lidar and the camera, and the accuracy of the detection result is high.

Further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.

Description of the drawings

The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:

Fig. 1 is a schematic diagram of the main steps of a laser point cloud-based target detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the composition of a target detection model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the main modules of a laser point cloud-based target detection device according to an embodiment of the present invention;

Fig. 4 is a diagram of an exemplary system architecture to which an embodiment of the present invention can be applied;

Fig. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding, and these details should be regarded as exemplary only. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

Fig. 1 is a schematic diagram of the main steps of a laser point cloud-based target detection method according to an embodiment of the present invention.

As shown in Fig. 1, the laser point cloud-based target detection method of the embodiment of the present invention mainly includes the following steps S101 to S105.

Step S101: rasterize the collected laser point cloud data, and extract features from each grid to obtain three-dimensional lattice data.

Rasterizing the laser point cloud data may mean rasterizing the laser point cloud data within a preset range. The preset range depends on the target detection task, and its values can be set empirically. For example, in a vehicle detection task for unmanned driving, the laser point cloud data within 40 meters forward, 20 meters to each of the left and right, and 0 to 10 meters in height can be selected.

The three-dimensional lattice data includes four feature values for each grid: the maximum height of the points in the grid, the reflection intensity of the point with the maximum height, the number of points in the grid, and an indicator value showing whether the grid contains any point (1 if the grid contains points, 0 if it does not). These four feature values indicate well whether the current grid is part of a detection target.

Step S102: perform three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data to obtain a three-dimensional feature map.

Step S103: for each position of the three-dimensional feature map, generate multiple three-dimensional detection frames with the same height, and select candidate three-dimensional detection frames from the three-dimensional detection frames.

Step S103 may specifically include: generating multiple three-dimensional detection frames for each position of the three-dimensional feature map, so as to determine, for each three-dimensional detection frame, the probability that it belongs to the foreground, its position information and its scale information, where the scale information of every three-dimensional detection frame includes the same height; and deduplicating the three-dimensional detection frames with a non-maximum suppression algorithm, then selecting, from the deduplicated three-dimensional detection frames, a preset number of frames with the highest probability of belonging to the foreground as the candidate three-dimensional detection frames.

The probability that a three-dimensional detection frame belongs to the foreground is determined as follows: the three-dimensional detection frame is mapped onto a two-dimensional plane to obtain a first two-dimensional detection frame corresponding to it; a preset cuboid, which is a pre-labeled detection target sample, is mapped onto the two-dimensional plane to obtain a rectangular frame corresponding to the preset cuboid; and the probability that the three-dimensional detection frame belongs to the foreground is determined according to the intersection over union of the first two-dimensional detection frame and the rectangular frame.
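The mapping and intersection-over-union computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Box3D`, `Rect`, `project_to_plane` and `iou` are hypothetical, both boxes are assumed to be axis-aligned, and the two-dimensional plane is assumed to be the ground plane spanned by the length (x) and width (y) axes.

```python
from typing import NamedTuple


class Box3D(NamedTuple):
    """Axis-aligned 3D box: center (x, y, z) plus length, width, height."""
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float


class Rect(NamedTuple):
    """Axis-aligned rectangle on the 2D plane, given by two opposite corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def project_to_plane(box: Box3D) -> Rect:
    """Drop the height axis and keep the footprint of the box (assumed mapping)."""
    half_l, half_w = box.length / 2.0, box.width / 2.0
    return Rect(box.x - half_l, box.y - half_w, box.x + half_l, box.y + half_w)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_l = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_w = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_l <= 0 or inter_w <= 0:
        return 0.0  # the rectangles do not overlap
    inter = inter_l * inter_w
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    return inter / (area_a + area_b - inter)


# A detection frame shifted by 1 unit along x relative to a labeled cuboid.
frame = Box3D(0.0, 0.0, 0.0, 4.0, 2.0, 1.6)
label = Box3D(1.0, 0.0, 0.0, 4.0, 2.0, 1.6)
overlap = iou(project_to_plane(frame), project_to_plane(label))
```

Here the two footprints share an area of 6 out of a union of 10, so `overlap` comes out at 0.6.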

Step S104: for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, perform ROI downsampling in the length and width dimensions to obtain same-size feature maps corresponding to the three-dimensional detection frames.

Step S104 may specifically include: for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, obtaining, based on the length and width dimensions, the feature maps corresponding to four second two-dimensional detection frames; performing ROI downsampling on the feature map corresponding to each second two-dimensional detection frame to obtain a same-size feature map for each second two-dimensional detection frame; and combining the same-size feature maps of the second two-dimensional detection frames, according to the candidate three-dimensional detection frame they belong to, into the same-size feature maps corresponding to the three-dimensional detection frames.

Performing ROI downsampling in the length and width dimensions allows ROI downsampling to be applied to three-dimensional feature maps. This in turn makes it convenient to process the collected three-dimensional laser point cloud data directly, without relying on calibration between the lidar and the camera.
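One way to read step S104 is sketched below, under assumptions the text does not confirm: the four second two-dimensional detection frames are taken to be the four height layers of a candidate frame (all frames share the same height), ROI downsampling is taken to be max pooling into a fixed number of bins, and the feature volume is a nested list indexed `[z][x][y]` with a single channel. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Slice2D = Sequence[Sequence[float]]   # one height layer, indexed [x][y]
Volume = Sequence[Slice2D]            # feature volume, indexed [z][x][y]


def bin_edges(start: int, stop: int, bins: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into `bins` ranges that each hold at least one cell."""
    span = stop - start
    return [(start + (i * span) // bins,             # floor of the lower edge
             start + -(-((i + 1) * span) // bins))   # ceiling of the upper edge
            for i in range(bins)]


def roi_max_pool_2d(fmap: Slice2D, x0: int, y0: int, x1: int, y1: int,
                    out_size: int) -> List[List[float]]:
    """Max-pool the region [x0, x1) x [y0, y1) of one layer to out_size x out_size."""
    x_bins = bin_edges(x0, x1, out_size)
    y_bins = bin_edges(y0, y1, out_size)
    return [[max(fmap[x][y] for x in range(xa, xb) for y in range(ya, yb))
             for ya, yb in y_bins]
            for xa, xb in x_bins]


def roi_pool_frame(volume: Volume, x0: int, y0: int, x1: int, y1: int,
                   z0: int, z1: int, out_size: int) -> List[List[List[float]]]:
    """Pool every height layer of a candidate frame in length and width only,
    then stack the pooled layers back into one same-size feature map."""
    return [roi_max_pool_2d(volume[z], x0, y0, x1, y1, out_size)
            for z in range(z0, z1)]


# Toy 4 x 4 x 4 volume: value = 100 * z + 4 * x + y.
fmap = [[x * 4 + y for y in range(4)] for x in range(4)]
volume = [[[100 * z + v for v in row] for row in fmap] for z in range(4)]
pooled_small = roi_pool_frame(volume, 0, 0, 3, 2, 0, 4, 2)   # 3 x 2 footprint
pooled_large = roi_pool_frame(volume, 0, 0, 4, 4, 0, 4, 2)   # 4 x 4 footprint
```

Both `pooled_small` and `pooled_large` have shape 4 x 2 x 2 even though the footprints differ, which is the property the later classification and regression layers rely on.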

Step S105: perform classification and regression on the same-size feature maps corresponding to the three-dimensional detection frames to determine the category and location information of the detection target.

Specifically, two consecutive three-dimensional convolutions are applied to the same-size feature maps corresponding to the three-dimensional detection frames, and classification and regression are then performed on the convolved feature maps to determine the category and location information of the detection target.

The laser point cloud-based target detection method of the embodiment of the present invention is described in detail below, taking vehicle detection in autonomous driving as an example. The method is not limited to detecting vehicles and can also be used to detect other targets such as pedestrians.

The embodiment of the present invention processes the collected three-dimensional laser point cloud data directly to detect the vehicles in it, without relying on calibration between the lidar and the camera, which makes the vehicle detection results more accurate.

First, the collected laser point cloud data in three-dimensional space is rasterized. Specifically, the laser point cloud data within 40 meters forward, 20 meters to each of the left and right, and 0 to 10 meters in height is selected and rasterized, with one grid per 0.1 meter in the forward and left-right directions and one grid per 0.4 meter in height. The laser point cloud data is thereby divided into 400*400*25 three-dimensional grids, and every point in the selected range falls into one of these grids. Four features are extracted for each grid: the maximum height of the points in the grid, the reflection intensity of the point with the maximum height, the number of points in the grid, and an indicator value showing whether the grid contains any point. The indicator value is 1 if the grid contains points and 0 if it does not, where a point is one of the points that make up the laser point cloud.

Rasterizing the laser point cloud data and extracting the above four features for each grid yields 400*400*25 three-dimensional lattice data, in which each grid position holds four channel values, namely the values of the four features. These four features indicate well whether the current grid is part of a vehicle. It should be noted that if the detection target is a pedestrian or another object, the range of laser point cloud data selected for rasterization needs to be adjusted to suit the size of the detection target.
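The rasterization and feature extraction just described can be sketched as follows. This is an illustrative sketch only: the names `rasterize`, `RANGES`, `CELL_SIZE` and `GRID_SHAPE` are hypothetical, each point is assumed to be a tuple `(x forward, y left-right, z height, reflection intensity)` in meters, and only occupied grids are stored (an absent grid stands for four zero features).

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float, float]   # (x, y, z, intensity), assumed layout
Cell = Tuple[int, ...]                      # (ix, iy, iz) grid index

RANGES = ((0.0, 40.0), (-20.0, 20.0), (0.0, 10.0))   # forward, left-right, height
CELL_SIZE = (0.1, 0.1, 0.4)                          # meters per grid on each axis
GRID_SHAPE = (400, 400, 25)                          # resulting number of grids


def rasterize(points: Iterable[Point]) -> Dict[Cell, List[float]]:
    """Map each occupied grid to [max height, intensity at max height, count, 1.0]."""
    grid: Dict[Cell, List[float]] = {}
    for x, y, z, intensity in points:
        coords = (x, y, z)
        # Discard points outside the selected range.
        if not all(lo <= c < hi for c, (lo, hi) in zip(coords, RANGES)):
            continue
        # Clamp guards against floating-point round-up at the upper boundary.
        cell = tuple(min(int((c - lo) / size), n - 1)
                     for c, (lo, _), size, n
                     in zip(coords, RANGES, CELL_SIZE, GRID_SHAPE))
        feats = grid.get(cell)
        if feats is None:
            grid[cell] = [z, intensity, 1.0, 1.0]
        else:
            if z > feats[0]:
                feats[0], feats[1] = z, intensity   # new highest point in the grid
            feats[2] += 1.0
    return grid


cloud = [
    (10.05, 0.05, 0.5, 0.3),
    (10.07, 0.02, 0.7, 0.9),     # same grid as the first point, but higher
    (39.55, -19.95, 9.9, 0.1),
    (50.00, 0.00, 1.0, 0.5),     # beyond 40 m forward, discarded
]
lattice = rasterize(cloud)
```

For a dense 400*400*25*4 array, the dictionary can be scattered into a zero-initialized tensor of that shape; the sparse form is used here only to keep the sketch free of external dependencies.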

In the embodiment of the present invention, a target detection model can be built on the Faster RCNN (faster region-based convolutional neural network) framework to perform target detection.

First, a three-dimensional convolutional neural network can be built on the network structure of a convolutional neural network such as the VGG16 network, to perform three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data and thereby obtain a three-dimensional feature map. For example, the number, size and stride of the convolution kernels of the VGG16 network are kept unchanged while the dimensionality of the kernels is increased, so that the original two-dimensional convolution kernels become three-dimensional convolution kernels and the two-dimensional convolutional layers become three-dimensional convolutional layers, which makes convolution over a three-dimensional volume possible. The three-dimensional convolutional neural network can also be built on the network structure of other convolutional neural networks (for example GoogLeNet, MobileNet, etc.).

Specifically, all convolutional layers and downsampling layers of the VGG16 network are retained except the last downsampling layer, which is removed because data processed by that layer would lose some information useful for the vehicle detection task. As described above, the original two-dimensional convolutional layers of the VGG16 network are changed to three-dimensional convolutional layers, and the two-dimensional parameters accordingly become three-dimensional parameters. The downsampling layers are likewise changed to three-dimensional downsampling, but downsampling is performed only in the forward and left-right directions. Because the height direction has few grids (only 25 in this example), no downsampling is performed in height, to avoid losing important information. After three-dimensional convolution and three-dimensional downsampling, a 25*25*25 three-dimensional feature map is obtained.
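The 25*25*25 size can be checked with a short shape calculation. The sketch below assumes the standard VGG16 layout (13 convolutional layers in five groups, each group followed by a 2x2 downsampling layer with stride 2) and assumes that the convolutions are padded so that they preserve spatial size; the padding is not stated in the text. `VGG16_LAYERS` and `feature_map_shape` are hypothetical names.

```python
from typing import List, Tuple, Union

# Output channels of each convolutional layer; "M" marks a downsampling layer.
# The fifth (last) "M" of VGG16 is removed, as described above.
VGG16_LAYERS: List[Union[int, str]] = [
    64, 64, "M",
    128, 128, "M",
    256, 256, 256, "M",
    512, 512, 512, "M",
    512, 512, 512,
]


def feature_map_shape(grid_shape: Tuple[int, int, int],
                      in_channels: int = 4) -> Tuple[Tuple[int, int, int], int]:
    """Return ((length, width, height), channels) after the 3D backbone."""
    length, width, height = grid_shape
    channels = in_channels
    for layer in VGG16_LAYERS:
        if layer == "M":
            # Downsample by 2 in length and width only; height is left untouched.
            length, width = length // 2, width // 2
        else:
            # Size-preserving convolution (assumed padding); only channels change.
            channels = int(layer)
    return (length, width, height), channels


shape, channels = feature_map_shape((400, 400, 25))
```

Four halvings take 400 to 25 in the forward and left-right directions, while the height stays at 25, which matches the 25*25*25 feature map stated above.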

然后，在三维卷积神经网络的最后一层(为relu层，即激活层)的后面接一个分类层(也称第一分类层)和回归层(也称第一回归层)，这两个层是全连接层。通过该第一分类层和第一回归层，对应三维特征图的每一位置，生成具有相同高度的多个三维检测框。三维特征图的每一位置都对应一个三维检测框，生成该三维检测框的过程即分别利用第一分类层、第一回归层确定各三维检测框属于前景的概率、位置信息和尺度信息的过程。Then, a classification layer (also called the first classification layer) and a regression layer (also called the first regression layer) are connected behind the last layer of the three-dimensional convolutional neural network (relu layer, ie, the activation layer). layer is a fully connected layer. Through the first classification layer and the first regression layer, multiple three-dimensional detection frames with the same height are generated corresponding to each position of the three-dimensional feature map. Each position of the 3D feature map corresponds to a 3D detection frame. The process of generating the 3D detection frame is the process of using the first classification layer and the first regression layer to determine the probability, position information and scale information of each 3D detection frame belonging to the foreground. .

其中，第一分类层用于判断三维检测框属于前景还是背景，第一回归层用于确定三维检测框的位置信息。需要说明的是，在本发明实施例的目标检测模型的训练阶段，第一回归层学习的是三维检测框的6个值，分别是三维检测框的中心点的坐标(x,y,z)与三维检测框的长度、宽度和高度。Wherein, the first classification layer is used to determine whether the 3D detection frame belongs to the foreground or the background, and the first regression layer is used to determine the position information of the 3D detection frame. It should be noted that, in the training phase of the target detection model in the embodiment of the present invention, the first regression layer learns 6 values of the three-dimensional detection frame, which are the coordinates (x, y, z) of the center point of the three-dimensional detection frame and the length, width and height of the 3D detection box.

The three-dimensional detection frames are generated as follows. First, multiple three-dimensional detection frames are generated at each position of the 25*25*25 three-dimensional feature map. Specifically, the first classification layer determines the probability that each three-dimensional detection frame belongs to the foreground, and the first regression layer determines the position information and scale information of each three-dimensional detection frame. To decide whether a three-dimensional detection frame belongs to the foreground or the background, the frame is mapped onto a two-dimensional plane to obtain a two-dimensional detection frame (i.e., the first two-dimensional detection frame). The pre-annotated cuboid containing a vehicle is likewise mapped onto the two-dimensional plane to obtain a rectangular frame. The IOU (intersection over union) of the first two-dimensional detection frame and the rectangular frame is then calculated: if the IOU is greater than 0.7, the three-dimensional detection frame is regarded as foreground; if the IOU is less than 0.5, it is regarded as background; frames with any other IOU value are ignored. It should be noted that the sides of the rectangular frame may be rotated by some angle relative to the x-direction (or y-direction) of the image. Therefore, when training the first regression layer, the six values that the first regression layer needs to learn are set to the center point coordinates, length, width and height of the smallest cuboid that contains the annotated cuboid and is parallel to the x-direction and the y-direction.
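The foreground/background labeling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the box representation are assumptions, and the IOU is computed between axis-aligned rectangles only, so a rotated annotated rectangle is first replaced by its smallest x/y-parallel enclosing rectangle (a simplification).

```python
from typing import List, Optional, Tuple

# Axis-aligned 2D box as (x_min, y_min, x_max, y_max); representation is an assumption.
Box2D = Tuple[float, float, float, float]


def enclosing_axis_aligned(corners: List[Tuple[float, float]]) -> Box2D:
    """Smallest x/y-parallel rectangle containing a (possibly rotated) rectangle."""
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_frame(frame: Box2D, truth: Box2D,
                fg_thresh: float = 0.7, bg_thresh: float = 0.5) -> Optional[int]:
    """Return 1 for foreground, 0 for background, None when the frame is ignored."""
    iou = iou_2d(frame, truth)
    if iou > fg_thresh:
        return 1
    if iou < bg_thresh:
        return 0
    return None  # IOU between the two thresholds: not used for training
```

The thresholds 0.7 and 0.5 are the values given in the text; everything else is illustrative.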

For each position on the three-dimensional feature map output by the three-dimensional convolutional neural network, four three-dimensional detection frames are generated, with (length, width, height) equal to (39, 16, 4), (16, 39, 4), (10, 6, 4) and (6, 10, 4) respectively. The unit is the number of points, and three-dimensional detection frames of these scales match the size of a vehicle. If the detection target is a pedestrian or another object, three-dimensional detection frames with scales matching the size of the pedestrian or other object are generated instead.
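As a rough sketch of this step, the four frame sizes can be attached to every position of the feature map. The function name, the tuple layout and the choice of using the feature-map index as the frame center are assumptions made for illustration; the text does not specify how positions map back to lattice coordinates.

```python
from typing import List, Sequence, Tuple

# (length, width, height) of the four vehicle-sized frames given in the text.
FRAME_SIZES = [(39, 16, 4), (16, 39, 4), (10, 6, 4), (6, 10, 4)]

# A frame as (cx, cy, cz, length, width, height); layout is an assumption.
Frame = Tuple[int, int, int, int, int, int]


def generate_frames(grid_shape: Sequence[int] = (25, 25, 25),
                    sizes: Sequence[Tuple[int, int, int]] = FRAME_SIZES) -> List[Frame]:
    """Attach every size to every feature-map position (center = position index)."""
    frames: List[Frame] = []
    for x in range(grid_shape[0]):
        for y in range(grid_shape[1]):
            for z in range(grid_shape[2]):
                for length, width, height in sizes:
                    frames.append((x, y, z, length, width, height))
    return frames
```

With the 25*25*25 feature map this yields 25 * 25 * 25 * 4 = 62500 frames, all of height 4.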

Candidate three-dimensional detection frames are then selected from the three-dimensional detection frames. Specifically, frames with high confidence are picked from the frames belonging to the foreground, that is, a preset number of three-dimensional detection frames with the highest foreground probability output by the first classification layer are selected. In the training phase of the target detection model, 12000 three-dimensional detection frames are selected, and in the test phase 6000 are selected. A non-maximum suppression algorithm is then applied so that, among frames with high overlap (for example, an overlap ratio above a preset threshold), only the frame with the highest confidence is retained. After suppression, the top 2000 three-dimensional detection frames may be taken as candidate three-dimensional detection frames in the training phase, and the top 300 in the test phase.
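The selection procedure (keep the highest-scoring frames, suppress overlapping ones, keep the top survivors) might look like the sketch below. It assumes that overlap is measured as the IOU of the frames projected onto the two-dimensional plane, which is reasonable only because all frames share the same height; the threshold value 0.7 and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_candidates(boxes: Sequence[Box2D], scores: Sequence[float],
                      pre_nms_top_n: int, post_nms_top_n: int,
                      overlap_thresh: float = 0.7) -> List[int]:
    """Return indices of candidate frames, highest foreground score first."""
    # Step 1: keep the pre_nms_top_n frames with the highest foreground probability.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    # Step 2: greedy non-maximum suppression, stopping at post_nms_top_n survivors.
    keep: List[int] = []
    for i in order:
        if len(keep) == post_nms_top_n:
            break
        if all(iou_2d(boxes[i], boxes[j]) <= overlap_thresh for j in keep):
            keep.append(i)
    return keep
```

Under these assumptions, training would call `select_candidates(boxes, scores, 12000, 2000)` and testing `select_candidates(boxes, scores, 6000, 300)`.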

An ROI downsampling layer takes the three-dimensional feature map output by the last layer of VGG16, together with the candidate three-dimensional detection frames obtained above, and downsamples the candidate three-dimensional detection frames of different sizes to obtain feature maps of the same size corresponding to each three-dimensional detection frame.

The difference between the ROI downsampling layer and a conventional downsampling layer is that the ROI downsampling layer can downsample the feature maps corresponding to detection frames of different scales to feature maps of the same size. The height of every candidate three-dimensional detection frame is 4 (in this example, as described above, there is one grid per 0.4 meters of height, so 4 represents 1.6 meters; if the detection target is a pedestrian or another object, the height takes another value that depends on the size of the detection target). Therefore, ROI downsampling is performed only on the length and width dimensions, so that the feature maps of all candidate three-dimensional detection frames have the same length and width (the height is already the same). The same-size feature map of each three-dimensional detection frame thus has a length, a width and a height, and corresponds to the 512-dimensional channels of the VGG16 network. Two three-dimensional convolutional layers are connected after the ROI downsampling layer, each with 128 convolution kernels, a kernel size of 1*1*1 and a stride of 1; these two layers are used to extract features.
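To make the idea concrete, the sketch below max-pools each height slice of a region to a fixed length and width while leaving the height dimension untouched. It handles a single channel and uses nested lists for clarity; the pooled output size, the use of max pooling and the function names are assumptions, since the text does not state them.

```python
from typing import List

Plane = List[List[float]]  # one height slice, indexed [length][width]


def _ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def roi_max_pool_plane(plane: Plane, out_l: int, out_w: int) -> Plane:
    """Max-pool a slice of arbitrary size down to out_l x out_w bins."""
    in_l, in_w = len(plane), len(plane[0])
    pooled: Plane = []
    for i in range(out_l):
        l0, l1 = (i * in_l) // out_l, _ceil_div((i + 1) * in_l, out_l)
        row = []
        for j in range(out_w):
            w0, w1 = (j * in_w) // out_w, _ceil_div((j + 1) * in_w, out_w)
            row.append(max(plane[l][w] for l in range(l0, l1) for w in range(w0, w1)))
        pooled.append(row)
    return pooled


def roi_downsample(volume: List[Plane], out_l: int, out_w: int) -> List[Plane]:
    """Pool length and width of every height slice; the height is left unchanged."""
    return [roi_max_pool_plane(plane, out_l, out_w) for plane in volume]
```

Because every candidate frame has the same height, regions of different length and width all come out with the shape (height, out_l, out_w).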

After the two three-dimensional convolutional layers, a classification layer (also called the second classification layer) and a regression layer (also called the second regression layer) are connected. The second classification layer determines the category of each candidate three-dimensional detection frame, and the second regression layer determines the specific position information of each candidate three-dimensional detection frame. A candidate three-dimensional detection frame belongs to one of two categories, vehicle or background, and its specific position information is the coordinates of the 8 vertices of the three-dimensional frame, 24 values in total.
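For illustration only, the 24-value position format can be pictured as the flattened coordinates of the 8 vertices of a cuboid. The helper below builds such a vector from a center, a size and a rotation about the vertical axis; the vertex ordering, the yaw parameter and the function name are assumptions and are not specified in the text.

```python
import math
from typing import List, Tuple


def box_to_vertices(center: Tuple[float, float, float],
                    size: Tuple[float, float, float],
                    yaw: float = 0.0) -> List[float]:
    """Flatten the 8 vertices of a cuboid into 24 values (x, y, z per vertex)."""
    cx, cy, cz = center
    length, width, height = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    values: List[float] = []
    for dz in (-height / 2, height / 2):  # bottom face first, then top face
        for dx, dy in ((-length / 2, -width / 2), (length / 2, -width / 2),
                       (length / 2, width / 2), (-length / 2, width / 2)):
            # Rotate the horizontal offset by yaw, then translate to the center.
            values.extend((cx + dx * cos_y - dy * sin_y,
                           cy + dx * sin_y + dy * cos_y,
                           cz + dz))
    return values
```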

The target detection model of the embodiment of the present invention may be as shown in FIG. 2.

When training the network of the target detection model shown in FIG. 2, A samples (A = 128 or 256) may be selected arbitrarily from the 2000 candidate frames, a certain proportion of which are negative samples, i.e., background. Gradient descent is performed on the network using the selected samples, and the network is trained with the backpropagation algorithm and stochastic gradient descent. Specifically, the ground-truth labels of the training samples are annotated before training. For the first classification layer and the first regression layer, the classification cost and the regression cost are calculated in each training iteration from the annotated ground-truth labels (i.e., foreground or background, and the position information of the three-dimensional detection frame) and the outputs of the first classification layer and the first regression layer. For the second classification layer and the second regression layer, the classification cost and the regression cost are calculated from the annotated ground-truth labels (i.e., vehicle or background, and the specific position of the vehicle) and the outputs of the second classification layer and the second regression layer. The total Loss value (the total cost, including the classification costs and the regression costs) is reduced continuously until the classification layers (the first and second classification layers) and the regression layers (the first and second regression layers) produce sufficiently accurate outputs. Gradient descent lowers the Loss by repeatedly moving in the direction opposite to the gradient at the current point; stochastic gradient descent performs each update with the gradient calculated from only one training sample, and the gradient is obtained with the backpropagation algorithm.
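The two ingredients of this training loop, summing the four costs into a total Loss and taking a step against the gradient, can be written down in a few lines. This is a generic sketch of gradient descent, not the specific optimizer settings of the embodiment; the equal weighting of the costs, the learning rate and the function names are assumptions.

```python
from typing import List, Sequence


def total_loss(cls1: float, reg1: float, cls2: float, reg2: float) -> float:
    """Total cost: classification and regression costs of both stages (unweighted)."""
    return cls1 + reg1 + cls2 + reg2


def sgd_step(params: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """Move each parameter a small step in the direction opposite to its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In practice the gradients passed to `sgd_step` would come from backpropagating the total Loss through the network.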

The embodiment of the present invention may also train the network with the OHEM (online hard example mining) method. Unlike the arbitrary selection of A samples in the training process described above, the OHEM method treats each candidate frame as a sample, calculates its total Loss value, sorts the samples by total Loss value in descending order, and selects the A samples (A = 128 or 256) with the largest total Loss values. Gradient descent is then performed on the network using the selected samples; as before, the network is trained with the backpropagation algorithm and stochastic gradient descent, which is not repeated here. In this way, samples that are harder to learn are also learned well, which makes the vehicle detection results more accurate.
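The sample selection step of OHEM reduces to a sort by total Loss. A minimal sketch, with a hypothetical function name and the per-sample losses assumed to be already computed:

```python
from typing import List, Sequence


def select_hard_examples(losses: Sequence[float], a: int = 128) -> List[int]:
    """Return the indices of the `a` samples with the largest total Loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:a]
```

Only the returned samples contribute to the gradient step, whereas the arbitrary selection described earlier would pick A indices without looking at the losses.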

FIG. 3 is a schematic diagram of the main modules of a laser point cloud-based target detection device according to an embodiment of the present invention.

The laser point cloud-based target detection device 300 of the embodiment of the present invention mainly includes a point cloud data processing module 301, a feature map generation module 302, a candidate frame generation module 303, an ROI downsampling module 304 and a detection module 305.

The point cloud data processing module 301 is configured to rasterize the collected laser point cloud data and extract features from each grid to obtain three-dimensional lattice data. The three-dimensional lattice data includes four feature values for each grid (obtained by extracting four features from each grid): the maximum height of the points in the grid, the reflection intensity of the point with the maximum height, the number of points in the grid, and an indicator value indicating whether the grid contains any point.
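A possible sketch of this rasterization is shown below. Each point is assumed to be a tuple (x, y, z, intensity), and the grid is stored sparsely as a dictionary so that empty grids implicitly carry the features (0, 0, 0, 0). The 0.4 m height resolution follows the example in the description; the 0.1 m horizontal resolution and all names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, reflection intensity)
Cell = Tuple[int, int, int]                # grid index along x, y, z
Features = Tuple[float, float, int, int]   # (max height, its intensity, count, has-point flag)


def rasterize(points: Iterable[Point],
              cell_size: Tuple[float, float, float] = (0.1, 0.1, 0.4)) -> Dict[Cell, Features]:
    """Assign points to grids and compute the four feature values per occupied grid."""
    grid: Dict[Cell, Features] = {}
    for x, y, z, intensity in points:
        idx = (math.floor(x / cell_size[0]),
               math.floor(y / cell_size[1]),
               math.floor(z / cell_size[2]))
        if idx not in grid:
            grid[idx] = (z, intensity, 1, 1)
            continue
        max_z, max_intensity, count, _ = grid[idx]
        if z > max_z:  # a new highest point also replaces the stored intensity
            max_z, max_intensity = z, intensity
        grid[idx] = (max_z, max_intensity, count + 1, 1)
    return grid
```

A dense four-channel array for the network input can then be filled from this dictionary, with zeros for grids that contain no point.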

The feature map generation module 302 is configured to perform three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data to obtain a three-dimensional feature map.

The candidate frame generation module 303 is configured to generate, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height, and to select candidate three-dimensional detection frames from the three-dimensional detection frames.

Specifically, the candidate frame generation module 303 may be configured to: generate multiple three-dimensional detection frames for each position of the three-dimensional feature map, and determine the foreground probability, position information and scale information of each three-dimensional detection frame, where the scale information of each three-dimensional detection frame includes the same height; and de-duplicate the three-dimensional detection frames with a non-maximum suppression algorithm, and select, from the de-duplicated three-dimensional detection frames, a preset number of frames with the highest foreground probability as the candidate three-dimensional detection frames.

The ROI downsampling module 304 is configured to perform ROI downsampling, on the length and width dimensions, on the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, so as to obtain feature maps of the same size corresponding to each three-dimensional detection frame.

The detection module 305 is configured to perform classification and regression processing on the same-size feature maps corresponding to each three-dimensional detection frame, so as to determine the category and position information of the detection target.

The candidate frame generation module 303 may include a foreground determination submodule configured to: map a three-dimensional detection frame onto a two-dimensional plane to obtain a first two-dimensional detection frame corresponding to the three-dimensional detection frame; map a preset cuboid onto the two-dimensional plane to obtain a rectangular frame corresponding to the preset cuboid, where the preset cuboid is a pre-annotated detection target sample; and determine the probability that the three-dimensional detection frame belongs to the foreground according to the intersection over union of the first two-dimensional detection frame and the rectangular frame.

Specifically, the detection module 305 may be configured to: for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, obtain, based on the length and width dimensions, the feature maps corresponding to four second two-dimensional detection frames; perform ROI downsampling on the feature map corresponding to each second two-dimensional detection frame to obtain a same-size feature map for each second two-dimensional detection frame; and combine the same-size feature maps of the second two-dimensional detection frames, grouped by their candidate three-dimensional detection frame, into the same-size feature map corresponding to each three-dimensional detection frame.

The laser point cloud-based target detection device 300 may further include a training module configured to train the feature map generation module 302, the candidate frame generation module 303, the ROI downsampling module 304 and the detection module 305 with the OHEM training method.

The laser point cloud-based target detection device 300 of the present invention can be implemented on the basis of the target detection model constructed above. Specifically, after the three-dimensional lattice data is obtained by the point cloud data processing module 301, it can be used as the input of the target detection model, and the function of the feature map generation module 302 can be implemented by the three-dimensional convolutional neural network in the target detection model. The function of the candidate frame generation module 303 of generating multiple three-dimensional detection frames with the same height can be implemented by the first classification layer and the first regression layer; a preset number of three-dimensional detection frames with the highest foreground probability output by the classification layer are then selected as candidate three-dimensional detection frames, which implements the function of the candidate frame generation module 303 of selecting candidate three-dimensional detection frames. The function of the ROI downsampling module 304 is implemented by the ROI downsampling layer, or by the ROI downsampling layer combined with the two three-dimensional convolutional layers. The function of the detection module 305 is implemented by the second classification layer and the second regression layer. Accordingly, the training module described above can also be used to train each layer of the target detection model of the embodiment of the present invention. Since the OHEM training method has been described in detail above, it is not repeated here.

In addition, the specific implementation of the laser point cloud-based target detection device in the embodiment of the present invention has been described in detail in the laser point cloud-based target detection method above, so the repeated content is not described again here.

FIG. 4 shows an exemplary system architecture 400 to which the laser point cloud-based target detection method or the laser point cloud-based target detection device of the embodiment of the present invention can be applied.

As shown in FIG. 4, the system architecture 400 may include terminal devices 401, 402 and 403, a network 404 and a server 405. The network 404 is the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user can use the terminal devices 401, 402, 403 to interact with the server 405 through the network 404 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 405 may be a server that provides various services, for example a background management server that supports shopping websites browsed by users with the terminal devices 401, 402, 403. The background management server can analyze and otherwise process received data such as target information query requests, and feed the processing results (for example, target information) back to the terminal devices.

It should be noted that the laser point cloud-based target detection method provided by the embodiment of the present invention can be executed by the server 405 or by the terminal devices 401, 402, 403; correspondingly, the laser point cloud-based target detection device can be arranged in the server 405 or in the terminal devices 401, 402, 403.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 4 are merely illustrative. There may be any number of terminal devices, networks and servers, depending on implementation needs.

Referring now to FIG. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing a terminal device or a server of an embodiment of the present application. The terminal device or server shown in FIG. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse and the like; an output section 507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.

In particular, according to the embodiments disclosed in the present invention, the process described above with reference to the schematic diagram of main steps can be implemented as a computer software program. For example, the embodiments disclosed in the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the schematic diagram of main steps. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.

The schematic diagrams of main steps and the block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a schematic diagram of main steps or a block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or schematic diagrams of main steps, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be arranged in a processor; for example, a processor may be described as including a point cloud data processing module 301, a feature map generation module 302, a candidate frame generation module 303, an ROI downsampling module 304 and a detection module 305. The names of these modules do not, in some cases, limit the modules themselves; for example, the point cloud data processing module 301 may also be described as "a module for rasterizing the collected laser point cloud data and extracting features from each grid to obtain three-dimensional lattice data".

In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: rasterize the collected laser point cloud data, and extract features from each grid to obtain three-dimensional lattice data; perform three-dimensional convolution and three-dimensional downsampling on the three-dimensional lattice data to obtain a three-dimensional feature map; generate, for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height, and select candidate three-dimensional detection frames from the three-dimensional detection frames; perform ROI downsampling, on the length and width dimensions, on the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, to obtain feature maps of the same size corresponding to each three-dimensional detection frame; and perform classification and regression processing on the same-size feature maps corresponding to each three-dimensional detection frame, to determine the category and position information of the detection target.

According to the technical solution of the embodiment of the present invention, the collected laser point cloud data is rasterized, and features are extracted from each grid to obtain three-dimensional lattice data; three-dimensional convolution and three-dimensional downsampling are performed on the three-dimensional lattice data to obtain a three-dimensional feature map; for each position of the three-dimensional feature map, multiple three-dimensional detection frames with the same height are generated, and candidate three-dimensional detection frames are selected from them; for the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, ROI downsampling is performed on the length and width dimensions to obtain feature maps of the same size corresponding to each three-dimensional detection frame; and classification and regression processing is performed on these same-size feature maps to determine the category and position information of the detection target. The solution does not depend on calibration between the lidar and a camera, and the detection results are highly accurate.

The specific embodiments described above do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. A target detection method based on laser point cloud is characterized by comprising the following steps:

rasterizing the collected laser point cloud data, and extracting features from each grid to obtain three-dimensional lattice data;

performing three-dimensional convolution and three-dimensional down-sampling on the three-dimensional lattice data to obtain a three-dimensional characteristic diagram;

generating a plurality of three-dimensional detection frames with the same height corresponding to each position of the three-dimensional feature map, and selecting candidate three-dimensional detection frames from the three-dimensional detection frames;

performing ROI (region of interest) downsampling on the length dimension and the width dimension of the three-dimensional feature map corresponding to each candidate three-dimensional detection frame to obtain the feature maps with the same size corresponding to each three-dimensional detection frame;

and carrying out classification and regression processing according to the feature maps with the same size corresponding to the three-dimensional detection frames so as to determine the category and the position information of the detection target.
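The rasterization step of claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the claim does not say which features are extracted per grid, so the per-cell feature used here (point count and mean z) is an assumption, as are the names `rasterize`, `origin`, `cell_size` and `shape`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def rasterize(points: Iterable[Point],
              origin: Point = (0.0, 0.0, 0.0),
              cell_size: Point = (1.0, 1.0, 1.0),
              shape: Cell = (4, 4, 2)) -> Dict[Cell, Tuple[int, float]]:
    """Bin points into a regular 3D grid and compute one feature per occupied cell.

    The feature (point count, mean z) is an illustrative assumption.
    """
    buckets: Dict[Cell, List[float]] = defaultdict(list)
    for x, y, z in points:
        idx = (
            int((x - origin[0]) // cell_size[0]),
            int((y - origin[1]) // cell_size[1]),
            int((z - origin[2]) // cell_size[2]),
        )
        # Drop points that fall outside the grid extent.
        if all(0 <= i < n for i, n in zip(idx, shape)):
            buckets[idx].append(z)
    return {cell: (len(zs), sum(zs) / len(zs)) for cell, zs in buckets.items()}


grid = rasterize([(0.2, 0.3, 0.1), (0.8, 0.9, 0.5),
                  (2.5, 1.1, 1.2), (9.0, 9.0, 9.0)])
```

Here the first two points share cell (0, 0, 0), the third lands in cell (2, 1, 1), and the fourth lies outside the grid and is discarded. Only occupied cells are stored; a dense array would be filled from this dictionary before convolution.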

2. The method according to claim 1, wherein the step of generating a plurality of three-dimensional detection frames having the same height for each position of the three-dimensional feature map and selecting a candidate three-dimensional detection frame from the three-dimensional detection frames comprises:

generating a plurality of three-dimensional detection frames with the same height corresponding to each position of the three-dimensional feature map, and determining the probability that each three-dimensional detection frame belongs to the foreground;

and de-duplicating the three-dimensional detection frames using a non-maximum suppression algorithm, and selecting, from the de-duplicated three-dimensional detection frames, a preset number of three-dimensional detection frames with the highest probability of belonging to the foreground as candidate three-dimensional detection frames.
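The selection in claim 2 can be illustrated with a greedy non-maximum suppression followed by a top-k cut. The sketch below assumes that overlap is measured on the ground-plane footprint of each frame (a plausible simplification because all frames share one height, but not stated in the claim); the threshold `iou_threshold`, the count `top_k` and the function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # x1, y1, x2, y2 footprint (assumed layout)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_candidates(boxes: List[Rect], fg_probs: List[float],
                      iou_threshold: float = 0.7, top_k: int = 2) -> List[int]:
    """Greedy NMS, then keep the top_k surviving frames by foreground probability."""
    order = sorted(range(len(boxes)), key=lambda i: fg_probs[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a frame only if it does not overlap a higher-scoring kept frame.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept[:top_k]  # kept is already in descending probability order


boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0),
         (5.0, 5.0, 7.0, 7.0), (10.0, 10.0, 12.0, 12.0)]
candidates = select_candidates(boxes, [0.9, 0.8, 0.7, 0.6])
```

In this example the second frame overlaps the first almost entirely and is suppressed, so the two candidates are frames 0 and 2.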

3. The method of claim 2, wherein the probability that the three-dimensional detection frame belongs to the foreground is determined by:

mapping the three-dimensional detection frame to a two-dimensional plane to obtain a first two-dimensional detection frame corresponding to the three-dimensional detection frame;

mapping a preset cuboid onto the two-dimensional plane to obtain a rectangular frame corresponding to the preset cuboid, wherein the preset cuboid is a pre-marked detection target sample;

and determining the probability of the three-dimensional detection frame belonging to the foreground according to the intersection-over-union (IoU) ratio of the first two-dimensional detection frame and the rectangular frame.
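The overlap measure of claim 3 can be sketched as below: both the detection frame and the pre-marked cuboid are projected onto a two-dimensional plane and their intersection-over-union is computed. The sketch assumes axis-aligned boxes stored as `(x1, y1, z1, x2, y2, z2)` and projection onto the x-y plane; these conventions and the function names are illustrative only, and how the ratio is turned into a probability is left open, as in the claim.

```python
from typing import Tuple

Box3D = Tuple[float, float, float, float, float, float]  # x1, y1, z1, x2, y2, z2
Rect = Tuple[float, float, float, float]                  # x1, y1, x2, y2


def to_ground_plane(box: Box3D) -> Rect:
    """Project an axis-aligned 3D box onto the x-y plane by dropping z."""
    x1, y1, _, x2, y2, _ = box
    return (x1, y1, x2, y2)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


detection_frame = (0.0, 0.0, 0.0, 2.0, 2.0, 1.5)
labeled_cuboid = (1.0, 0.0, 0.0, 3.0, 2.0, 1.7)
overlap = iou(to_ground_plane(detection_frame), to_ground_plane(labeled_cuboid))
```

The two footprints share an area of 2 out of a combined area of 6, so `overlap` is one third.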

4. The method of claim 1, wherein the step of performing ROI downsampling on the length and width dimensions of the three-dimensional feature map corresponding to each candidate three-dimensional detection frame to obtain the same-size feature map corresponding to each three-dimensional detection frame comprises:

obtaining, based on the length dimension and the width dimension of the three-dimensional feature map corresponding to each candidate three-dimensional detection frame, the three-dimensional feature maps corresponding to four second two-dimensional detection frames;

performing ROI (region of interest) downsampling on the feature map corresponding to each second two-dimensional detection frame to obtain the feature map with the same size corresponding to each second two-dimensional detection frame;

and combining the feature maps with the same size corresponding to the second two-dimensional detection frames into the feature maps with the same size corresponding to the three-dimensional detection frames according to the corresponding candidate three-dimensional detection frames.
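ROI down-sampling to a fixed output size can be sketched with max pooling over a grid of bins. The example below reflects one possible reading of claim 4, assumed here rather than confirmed by the text: each height slice of the feature map is pooled separately over the length and width dimensions, and the pooled slices are then stacked. The ROI layout `(row0, col0, row1, col1)` (end-exclusive) and the function name are hypothetical.

```python
from typing import List, Tuple

Grid2D = List[List[float]]


def roi_max_pool(fmap: Grid2D, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> Grid2D:
    """Max-pool a non-empty ROI (row0, col0, row1, col1) into out_h x out_w bins."""
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled: Grid2D = []
    for i in range(out_h):
        # Floor for the bin start, ceiling for the bin end; at least one cell per bin.
        rs = r0 + (i * h) // out_h
        re = max(rs + 1, r0 + ((i + 1) * h + out_h - 1) // out_h)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = max(cs + 1, c0 + ((j + 1) * w + out_w - 1) // out_w)
            row.append(max(fmap[r][c] for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# One 4 x 4 slice whose cell value is row * 4 + col.
slice_2d = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
full = roi_max_pool(slice_2d, (0, 0, 4, 4), 2, 2)
partial = roi_max_pool(slice_2d, (1, 0, 4, 3), 2, 2)
# Pool every height slice with the same ROI, then stack the results.
stacked = [roi_max_pool(s, (0, 0, 4, 4), 2, 2) for s in [slice_2d, slice_2d]]
```

Both the 4 x 4 ROI and the 3 x 3 ROI produce a 2 x 2 output, which is what allows frames of different footprints to feed the same classification and regression layers.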

5. A target detection device based on a laser point cloud, characterized by comprising:

the point cloud data processing module is used for rasterizing the acquired laser point cloud data and extracting features of each grid to obtain three-dimensional lattice data;

the feature map generation module is used for carrying out three-dimensional convolution and three-dimensional down-sampling on the three-dimensional lattice data to obtain a three-dimensional feature map;

the candidate frame generation module is used for generating a plurality of three-dimensional detection frames with the same height corresponding to each position of the three-dimensional feature map and selecting a candidate three-dimensional detection frame from the three-dimensional detection frames;

the ROI down-sampling module is used for carrying out ROI down-sampling on the length dimension and the width dimension of the three-dimensional feature map corresponding to each candidate three-dimensional detection frame so as to obtain the feature map with the same size corresponding to each three-dimensional detection frame;

and the detection module is used for carrying out classification and regression processing according to the feature maps with the same size corresponding to the three-dimensional detection frames so as to determine the category and the position information of the detection target.
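The three-dimensional down-sampling carried out when generating the feature map can be illustrated with non-overlapping max pooling. The claims do not name the pooling operator, so max pooling with a 2 x 2 x 2 window is an assumption, as are the `[z][y][x]` indexing and the function name `max_pool_3d`.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [z][y][x] (assumed layout)


def max_pool_3d(vol: Volume, k: int = 2) -> Volume:
    """Non-overlapping k x k x k max pooling.

    Trailing cells that do not fill a complete window are dropped.
    """
    d, h, w = len(vol) // k, len(vol[0]) // k, len(vol[0][0]) // k
    return [[[max(vol[z * k + dz][y * k + dy][x * k + dx]
                  for dz in range(k) for dy in range(k) for dx in range(k))
              for x in range(w)]
             for y in range(h)]
            for z in range(d)]


# A 2 x 4 x 4 volume whose cell value is z * 16 + y * 4 + x.
volume = [[[float(z * 16 + y * 4 + x) for x in range(4)]
           for y in range(4)]
          for z in range(2)]
pooled = max_pool_3d(volume)
```

The 2 x 4 x 4 input becomes a 1 x 2 x 2 output, halving each dimension while keeping the strongest response in every window.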

6. The apparatus of claim 5, wherein the candidate frame generation module is further configured to:

7. The apparatus of claim 6, wherein the candidate frame generation module comprises a foreground determination sub-module configured to:

8. The apparatus of claim 5, wherein the detection module is further configured to:

9. The apparatus of claim 5, wherein the laser point cloud based target detection apparatus further comprises a training module configured to:

training the feature map generation module, the candidate frame generation module, the ROI down-sampling module and the detection module using an online hard example mining (OHEM) training method.
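The core idea of OHEM is to back-propagate only through the examples with the largest loss in each batch. The following minimal sketch shows that selection step on a list of per-candidate losses; the function names and the `keep` parameter are hypothetical, and the sketch does not represent the patent's training procedure.

```python
from typing import List


def select_hard_examples(losses: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-loss examples, in index order."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:keep])


def ohem_loss(losses: List[float], keep: int) -> float:
    """Average the loss over the hard examples only; easy examples contribute nothing."""
    hard = select_hard_examples(losses, keep)
    if not hard:
        raise ValueError("need at least one example and keep >= 1")
    return sum(losses[i] for i in hard) / len(hard)


per_candidate_losses = [0.1, 2.0, 0.05, 1.2, 0.3]
hard_indices = select_hard_examples(per_candidate_losses, keep=2)
batch_loss = ohem_loss(per_candidate_losses, keep=2)
```

With `keep=2`, candidates 1 and 3 are selected and the batch loss is their mean, 1.6.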

10. An electronic device, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any one of claims 1-4.

11. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-4.