CN107529650A

CN107529650A - Network model construction and closed loop detection method, corresponding device and computer equipment

Info

Publication number: CN107529650A
Application number: CN201710700709.6A
Authority: CN
Inventors: 阳方平
Original assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Current assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date: 2017-08-16
Filing date: 2017-08-16
Publication date: 2018-01-02
Anticipated expiration: 2037-08-16
Also published as: CN107529650B

Abstract

The invention discloses a network model construction method and a closed-loop detection method, together with corresponding devices and computer equipment. The closed-loop detection method includes: inputting the currently captured real-scene image frame into a target network model constructed by the network model construction method of the invention, to obtain the actual image features of the real-scene image frame; determining, according to a set image frame selection rule, at least one image frame to be matched for the real-scene image frame and the corresponding historical image features; and determining the closed-loop detection result of the real-scene image frame based on the similarity value between the actual image features and each historical image feature. While preserving the accuracy of closed-loop detection, the method effectively reduces the dimension of the image feature vectors that closed-loop detection requires, which shortens the similarity calculation and better satisfies the real-time requirement of closed-loop detection.

Description

Construction of network model and closed-loop detection method, corresponding device and computer equipment

Technical field

The invention relates to the technical field of machine learning, and in particular to a network model construction method and a closed-loop detection method, corresponding devices, and computer equipment.

Background

Image feature extraction is an important step of image processing in the field of computer vision. Traditional image feature extraction methods are very sensitive to illumination changes: when features are extracted from images of the same scene captured under different lighting conditions, the results often differ, which degrades subsequent image processing.

To address this defect, methods that extract image features with a deep learning model have been proposed. Although a deep learning model can effectively avoid the influence of complex lighting on image features, the image features output by the deep learning models proposed in the prior art tend to have a high dimension (for example, the image features output by the classic PlaceCNN convolutional network model have as many as 9126 dimensions). Such high-dimensional image features greatly increase the computation time of image processing and reduce its performance.

In addition, closed-loop detection can be regarded as a common image processing problem in computer vision applications. If high-dimensional image features are first extracted with an existing deep learning model, they greatly increase the computation time of the subsequent similarity measurement in closed-loop detection, and hence the overall processing time, which makes it difficult to meet the requirement of real-time closed-loop detection.

Summary of the invention

Embodiments of the present invention provide a network model construction method and a closed-loop detection method, corresponding devices, and computer equipment. The constructed network model can output low-dimensional image features, and these features enable real-time closed-loop detection on images.

In a first aspect, an embodiment of the present invention provides a method for constructing a network model, including:

constructing an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of the following: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of the following: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer;

iteratively training the initial network model according to acquired training and learning information, to obtain a target network model with a standard weight data set.

In a second aspect, an embodiment of the present invention provides a closed-loop detection method, including:

inputting the currently captured real-scene image frame into a preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the network model construction method provided in the embodiment of the first aspect of the present invention;

determining, according to a set image frame selection rule, at least one image frame to be matched for the real-scene image frame, and acquiring the historical image features of each image frame to be matched;

determining the closed-loop detection result of the real-scene image frame based on the similarity value between the actual image features and each historical image feature.

In a third aspect, an embodiment of the present invention provides a device for constructing a network model, including:

an initial construction module, configured to construct an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of the following: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of the following: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer;

a target determination module, configured to iteratively train the initial network model according to acquired training and learning information, to obtain a target network model with a standard weight data set.

In a fourth aspect, an embodiment of the present invention provides a closed-loop detection device, including:

a feature extraction module, configured to input the currently captured real-scene image frame into a preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the network model construction device provided in the embodiment of the third aspect of the present invention;

an image selection module, configured to determine, according to a set image frame selection rule, at least one image frame to be matched for the real-scene image frame, and to acquire the historical image features of each image frame to be matched;

a detection determination module, configured to determine the closed-loop detection result of the real-scene image frame based on the similarity value between the actual image features and each historical image feature.

In a fifth aspect, an embodiment of the present invention provides a computer device, including:

one or more processors;

a storage device for storing one or more programs;

wherein the one or more programs are executed by the one or more processors, so that the one or more processors implement the network model construction method provided in the embodiment of the first aspect of the present invention.

In a sixth aspect, an embodiment of the present invention provides a computer device, including a camera for capturing image frames, and further including:

one or more processors;

wherein the one or more programs are executed by the one or more processors, so that the one or more processors implement the closed-loop detection method provided in the embodiment of the second aspect of the present invention.

In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the network model construction method provided in the embodiment of the first aspect of the present invention.

In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the closed-loop detection method provided in the embodiment of the second aspect of the present invention.

In the network model construction method, closed-loop detection method, corresponding devices, and computer equipment provided above, the construction method first builds an initial network model from preset topology information and configuration parameter information, and then trains it according to acquired training and learning information to obtain a target network model with a standard weight data set. The closed-loop detection method first inputs the currently captured real-scene image frame into the target network model constructed above to obtain the corresponding actual image features; it then determines, according to a set image frame selection rule, at least one image frame to be matched for the real-scene image frame, together with the corresponding historical image features; finally, it determines the closed-loop detection result of the real-scene image frame from the similarity values between the actual image features and each historical image feature. With this technical solution, the constructed target network model can quickly output compact, low-dimensional image feature vectors. When used for closed-loop detection, it effectively reduces the dimension of the required image feature vectors while preserving detection accuracy, which shortens the similarity calculation and better satisfies the real-time requirement of closed-loop detection.
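The similarity-based decision summarized above can be sketched as follows. This is only an illustration under stated assumptions: the text here does not name the similarity measure or the decision rule, so cosine similarity, the `threshold` value, and the function names `cosine_similarity` and `detect_closed_loop` are all hypothetical choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_closed_loop(actual_feature, historical_features, threshold=0.9):
    """Return the index of the best-matching historical feature, or None.

    `threshold` is a hypothetical parameter; a frame is reported as a
    closed-loop candidate only if its similarity reaches the threshold.
    """
    best_index, best_score = None, threshold
    for index, feature in enumerate(historical_features):
        score = cosine_similarity(actual_feature, feature)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


# Toy 2-dimensional features; the model described here outputs longer vectors.
history = [[0.0, 1.0], [1.0, 0.1]]
print(detect_closed_loop([1.0, 0.0], history))
```

The cost of each comparison grows linearly with the feature dimension, which is why a lower-dimensional feature vector shortens the similarity calculation.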

Description of drawings

FIG. 1a is a schematic flowchart of a method for constructing a network model provided by Embodiment 1 of the present invention;

FIG. 1b is a topology diagram of Convx_1 in the network model constructed in Embodiment 1 of the present invention;

FIG. 1c is a topology diagram of Convx_2 in the network model constructed in Embodiment 1 of the present invention;

FIG. 1d is a schematic diagram of the calculation principle of the C.ReLU function provided by Embodiment 1 of the present invention;

FIG. 1e is a schematic topology diagram of the target network model trained in Embodiment 1 of the present invention;

FIGS. 1f to 1m are visualizations of the output of each layer in the target network model constructed in Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of a closed-loop detection method provided by Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of a closed-loop detection method provided by Embodiment 3 of the present invention;

FIG. 4a is a structural block diagram of a device for constructing a network model provided by Embodiment 4 of the present invention;

FIG. 4b is a schematic diagram of the hardware structure of a computer device provided by Embodiment 4 of the present invention;

FIG. 5a is a structural block diagram of a closed-loop detection device provided by Embodiment 5 of the present invention;

FIG. 5b is a schematic diagram of the hardware structure of a computer device provided by Embodiment 5 of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment one

FIG. 1a is a schematic flowchart of a method for constructing a network model provided by Embodiment 1 of the present invention. The method is suitable for constructing and training a new network model and can be executed by a device for constructing a network model, which can be implemented in software and/or hardware and is generally integrated into computer equipment.

As shown in FIG. 1a, the method for constructing a network model provided by Embodiment 1 of the present invention includes the following operations:

S101. Construct an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of the following: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of the following: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer.

In this embodiment, a network model framework can be determined from the provided topology information, and the provided configuration parameter information turns that framework into an initial network model capable of image feature extraction. Because the topology information and configuration parameter information are both preset, the number of layers of the resulting initial network model, and the connection order and connection relationships between its layers, differ from those of existing neural network models.

Besides the input layer and the output layer, the topology of the network model to be constructed includes convolutional layers, pooling layers, and fully connected layers. The topology information preset in this embodiment specifies the number of layers of each type and how the layers are connected overall; for example, it specifies that a convolutional layer is connected to the input layer, a pooling layer is connected to a convolutional layer, and a fully connected layer is connected to a pooling layer or a convolutional layer. In short, a network model framework can be built from this topology information.

After the network model framework is formed, each layer in the framework must be given substantive topological connections based on the configuration parameter information, which yields an initial network model capable of computing image features. The preset configuration parameter information in this embodiment contains the configuration parameters of each layer of the network model to be constructed, and each layer uses its configuration parameters to establish substantive topological connections with its adjacent layers.

For example, a convolutional layer in the network model uses its configuration parameters to establish, with the previous layer (which may be the input layer or a pooling layer), a convolutional connection capable of convolution calculation; a pooling layer uses its configuration parameters to establish, with the previous layer (generally a convolutional layer), a pooling connection capable of pooling calculation; and a fully connected layer establishes, with the previous layer (which may be a convolutional layer, a pooling layer, or a fully connected layer), a full connection capable of fully connected calculation.

Specifically, the configuration parameters of a convolutional layer include the size and number of convolution kernels and the convolution stride. The convolution kernel size is the size of the convolution matrix used for convolution calculation once the convolutional layer has established a convolutional connection with the previous layer; the number of convolution kernels is the number of different convolution matrices that can be used in the convolution calculation; and the convolution stride is the distance the convolution kernel moves from the current calculation position to the next one. For example, a convolution stride of 1 means that the kernel moves by 1 each time it advances from the current calculation position to the next.
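How the kernel size and stride determine the number of calculation positions can be illustrated with the standard output-size relation for a sliding window. The sketch below is not taken from the source: the `padding` argument and the 32-pixel input width are assumptions made for the example.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int,
                     padding: int = 0) -> int:
    """Number of positions a kernel visits along one axis.

    `padding` is an assumption of this sketch; the text does not specify it.
    """
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel does not fit inside the (padded) input")
    return (padded - kernel_size) // stride + 1


# Hypothetical 32-pixel-wide input with a 5x5 kernel:
print(conv_output_size(32, kernel_size=5, stride=1))  # 28
print(conv_output_size(32, kernel_size=5, stride=2))  # 14
```

Doubling the stride roughly halves the number of positions along each axis, which is how a stride of 2 reduces the spatial size of a feature map.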

Similarly, the configuration parameters of a pooling layer include the pooling window size and the pooling stride. The pooling window size is the size of the pooling matrix used for pooling calculation once the pooling layer has established a pooling connection with the previous layer; the pooling stride is the distance the pooling window moves from the current calculation position to the next one. The configuration parameters of a fully connected layer include the number of neurons, which determines the total number of fully connected weights required once the layer is fully connected to the previous layer.
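A small sketch may make the pooling window, the pooling stride, and the fully connected weight count concrete. It assumes max pooling over a one-dimensional signal, which the text does not specify, and the helper names `max_pool_1d` and `fully_connected_weight_count` are hypothetical.

```python
def max_pool_1d(values, window, stride):
    """Slide a window over `values` and keep the maximum at each position.

    Max pooling is an assumption; the text does not name the pooling type.
    """
    if window > len(values):
        raise ValueError("pooling window is larger than the input")
    return [max(values[start:start + window])
            for start in range(0, len(values) - window + 1, stride)]


def fully_connected_weight_count(inputs, neurons, with_bias=False):
    """Weights needed to connect `inputs` values to `neurons` neurons."""
    count = inputs * neurons
    return count + neurons if with_bias else count


# Window of 3 and stride of 2, as preferred for the pooling layers.
print(max_pool_1d([1, 5, 2, 4, 3, 6, 0], window=3, stride=2))  # [5, 4, 6]
# A 512-neuron layer fed by another 512-neuron layer.
print(fully_connected_weight_count(512, 512))  # 262144
```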

Further, the convolutional layers comprise five layers: the first, second, third, fourth, and fifth convolutional layers. The pooling layers comprise two layers: the first and second pooling layers. The fully connected layers comprise two layers: the first and second fully connected layers. The topological connection order is: input layer -- first convolutional layer -- first pooling layer -- second convolutional layer -- second pooling layer -- third convolutional layer -- fourth convolutional layer -- fifth convolutional layer -- first fully connected layer -- second fully connected layer -- output layer.

In this embodiment, the hidden layers of the network model framework to be constructed preferably consist of five main convolutional layers, two pooling layers, and two fully connected layers, and each layer has a corresponding name, such as the first convolutional layer, the first pooling layer, or the first fully connected layer. This embodiment also gives a preferred topological connection order, from which a preferred initial network model framework can be formed.
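The preferred connection order can be written down as plain data and checked against the stated layer counts. This is a minimal sketch; the short names such as `conv1` and `fc1` and the helper `count_layers` are hypothetical labels introduced only for illustration.

```python
# Preferred topological connection order, using hypothetical short names.
TOPOLOGY = [
    "input",
    "conv1", "pool1",
    "conv2", "pool2",
    "conv3", "conv4", "conv5",
    "fc1", "fc2",
    "output",
]


def count_layers(topology, prefix):
    """Count the layers whose name starts with the given prefix."""
    return sum(1 for name in topology if name.startswith(prefix))


def previous_layer(topology, name):
    """Return the layer that feeds `name`, or None for the first layer."""
    index = topology.index(name)
    return topology[index - 1] if index > 0 else None


print(count_layers(TOPOLOGY, "conv"), count_layers(TOPOLOGY, "pool"),
      count_layers(TOPOLOGY, "fc"))  # 5 2 2
print(previous_layer(TOPOLOGY, "conv3"))  # pool2
```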

Further, the i-th convolutional layer includes an i_1-th convolutional layer and an i_2-th convolutional layer, where i takes the values 3, 4, and 5, and the convolution calculations of the i_1-th and i_2-th convolutional layers both use shortcut connections. The i_j-th convolutional layer further includes an i_j_1-th sub-convolutional layer, an i_j_2-th sub-convolutional layer, and an i_j_3-th sub-convolutional layer, where j takes the values 1 and 2.

In other words, this embodiment further refines the topology of the third, fourth, and fifth convolutional layers: each of them contains two smaller convolutional layers, and each smaller convolutional layer in turn contains three sub-convolutional layers. This design of the network model topology effectively reduces the dimension of the image features extracted by the network model. In addition, the two smaller convolutional layers use shortcut connections in their convolution calculations, mainly to shorten the training convergence time when the weight data of the constructed network model are updated during training.

On the basis of the above optimization, this embodiment sets preferred configuration parameters for each layer of the network model. Further, the first convolutional layer preferably has a convolution stride of 1, a convolution kernel size of 5×5, and 32 convolution kernels; the second convolutional layer preferably has a convolution stride of 1, a convolution kernel size of 3×3, and 64 convolution kernels; the first and second pooling layers both preferably have a pooling stride of 2 and a pooling window size of 3×3; and the first and second fully connected layers both preferably have 512 neurons.

The above gives the configuration parameters of the first convolutional layer, the second convolutional layer, the first pooling layer, the second pooling layer, the first fully connected layer, and the second fully connected layer of the network model to be constructed. Because the third, fourth, and fifth convolutional layers are each composed of multiple sub-convolutional layers, this embodiment sets configuration parameters for each sub-convolutional layer and gives their preferred values.

Specifically, on the basis of the above optimization, the 3_1_1-th and 3_2_1-th sub-convolutional layers both preferably have a convolution stride of 1, a convolution kernel size of 1×1, and 96 convolution kernels; the 3_1_2-th and 3_2_2-th sub-convolutional layers both preferably have a convolution stride of 1, a convolution kernel size of 3×3, and 96 convolution kernels; the 3_1_3-th and 3_2_3-th sub-convolutional layers preferably have convolution strides of 2 and 1 respectively, and both have a convolution kernel size of 1×1 and 192 convolution kernels.

Further, the 4_1_1-th and 4_2_1-th sub-convolutional layers both preferably have a convolution stride of 1, a convolution kernel size of 1×1, and 128 convolution kernels; the 4_1_2-th and 4_2_2-th sub-convolutional layers both preferably have a convolution stride of 1, a convolution kernel size of 3×3, and 128 convolution kernels; the 4_1_3-th and 4_2_3-th sub-convolutional layers preferably have convolution strides of 2 and 1 respectively, and both have a convolution kernel size of 1×1 and 384 convolution kernels.

Further, the 5_1_1-th and 5_2_1-th sub-convolutional layers both preferably have a convolution stride of 1, a convolution kernel size of 1×1, and 256 convolution kernels; the 5_1_2-th and 5_2_2-th sub-convolutional layers both preferably have a convolution stride of 1, a convolution kernel size of 3×3, and 256 convolution kernels; the 5_1_3-th and 5_2_3-th sub-convolutional layers preferably have convolution strides of 2 and 1 respectively, and both have a convolution kernel size of 1×1 and 512 convolution kernels.
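The sub-layer parameters listed above follow a regular pattern, which the sketch below captures as data. The numeric values come from the text; the `ConvConfig` class, the `BLOCK_WIDTHS` table, the function `sub_layer_configs`, and the `convI_J_K` naming are hypothetical and serve only to illustrate one way of encoding the configuration parameter information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvConfig:
    kernel_size: int   # side length of the square convolution kernel
    num_kernels: int   # number of convolution kernels
    stride: int        # convolution stride


# (kernels in sub-layers 1 and 2, kernels in sub-layer 3) for i = 3, 4, 5.
BLOCK_WIDTHS = {3: (96, 192), 4: (128, 384), 5: (256, 512)}


def sub_layer_configs(i, j):
    """Preferred parameters of the three sub-layers of the i_j-th layer."""
    inner, outer = BLOCK_WIDTHS[i]
    # Only the third sub-layer of the i_1-th layer uses a stride of 2.
    last_stride = 2 if j == 1 else 1
    return {
        f"conv{i}_{j}_1": ConvConfig(1, inner, 1),
        f"conv{i}_{j}_2": ConvConfig(3, inner, 1),
        f"conv{i}_{j}_3": ConvConfig(1, outer, last_stride),
    }


for name, config in sub_layer_configs(3, 1).items():
    print(name, config)
```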

Combining the preferred settings above for the configuration parameters of each layer, this embodiment provides a parameter information table for constructing the network model; Table 1 lists the preferred configuration parameters of each layer of the network model.

Table 1. Parameter information of the network model to be constructed

As shown in Table 1, the first column gives the identifier of each layer in the network model to be constructed and implicitly gives the topological connection order between the layers. In the first column, Convx denotes the x-th convolutional layer of the network model; when x is 3, 4, or 5, Convx consists of two smaller convolutional layers, Convx_1 and Convx_2. The second column gives the type of calculation each layer performs once it is substantively connected to the adjacent previous layer in the network model framework formed from the first column; for example, a convolutional layer performs convolution calculation. The third, fourth, and fifth columns give the preferred configuration parameters of the convolutional and pooling layers: the filter size corresponds to the convolution kernel size of a convolutional layer or the pooling window size of a pooling layer, the number of filters corresponds to the number of convolution kernels of a convolutional layer, and the stride corresponds to the convolution stride of a convolutional layer or the pooling stride of a pooling layer. The sixth column gives the dimension of the output of each layer in the network model constructed from the preceding columns; the value 365 for the output layer means that the output layer has 365 neurons that output results. Note that for a fully connected layer, the dimension of the output equals the number of neurons set in its configuration parameters, which is preferably 512 in this embodiment.

It can be understood that, when x in Convx is 3, 4 or 5, each of the two smaller convolutional layers in turn contains three sub-convolutional layers, and this embodiment likewise sets configuration parameters for each sub-convolutional layer. Fig. 1b shows the topology of Convx_1 in the network model constructed in Embodiment 1 of the present invention, and Fig. 1c shows the topology of Convx_2. As shown in Fig. 1b and Fig. 1c, with x taking the values 3, 4 and 5, the main path of both Convx_1 and Convx_2 consists of three convolutional layers with kernel sizes 1×1, 3×3 and 1×1. The 1×1 kernels control the feature dimension during image feature extraction and reduce the input and output dimensions of the 3×3 kernel.

Fig. 1b and Fig. 1c also show that both Convx_1 and Convx_2 use a shortcut connection 110. That is, when Convx_1 or Convx_2 performs its convolution calculation, the input feature data is first convolved along the set topology to produce output feature data, the output feature data is then summed with the input feature data, and the sum serves as the output of Convx_1 or Convx_2. Note that the shortcut connection 110 in Convx_1 additionally passes through a convolutional layer with a 1×1 kernel. This extra layer ensures that the two sets of feature data being summed have the same dimensions, so that the sum can be computed. Because the feature data input to Convx_2 is the output feature data of Convx_1, the input and output of Convx_2 already have the same dimensions, so Convx_2 needs no additional 1×1 convolutional layer.
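The shortcut connection described above can be sketched as follows. This is a minimal, hypothetical illustration on the channel vector of a single pixel, where a 1×1 convolution reduces to a matrix-vector product. The function names `conv1x1` and `residual_add`, the toy weights and the channel counts are assumptions made for illustration and are not taken from the patent.

```python
def conv1x1(weights, x):
    """A 1x1 convolution at one pixel is a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def residual_add(x, main_path, projection=None):
    """Sum the main-path output with the (optionally projected) input."""
    y = main_path(x)
    shortcut = conv1x1(projection, x) if projection is not None else x
    if len(shortcut) != len(y):
        raise ValueError("shortcut and main path must have equal dimensions")
    return [a + b for a, b in zip(y, shortcut)]


# Toy main path mapping 2 channels to 3 (a stand-in for the 1x1 -> 3x3 -> 1x1 stack).
main_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def main(x):
    return conv1x1(main_w, x)


# Convx_1-like case: the dimensions change, so the shortcut needs a 1x1 projection.
proj_w = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
out_1 = residual_add([2.0, 4.0], main, projection=proj_w)

# Convx_2-like case: input and output dimensions match, so the identity shortcut suffices.
same_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out_2 = residual_add(out_1, lambda x: conv1x1(same_w, x))
```

Calling `residual_add([2.0, 4.0], main)` without a projection raises `ValueError`, which mirrors why the extra 1×1 layer is needed only where the dimensions differ.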

In addition, each convolutional layer in the Convx_1 and Convx_2 topologies is followed by batch normalization (BN), whose purpose is to speed up training convergence of the constructed network model. After the BN operation, this embodiment also applies the ReLU activation function to strengthen the expressive power of each convolutional layer's output.
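As a rough illustration of the BN-then-ReLU ordering, the sketch below normalizes one batch of activations and then applies ReLU. It covers only the training-time normalization over a single batch; the scale `gamma`, shift `beta` and `eps` values are assumed defaults, and in a real network `gamma` and `beta` would be learned.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """Keep positive values and clamp negative values to zero."""
    return [max(0.0, v) for v in values]


activations = [1.0, 2.0, 3.0, 4.0]   # hypothetical outputs of one convolution channel
normalized = batch_norm(activations)
output = relu(normalized)
```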

Note that, beyond the convolutional layers shown in Fig. 1b and Fig. 1c, this embodiment preferably places a BN operation after every convolutional layer of the constructed network model. For example, BN is also applied after the first and second convolutional layers, and is completed before the data enters the pooling layer. Here too, the purpose of BN is to speed up convergence of the constructed network model.

S102. Iteratively train the initial network model according to the acquired training and learning information to obtain a target network model having a standard weight data set.

In this embodiment, the preceding step constructs, from the preset topology information and configuration parameter information, an initial network model capable of image feature extraction. The initial network model specifies only the type of computation each layer can perform once the layers are connected; it does not provide the weight data those computations require. The weight data may be the values in the convolution kernels used by the convolutional layers, the values in the pooling windows used by the pooling layers, or the connection weights of the neurons in the fully connected layers. The initial network model therefore cannot directly output correct image features.

The purpose of this step is to provide initial weight data for each layer of the initial network model and to update that weight data through iterative update steps, finally obtaining a target network model in which every layer has optimal weight data, so that image features can be extracted accurately with the target network model. The network model is trained on the basis of preset training and learning information.

Specifically, the training and learning information includes at least one of the following: an input image sample set, an activation function, bias data, the initial weight data and convolution function of the convolution kernels in each convolutional layer, the initial weight data and pooling function of the pooling window in each pooling layer, the initial weight data of the neurons in each fully connected layer, and an output classification function. The standard weight data set includes at least one of the following: the standard weight data, after iterative training, of the convolution kernels in each convolutional layer, of the pooling window in each pooling layer, and of the neurons in each fully connected layer.

In this embodiment, the input image sample set is Places365-Standard, a large data set in the field of scene recognition. It contains more than 1.8 million scene pictures, each labeled with a scene identifier, covering 365 scene categories in total; that is, the scene identifier of each picture belongs to one of the 365 categories. The activation function introduces nonlinearity and thereby increases the expressive power of the network model; this embodiment preferably uses the ReLU activation function. The bias data is preferably set to 0.

The training and learning information also includes the initial weight data and computation functions required by each layer. This embodiment preferably uses the Xavier initialization algorithm to set the initial weight data of each layer. The convolution function of the first and second convolutional layers is set to the C.ReLU function, and the pooling function of the first and second pooling layers is set to max pooling, which reduces the feature dimension of the output of the preceding convolutional layer.
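The two operations named above can be sketched as follows. The patent does not say which Xavier variant is used, so the uniform variant (sampling from ±sqrt(6 / (fan_in + fan_out))) is an assumption here, as are the 2×2 window with stride 2, the function names and the toy sizes.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = xavier_uniform(fan_in=512, fan_out=512, rng=rng)
pooled = max_pool_2x2([[1, 2, 5, 6],
                       [3, 4, 7, 8],
                       [9, 1, 2, 3],
                       [4, 5, 6, 0]])
```

Here `pooled` halves each spatial dimension, which is the dimension reduction the pooling layers are meant to provide.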

The C.ReLU function works as follows: a convolution with the set kernels yields the actual feature values; each actual feature value is negated to obtain an inverse feature value; the actual and inverse feature values are concatenated; and the ReLU activation function is then applied as the nonlinearity. The dimension of the resulting feature data is therefore twice the number of convolution kernels set. Fig. 1d is a schematic diagram of the C.ReLU function provided by Embodiment 1 of the present invention. It depicts the processing flow: convolution first, then negation of the convolution result, then concatenation of the negated result with the convolution result, and finally the ReLU activation function, which outputs feature data with a doubled dimension.

For example, this embodiment sets both the first and second convolutional layers to use the C.ReLU function. As shown in Table 1, the first convolutional layer has 32 convolution kernels, and its output feature data has dimensions 128*128*64. Here 128*128 is the size of the input from the preceding layer; an ordinary convolution would produce an output of 128*128*32, whereas convolution with the C.ReLU function produces output with twice as many channels. This convolution method reduces the amount of convolution computation in the constructed network model and saves computation time while preserving the accuracy of the results.
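The channel doubling described above can be seen in a minimal sketch of the C.ReLU step applied to the convolution responses at one pixel. The function name `crelu` and the sample values are assumptions for illustration; the convolution itself is taken as already computed.

```python
def crelu(features):
    """Concatenate features with their negation, then apply ReLU elementwise."""
    concatenated = list(features) + [-f for f in features]
    return [max(0.0, f) for f in concatenated]


conv_out = [0.5, -1.5, 2.0]   # hypothetical responses of 3 kernels at one pixel
activated = crelu(conv_out)   # 6 values: twice the number of kernels

# With 32 kernels, as in the first convolutional layer, each pixel yields 64 channels.
channels_conv1 = len(crelu([0.0] * 32))
```

Because the negated copy preserves the information that a plain ReLU would discard for negative responses, half as many kernels need to be convolved for the same number of output channels.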

In addition, this embodiment defines the output layer as a classification function that classifies the image scene from the computed image features. The resulting scene classification is used to adjust the weight data of each layer of the initial network model (either the initial weight data or weight data that has already been adjusted); a new scene classification is then obtained with the adjusted weight data, and the cycle repeats until the end condition is reached.

The output classification in this embodiment uses a Softmax classifier, whose output is the probability p(y = j | x) that each sample image x belongs to each scene category j. Specifically, for the i-th sample image x⁽ⁱ⁾, the corresponding classification function h(x⁽ⁱ⁾) can be expressed as:

where θ denotes the weight data parameter matrix formed from the weight data of each layer of the constructed initial network model, k is the number of categories, and y is the category label vector.
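The expression for h(x⁽ⁱ⁾) is not present in this text. Assuming the classifier is a standard softmax over the k categories, a form consistent with the symbols defined above would be the following, where θ_j (the parameters associated with category j) is notation assumed here and x⁽ⁱ⁾ stands for the features that reach the output layer:

```latex
h_\theta\big(x^{(i)}\big) =
\begin{bmatrix}
p\big(y^{(i)} = 1 \mid x^{(i)}; \theta\big) \\
\vdots \\
p\big(y^{(i)} = k \mid x^{(i)}; \theta\big)
\end{bmatrix}
= \frac{1}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}
\begin{bmatrix}
e^{\theta_1^{T} x^{(i)}} \\
\vdots \\
e^{\theta_k^{T} x^{(i)}}
\end{bmatrix}
\qquad (1)
```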

For example, before the sample image x⁽ⁱ⁾ is input to the initial network model, this embodiment preferably preprocesses it by grayscale conversion, mean subtraction and padding, and reduces it to 128*128. Once the processed sample image has been input to the initial network model, a k-dimensional probability vector p is obtained, and formula (2) then predicts the scene category of x⁽ⁱ⁾.
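A minimal sketch of how a k-dimensional probability vector p and a predicted category might be obtained is given below. It assumes that the prediction simply takes the category with the largest probability, which is the usual reading of a softmax classifier but is not spelled out in this text; the function names, the raw scores and the 0-based category index are likewise assumptions.

```python
import math


def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(probabilities):
    """Return the 0-based index of the most probable category."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


p = softmax([1.0, 3.0, 0.5])   # hypothetical scores for k = 3 categories
predicted = predict_category(p)
```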

In addition, to iteratively train and update the initial network model, this embodiment forms the weight data parameter matrix θ from the current weight data of each layer of the initial network model and defines a loss function L(θ) for it. L(θ) gives the loss value that results when image features are computed with the current weight data parameter matrix θ. The loss function L(θ) is expressed as:

where m is the number of sample images in the selected sample image set; θ denotes the weight data parameter matrix; x⁽ⁱ⁾ denotes the i-th sample image in the sample image set; y⁽ⁱ⁾ denotes the actual scene category of the i-th sample image; and k denotes the number of scene categories.
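The expression for L(θ) is likewise not present in this text. Assuming it is the standard softmax cross-entropy averaged over the m samples, with 1{·} the indicator function and θ_j the parameters associated with category j (both notations assumed here), it would read as follows. Whether the original also includes a regularization term cannot be determined from the surrounding definitions.

```latex
L(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k}
1\{y^{(i)} = j\}\,
\log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}
\qquad (3)
```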

In this embodiment, once the training and learning information has been determined, the initial network model is trained into the target network model through the following steps:

1) Randomly select a set number of sample images from the more than 1.8 million scene pictures as the sample image set;

2) Input the selected sample image set into the initial network model and, using formula (1) and formula (2), output through the output layer the actual scene category currently corresponding to each sample image;

3) Using formula (3) and the actual scene category of each sample image at the current (t-1)-th iteration, determine the loss value corresponding to the current weight data parameter matrix $\theta_{t-1}$;

4) Update the current weight data parameter matrix $\theta_{t-1}$ by stochastic gradient descent with momentum.

Specifically, the weight data parameter matrix $\theta_{t-1}$ is updated through

$$V_t = \lambda \cdot V_{t-1} - \eta \cdot \nabla L(\theta_{t-1}) \qquad (4)$$

and

$$\theta_t = \theta_{t-1} + V_t \qquad (5)$$

where t denotes the t-th iteration of the weight data parameter matrix; $\lambda$ is the momentum coefficient, preferably 0.9; $V_t$ is the update value of the weight data parameter matrix at the t-th iteration; $\theta_t$ is the weight data parameter matrix at the t-th iteration; $\eta$ is the learning rate, preferably 0.01 initially; and $\nabla L(\theta_{t-1})$ is the derivative of $L(\theta_{t-1})$ with respect to $\theta_{t-1}$.
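The update rule in formulas (4) and (5) can be sketched on a toy problem as follows. The quadratic loss and its gradient are stand-ins chosen for illustration, not the loss of formula (3), and the function names are assumptions; the momentum coefficient 0.9 and the learning rate 0.01 are the preferred values given above.

```python
def momentum_step(theta, velocity, grad, lr=0.01, momentum=0.9):
    """One update: V_t = momentum * V_{t-1} - lr * grad; theta_t = theta_{t-1} + V_t."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


def toy_gradient(theta):
    """Gradient of the toy loss L(theta) = sum(theta_i ** 2)."""
    return [2.0 * t for t in theta]


theta, velocity = [1.0, -2.0], [0.0, 0.0]
first_theta, first_velocity = momentum_step(theta, velocity, toy_gradient(theta))

# Repeating the step drives the toy parameters toward the minimum at zero.
for _ in range(500):
    theta, velocity = momentum_step(theta, velocity, toy_gradient(theta))
```

In actual training the gradient would come from backpropagation over a randomly selected sample image set at each iteration, and the learning rate would typically be lowered from its initial value as training proceeds.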

With these steps, the weight data parameter matrix of the initial network model is updated iteratively until convergence, each update yielding the matrix needed for the next iteration. When the iteration converges, the final standard weight data parameter matrix is obtained, and the trained target network model is formed from it.

Note that all the values in the standard weight data parameter matrix constitute the standard weight data set, which comprises the standard weight data, after iterative training, of the convolution kernels in each convolutional layer, the pooling window in each pooling layer, and the neurons in each fully connected layer. In other words, with this standard weight data the initial network model becomes a target network model that can extract image features accurately.

When the target network model trained in this embodiment performs scene recognition on the scene images of the Places365-Standard data set, the accuracy with which the single output scene category matches the scene identifier of the image (top-1) reaches 50.16%, and the accuracy with which the five candidate scene categories include the scene identifier of the image (top-5) reaches 80.03%.
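The two accuracy figures above correspond to what is commonly called top-1 and top-5 accuracy. The sketch below shows how such a metric could be computed; the function name, the three-category probabilities and the labels are made-up toy data, so k = 2 stands in for the five candidates used in the evaluation.

```python
def top_k_accuracy(probabilities, labels, k):
    """Fraction of samples whose true label is among the k most probable categories."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        ranked = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.4, 0.35, 0.25],
]
labels = [0, 2, 1]   # hypothetical true categories
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```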

Fig. 1e is a schematic diagram of the topology of the target network model trained in Embodiment 1 of the present invention. As shown in Fig. 1e, the overall topology is: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, third convolutional layer, fourth convolutional layer, fifth convolutional layer, first fully connected layer, second fully connected layer, output layer. The input layer receives a 128*128 single-channel image. The first and second convolutional layers each use the C.ReLU function for their convolution calculations, and ReLU is the activation function throughout. The fifth convolutional layer outputs image features of dimension 4*4*512. The first fully connected layer then fuses and classifies the 4*4*512 image features and outputs a 512-dimensional image feature vector; the second fully connected layer performs a further fully connected calculation on this vector and likewise outputs a 512-dimensional image feature vector. Finally, the Softmax classifier of the output layer outputs, through its 365 neurons, the probability of the input image for each of the 365 scene categories.

In this embodiment, the first and second convolutional layers of the trained target network model extract features such as edge gradients and color blocks of the input image, while the third to fifth convolutional layers extract local semantic features and, progressively, global semantic features of the input image. Figs. 1f to 1m are visualizations of the outputs of the layers of the target network model constructed in Embodiment 1 of the present invention.

Specifically, Fig. 1f shows the preprocessed input image, in which the image features are clearly visible. Fig. 1g shows the output of the first convolutional layer, in which the image contours are clearly displayed. Fig. 1h shows the output of the first pooling layer, in which the contours can still be faintly made out. In Figs. 1i to 1m, as the convolutional layers of the target network model get deeper, the receptive field of the neurons grows and the extracted image features become increasingly abstract, to the point where the human eye can hardly distinguish them. For image processing, however, the more abstract the extracted features, the stronger their representational power, and features with strong representational power allow subsequent image processing to be performed more accurately.

The target network model trained in this embodiment has the following characteristics: 1) to meet the real-time requirements of image processing, the first and second convolutional layers use the C.ReLU function for convolution and the model makes extensive use of convolutional layers with 1*1 kernels, which greatly reduces the amount of computation and increases speed; 2) to address the difficulty of training and converging convolutional neural networks, the model applies BN after every convolutional layer and uses shortcut connections in Convx (x = 3, 4, 5), which speeds up convergence during training.

Although training the initial network model takes a long time, extracting image features with the trained target network model is very fast: in tests on a GPU, feature extraction with the target network model of this embodiment takes about 0.0098 s. Moreover, the target network model of this embodiment has only one seventh as many weights as the existing PlaceCNN convolutional network model. The network topology of the target network model can therefore be regarded as superior to that of PlaceCNN and better suited to image feature extraction in image processing.

With the network model construction method provided by this embodiment of the present invention, the constructed target network model outputs low-dimensional image feature vectors quickly and compactly, and the features it extracts are not affected by the lighting environment. Image processing based on the extracted features also yields accurate results, which ensures the quality of the image processing.

Embodiment 2

Fig. 2 is a schematic flowchart of a closed-loop detection method provided by Embodiment 2 of the present invention. The method is applicable to closed-loop detection in simultaneous localization and mapping (SLAM). It can be executed by a closed-loop detection apparatus, which can be implemented in software and/or hardware and is generally integrated in computer equipment capable of simultaneous localization and mapping.

As shown in Fig. 2, the closed-loop detection method provided by Embodiment 2 of the present invention includes the following operations:

S201. Input the currently captured real-scene image frame into a preset target network model to obtain the actual image features of the real-scene image frame.

This embodiment implements closed-loop detection during simultaneous localization and mapping. In this step, the captured real-scene image frame is first input to the target network model, which is determined by the network model construction method provided by the above embodiment of the present invention. The actual image features of the real-scene image frame are thereby obtained.

The output layer of the target network model can be configured differently depending on the problem at hand. For example, closed-loop detection in this embodiment requires not the scene classification result of the input image but its image feature vector. This embodiment therefore preferably outputs the feature vector computed by the first or second fully connected layer of the target network model, so that a 512-dimensional image feature vector is obtained for each input image. The actual image features of the real-scene image frame thus have a relatively low dimension, which facilitates real-time closed-loop detection.

Before the real-scene image frame is input to the target network model, it is first converted to grayscale, mean-subtracted and resized, which produces a 128*128 single-channel real-scene image frame.
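The preprocessing chain can be sketched in plain Python as below. The text does not specify the grayscale weights, the interpolation method or which mean is subtracted, so the luminance weights (0.299, 0.587, 0.114), nearest-neighbor resizing and the per-image mean are all assumptions, as are the function names and the synthetic 320×240 frame.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to one luminance value per pixel."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def subtract_mean(image):
    """Subtract the mean pixel value of this image from every pixel."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p - mean for p in row] for row in image]


def resize_nearest(image, size):
    """Resize a 2D list to size x size by nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess(rgb_image, size=128):
    """Grayscale, mean subtraction, then resizing, in the order listed above."""
    return resize_nearest(subtract_mean(to_grayscale(rgb_image)), size)


# Synthetic 320x240 RGB frame standing in for a captured real-scene image frame.
frame = [[(x % 256, y % 256, 128) for x in range(320)] for y in range(240)]
net_input = preprocess(frame)
```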

S202. Determine at least one image frame to be matched for the real-scene image frame according to a set image frame selection rule, and acquire the historical image features of each image frame to be matched.

Specifically, to determine the closed-loop detection result of the real-scene image frame, image frames to be matched must be selected for similarity matching against it. Note that the image data handled in closed-loop detection is usually continuous in time. The features of the real-scene image frame are generally strongly correlated with those of its neighboring frames, so the computed similarity values tend to be high, and neighboring frames are easily misdetected as the closed-loop region of the real-scene image frame.

This embodiment can set a selection rule for the image frames to be matched so that the real-scene image frame is not matched against neighboring frames. Specifically, the image frames to be matched are selected from the captured historical image frames according to the image frame selection rule, and the historical image features corresponding to each image frame to be matched are also acquired. These historical image features are likewise extracted by the target network model. This embodiment can store each captured real-scene image frame and its actual image features at a set location, thereby building a historical information base that contains the historical image frames and their corresponding historical image features.

In this embodiment, the image frame selection rule can specify the number of frames separating the selected image frames to be matched from the real-scene image frame.

S203. Determine the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image features and each of the historical image features.

This embodiment can determine the similarity value between the actual image features and each of the historical image features with a similarity formula for feature vectors (for example, the cosine of the angle between the two feature vectors), compare each similarity value with a set threshold, and determine the closed-loop detection result of the real-scene image frame from the comparison.
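The cosine-based comparison can be sketched as follows. The three-dimensional vectors stand in for the 512-dimensional feature vectors, and the threshold of 0.9, the frame numbers and the function name are hypothetical values chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


current = [1.0, 0.0, 1.0]                            # actual image features (toy size)
history = {3: [1.0, 0.1, 0.9], 7: [0.0, 1.0, 0.0]}   # frame number -> historical features
threshold = 0.9                                      # hypothetical similarity threshold

matches = [n for n, feat in history.items()
           if cosine_similarity(current, feat) > threshold]
```

With 512-dimensional vectors each comparison costs a few hundred multiply-adds, which is what keeps matching against many historical frames fast.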

The closed-loop detection method provided by Embodiment 2 of the present invention extracts image features from the captured image frames with the target network model constructed in the above embodiment. While preserving the accuracy of closed-loop detection, it effectively reduces the dimension of the image feature vectors required, which shortens the similarity computation time and better meets the real-time requirements of closed-loop detection.

Embodiment 3

FIG. 3 is a schematic flowchart of a closed-loop detection method provided by Embodiment 3 of the present invention. This embodiment is optimized on the basis of the above embodiment. In this embodiment, the operation of determining at least one image frame to be matched for the real-scene image frame according to the set image frame selection rule, and obtaining the historical image features of each image frame to be matched, is further specified as: obtaining the set number of interval frames and the frame number of the real-scene image frame, and determining the difference between the frame number and the number of interval frames as the target frame number; taking the historical image frames in the constructed historical information base whose frame numbers are less than or equal to the target frame number as the image frames to be matched; and obtaining the historical image features that the target network model determined for each image frame to be matched.

Further, in this embodiment, the operation of determining the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image feature and each historical image feature is specifically optimized as: calculating the similarity value between the actual image feature and each historical image feature; determining each image frame to be matched whose similarity value is greater than the set similarity threshold as a candidate closed-loop image frame, and adding the candidate closed-loop image frame to the set candidate closed-loop set; if the candidate closed-loop set contains only one candidate closed-loop image frame, determining that candidate closed-loop image frame as the closed-loop area of the real-scene image frame; and if the candidate closed-loop set contains at least two candidate closed-loop image frames, obtaining the closed-loop area of the real-scene image frame based on a set closed-loop determination strategy.

As shown in FIG. 3, the closed-loop detection method provided by Embodiment 3 of the present invention specifically includes the following operations:

S301. Input the currently captured real-scene image frame into the preset target network model to obtain the actual image features of the real-scene image frame.

It should be noted that a device performing simultaneous localization and mapping (SLAM) can capture images with its on-board camera. In general, the camera captures images continuously, and its capture rate is far higher than the speed at which the device moves during localization and mapping, so several consecutively captured image frames are in fact images of the same scene.

If closed-loop detection were performed on every image frame captured by the camera, the processing and computing load of the device would increase. This embodiment therefore considers adjusting the image capture frequency of the camera, and preferably adjusts the capture frequency to match the moving rate of the device.

In this step, the camera can be regarded as capturing the real-scene image frame at a capture frequency matched to the moving rate of the device, after which the actual image features of that real-scene image frame are obtained. Steps S302 to S304 below describe the selection of the image frames to be matched.

S302. Obtain the set number of interval frames and the frame number of the real-scene image frame, and determine the difference between the frame number and the number of interval frames as the target frame number.

Specifically, the number of interval frames can be understood as the minimum interval between a selected image frame to be matched and the real-scene image frame. The number of interval frames may be preset in the image frame selection rule and may be set according to the actual image environment; its preferred value range may be [300, 800].

The frame number of the real-scene image frame is assigned at capture time and serves as an ID that distinguishes the frame from other image frames. In general, closed-loop detection is performed on the real-scene image frame only when its frame number is greater than the number of interval frames; otherwise closed-loop detection for that frame is skipped and the next image frame is captured directly.

In this step, the difference between the frame number of the real-scene image frame and the number of interval frames is determined as the target frame number. The target frame number can be understood as the largest frame number that a selected image frame to be matched may have.

S303. In the constructed historical information base, take the historical image frames whose frame numbers are less than or equal to the target frame number as the image frames to be matched.

In this embodiment, the captured historical image frames and their corresponding historical image features may be stored in the set historical information base. The currently captured real-scene image frame and its actual image features may be added to the historical information base in real time, so that the historical information base is updated dynamically.

In this step, all historical image frames whose frame numbers are less than or equal to the target frame number may be used as the image frames to be matched, or key image frames may be selected from the qualifying historical image frames as the image frames to be matched. Specifically, key image frames may be selected at a constant frame-number spacing from the historical image frames whose frame numbers are less than or equal to the target frame number, and that spacing may equal one percent of the number of interval frames.
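
Steps S302 and S303, including the optional equal-spacing key-frame selection, can be sketched as below. This is a hedged illustration only: the function names, the use of a plain collection of frame numbers in place of the historical information base, and the choice to start the equal-spacing sequence at the smallest eligible frame number are all assumptions made for the example.

```python
def target_frame_number(current_frame_no, interval):
    """Largest frame number eligible for matching, or None when the
    current frame is too early for closed-loop detection."""
    if current_frame_no <= interval:
        return None  # detection is skipped for this frame
    return current_frame_no - interval


def select_frames_to_match(history_frame_nos, current_frame_no, interval, step=None):
    """Return the frame numbers to be matched against the current frame.

    step=None keeps every eligible frame; otherwise only frames whose
    numbers are spaced `step` apart (counted from the smallest eligible
    frame number) are kept as key frames.
    """
    target = target_frame_number(current_frame_no, interval)
    if target is None:
        return []
    eligible = sorted(n for n in history_frame_nos if n <= target)
    if step is None or not eligible:
        return eligible
    first = eligible[0]
    return [n for n in eligible if (n - first) % step == 0]
```

With an interval of 300 frames, one percent of the interval gives a spacing of 3, so a call such as `select_frames_to_match(range(1, 1000), 1000, 300, step=3)` keeps roughly one third of the eligible frames.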

S304. Obtain the historical image features that the target network model determined for each image frame to be matched.

S305. Calculate the similarity value between the actual image feature and each historical image feature.

For example, the similarity values may be calculated with a feature-vector similarity formula.

S306. Determine each image frame to be matched whose similarity value is greater than the set similarity threshold as a candidate closed-loop image frame, and add the candidate closed-loop image frame to the set candidate closed-loop set.

In this step, each calculated similarity value may be compared with the set similarity threshold, which may preferably be 0.9. When a similarity value is greater than the set similarity threshold, the image frame to be matched that corresponds to that similarity value may be taken as a candidate closed-loop image frame and added to the candidate closed-loop set.

In this embodiment, this step may be applied to every image frame to be matched whose similarity value satisfies the similarity condition. The number of image frames in the candidate closed-loop set may then be counted, and the count determines whether S307 or S308 is executed.
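
Step S306 and the subsequent choice between S307 and S308 can be sketched as follows. This is an illustrative sketch under stated assumptions: similarity values are assumed to be held in a dictionary keyed by frame number, and the function names and the returned step labels are hypothetical.

```python
def build_candidate_set(similarities, threshold=0.9):
    """Frame numbers whose similarity is strictly greater than the threshold.

    `similarities` maps the frame number of each image frame to be
    matched to its similarity value with the current frame.
    """
    return sorted(n for n, s in similarities.items() if s > threshold)


def next_step(candidates):
    """Decide which step follows S306 from the size of the candidate set."""
    if len(candidates) == 1:
        return "S307"  # the single candidate is the closed-loop area
    if len(candidates) >= 2:
        return "S308"  # apply the closed-loop determination strategy
    return None  # assumption: no candidate means no closed loop is reported
```

Note that the comparison is strict, following the text's wording "greater than the set similarity threshold", so a similarity of exactly 0.9 does not qualify.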

S307. If the candidate closed-loop set contains only one candidate closed-loop image frame, determine that candidate closed-loop image frame as the closed-loop area of the real-scene image frame.

Specifically, when the candidate closed-loop set contains only one candidate closed-loop image frame, this step may directly determine that frame as the closed-loop area of the real-scene image frame; that is, the scene in the real-scene image frame and the scene in the candidate closed-loop image frame are considered to be the same area.

S308. If the candidate closed-loop set contains at least two candidate closed-loop image frames, obtain the closed-loop area of the real-scene image frame based on the set closed-loop determination strategy.

Specifically, when the candidate closed-loop set contains multiple candidate closed-loop image frames, the candidates cannot be directly determined as the closed-loop area of the real-scene image frame; the closed-loop determination strategy must first be used to decide whether the conditions for a closed-loop area are satisfied.

Further, obtaining the closed-loop area of the real-scene image frame based on the set closed-loop determination strategy includes: when the frame numbers of the candidate closed-loop image frames in the candidate closed-loop set are all discrete, determining that the real-scene image frame has no closed-loop area; and when the candidate closed-loop set contains candidate closed-loop image frames with consecutive frame numbers, determining the start frame number and the end frame number of the consecutive run, forming a historical image area from the candidate closed-loop image frames between the start frame number and the end frame number, and determining the historical image area as the closed-loop area of the real-scene image frame.

Specifically, the frame number of each candidate closed-loop image frame is obtained first, and it is determined whether the frame numbers are discrete or consecutive. It should be noted that if the image frames to be matched were selected from the historical image frames at a constant spacing, this step needs to check whether the frame-number difference between adjacent candidate closed-loop image frames equals that spacing; if it does, the frame numbers of the adjacent candidate closed-loop image frames can also be regarded as consecutive.

In this step, when the frame numbers are discrete, it may be determined that the real-scene image frame has no closed-loop area. When frame numbers are consecutive, a historical image area may be composed from the candidate closed-loop image frames corresponding to all the consecutive frame numbers, and that historical image area is determined as the closed-loop area of the real-scene image frame.

It can be understood that the candidate closed-loop set may contain several runs of consecutive frame numbers. In this embodiment, the historical image areas corresponding to all such runs may be regarded as closed-loop areas of the real-scene image frame, because these areas may be the same region that the device passed through at different times.
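
The closed-loop determination strategy of S308 amounts to grouping candidate frame numbers into consecutive runs. The sketch below is one possible reading, not the authoritative procedure: the function name `closed_loop_regions`, the `(start, end)` tuple representation of a historical image area, and the `step` parameter (the spacing used when key frames were selected at a constant spacing) are assumptions for illustration.

```python
def closed_loop_regions(candidate_frame_nos, step=1):
    """Return (start, end) frame numbers for every run of candidates whose
    neighbouring frame numbers differ by exactly `step`.

    An empty result means all candidates are discrete, i.e. the current
    frame has no closed-loop area.
    """
    frames = sorted(set(candidate_frame_nos))
    regions = []
    run_start = None
    prev = None
    for n in frames:
        if prev is not None and n - prev == step:
            if run_start is None:
                run_start = prev  # a new consecutive run begins at prev
        else:
            if run_start is not None:
                regions.append((run_start, prev))  # close the finished run
                run_start = None
        prev = n
    if run_start is not None:
        regions.append((run_start, prev))
    return regions
```

For example, candidates 5, 6, 7, 20, 31 and 32 yield two regions, (5, 7) and (31, 32), while the isolated frame 20 is discarded.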

The closed-loop detection method provided by Embodiment 3 of the present invention describes in detail how the image frames to be matched are selected for the real-scene image frame, and how the closed-loop area of the real-scene image frame is determined from the candidate closed-loop image frames. With this method, the target network model is used to obtain low-dimensional image features for both the real-scene image frame and the image frames to be matched. While preserving the accuracy of the similarity calculation, this reduces the time spent on similarity calculation in closed-loop detection and thus better meets its real-time requirements.

Embodiment 4

FIG. 4a is a structural block diagram of a network model construction device provided by Embodiment 4 of the present invention. The device is suitable for constructing and training a new network model, can be implemented in software and/or hardware, and is generally integrated in computer equipment. As shown in FIG. 4a, the device includes an initial construction module 41 and a target determination module 42.

The initial construction module 41 is configured to construct an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of the following: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of the following: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer.

The target determination module 42 is configured to iteratively train the initial network model according to acquired training and learning information to obtain a target network model with a standard weight data set.

In this embodiment, the device first uses the initial construction module 41 to construct the initial network model based on the acquired topology information and configuration parameter information, and then uses the target determination module 42 to iteratively train the initial network model according to the acquired training and learning information, obtaining the target network model with a standard weight data set.

The network model construction device provided by Embodiment 4 of the present invention can form a specific initial network model from deliberately chosen topology information and configuration parameter information, and can obtain the target network model through training while ensuring that the output image features are low-dimensional. When the target network model constructed by the device is used for image feature extraction, the extraction result is not affected by the lighting environment. In addition, image processing that uses the extracted image features can produce accurate results, which ensures the effectiveness of the image processing.

Further, the first convolutional layer has a convolution stride of 1, a kernel size of 5*5 and 32 kernels; the second convolutional layer has a convolution stride of 1, a kernel size of 3*3 and 64 kernels; the first and second pooling layers each have a pooling stride of 2 and a pooling window size of 3*3; and the first and second fully connected layers each have 512 neurons.

On the basis of the above optimization, sub-convolutional layers 3_1_1 and 3_2_1 may each preferably have a convolution stride of 1, a kernel size of 1*1 and 96 kernels; sub-convolutional layers 3_1_2 and 3_2_2 may each preferably have a convolution stride of 1, a kernel size of 3*3 and 96 kernels; and sub-convolutional layers 3_1_3 and 3_2_3 may preferably have convolution strides of 2 and 1 respectively, each with a kernel size of 1*1 and 192 kernels.

Further, sub-convolutional layers 4_1_1 and 4_2_1 may each preferably have a convolution stride of 1, a kernel size of 1*1 and 128 kernels; sub-convolutional layers 4_1_2 and 4_2_2 may each preferably have a convolution stride of 1, a kernel size of 3*3 and 128 kernels; and sub-convolutional layers 4_1_3 and 4_2_3 may preferably have convolution strides of 2 and 1 respectively, each with a kernel size of 1*1 and 384 kernels.

Further, sub-convolutional layers 5_1_1 and 5_2_1 may each preferably have a convolution stride of 1, a kernel size of 1*1 and 256 kernels; sub-convolutional layers 5_1_2 and 5_2_2 may each preferably have a convolution stride of 1, a kernel size of 3*3 and 256 kernels; and sub-convolutional layers 5_1_3 and 5_2_3 may preferably have convolution strides of 2 and 1 respectively, each with a kernel size of 1*1 and 512 kernels.
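
The preferred convolution parameters listed above can be collected into a small configuration table, which makes the regular pattern across stages 3 to 5 easier to see. This is only a bookkeeping sketch: the `ConvConfig` type and the `convX_Y_Z` key names are hypothetical, and the table records parameters only, not the topological connection order between layers, which this passage does not restate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvConfig:
    stride: int   # convolution stride
    kernel: int   # side length of the square kernel (5 means 5*5)
    filters: int  # number of convolution kernels


# First and second convolutional layers.
CONV_LAYERS = {
    "conv1": ConvConfig(stride=1, kernel=5, filters=32),
    "conv2": ConvConfig(stride=1, kernel=3, filters=64),
}


def stage_layers(stage, mid_filters, out_filters):
    """Six sub-convolutional layers of one stage (two groups of three).

    Both groups share kernel sizes and kernel counts; only the stride of
    the third sub-layer differs (2 in group 1, 1 in group 2).
    """
    layers = {}
    for group, last_stride in ((1, 2), (2, 1)):
        layers[f"conv{stage}_{group}_1"] = ConvConfig(1, 1, mid_filters)
        layers[f"conv{stage}_{group}_2"] = ConvConfig(1, 3, mid_filters)
        layers[f"conv{stage}_{group}_3"] = ConvConfig(last_stride, 1, out_filters)
    return layers


for stage, mid, out in ((3, 96, 192), (4, 128, 384), (5, 256, 512)):
    CONV_LAYERS.update(stage_layers(stage, mid, out))

# Pooling and fully connected parameters from the same passage.
POOL_STRIDE, POOL_WINDOW = 2, 3
FC_NEURONS = 512
```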

On the basis of the above optimization, the training and learning information includes at least one of the following: an input image sample set, an activation function, bias data, the initial weight data and convolution function of the convolution kernels in each convolutional layer, the initial weight data and pooling function of the pooling windows in each pooling layer, the initial weight data of the neurons in each fully connected layer, and an output classification function. The standard weight data set includes at least one of the following: the standard weight data, after iterative training, of the convolution kernels in each convolutional layer, the pooling windows in each pooling layer, and the neurons in each fully connected layer.

An embodiment of the present invention also provides computer equipment. FIG. 4b is a schematic diagram of the hardware structure of the computer equipment provided by Embodiment 4 of the present invention. As shown in FIG. 4b, the computer equipment includes a processor 401 and a storage device 402. The computer equipment may have one or more processors; FIG. 4b takes one processor 401 as an example. The processor and the storage device may be connected by a bus or in other ways; FIG. 4b takes a bus connection as an example.

The storage device 402 in the computer equipment, as a computer-readable storage medium, can store one or more programs. The programs may be software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the network model construction device provided by the embodiment of the present invention (for example, the modules shown in FIG. 4a: the initial construction module 41 and the target determination module 42). By running the software programs, instructions and modules stored in the storage device 402, the processor 401 executes the various functional applications and data processing of the computer equipment, that is, implements the network model construction method of the above method embodiment.

The storage device 402 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function; the data storage area may store data created through use of the equipment. In addition, the storage device 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 402 may further include memory located remotely from the processor 401, which may be connected to the equipment through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Moreover, when the one or more programs included in the above computer equipment are executed by the one or more processors 401, one of the programs may perform the following operations:

constructing an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of the following: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of the following: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer; and iteratively training the initial network model according to acquired training and learning information to obtain a target network model with a standard weight data set.

In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the network model construction method provided by Embodiment 1 of the present invention, which includes: constructing an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers, and the configuration parameter information includes the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer; and iteratively training the initial network model according to acquired training and learning information to obtain a target network model with a standard weight data set.

Embodiment 5

FIG. 5a is a structural block diagram of a closed-loop detection device provided by Embodiment 5 of the present invention. The device is suitable for performing closed-loop detection during simultaneous localization and mapping (SLAM), can be implemented in software and/or hardware, and is generally integrated in computer equipment capable of simultaneous localization and mapping. As shown in FIG. 5a, the device includes a feature extraction module 51, an image selection module 52 and a detection determination module 53.

The feature extraction module 51 is configured to input the currently captured real-scene image frame into the preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the network model construction device provided by Embodiment 4 of the present invention;

the image selection module 52 is configured to determine at least one image frame to be matched and its corresponding historical image features according to the set image frame selection rule;

the detection determination module 53 is configured to determine the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image feature and each historical image feature.

In this embodiment, the device first uses the feature extraction module 51 to input the currently captured real-scene image frame into the preset target network model and obtain the actual image features of the real-scene image frame; it then uses the image selection module 52 to determine at least one image frame to be matched and its corresponding historical image features according to the set image frame selection rule; finally, it uses the detection determination module 53 to determine the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image feature and each historical image feature.

The closed-loop detection device provided by Embodiment 5 of the present invention can extract image features from captured image frames using the target network model constructed in the above embodiment. While preserving the accuracy of closed-loop detection, it effectively reduces the dimension of the image feature vectors that closed-loop detection requires, which shortens the time spent on similarity calculation and thus better meets the real-time requirements of closed-loop detection.

Further, the image selection module 52 is specifically configured to:

obtain the set number of interval frames and the frame number of the real-scene image frame, and determine the difference between the frame number and the number of interval frames as the target frame number; take the historical image frames in the constructed historical information base whose frame numbers are less than or equal to the target frame number as the image frames to be matched; and obtain the historical image features that the target network model determined for each image frame to be matched.

Further, the detection determination module 53 includes:

a similarity calculation unit, configured to calculate the similarity value between the actual image feature and each historical image feature;

a candidate determination unit, configured to determine each image frame to be matched whose similarity value is greater than the set similarity threshold as a candidate closed-loop image frame, and to add the candidate closed-loop image frame to the set candidate closed-loop set;

a first determining unit, configured to determine the candidate closed-loop image frame as the closed-loop region of the real-scene image frame when the candidate closed-loop set contains only one candidate closed-loop image frame;

a second determining unit, configured to obtain the closed-loop region of the real-scene image frame based on a set closed-loop determination strategy when the candidate closed-loop set contains at least two candidate closed-loop image frames.
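As a rough illustration of the similarity calculation unit and the candidate determination unit, the sketch below builds a candidate closed-loop set by thresholding similarity values. This passage does not name the similarity measure, so cosine similarity is used here purely as an assumption; the function names and the threshold value are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_loop_set(
    actual: Sequence[float],
    historical: Dict[int, Sequence[float]],
    threshold: float,
) -> List[int]:
    """Frame numbers whose similarity to `actual` is greater than `threshold`."""
    return sorted(
        frame_no
        for frame_no, feature in historical.items()
        if cosine_similarity(actual, feature) > threshold
    )


# Example: frames 3 and 5 point in nearly the same direction as the query.
actual = [1.0, 0.0]
historical = {3: [1.0, 0.0], 4: [0.0, 1.0], 5: [2.0, 0.1]}
print(candidate_loop_set(actual, historical, threshold=0.9))  # [3, 5]
```

If the resulting set holds exactly one frame, that frame is taken as the closed-loop region; a set with two or more frames is passed to the closed-loop determination strategy.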

On the basis of the above optimization, the second determining unit is specifically configured to:

when the frame numbers of the candidate closed-loop image frames in the candidate closed-loop set are all discrete, determine that the real-scene image frame has no closed-loop region; when the candidate closed-loop set contains candidate closed-loop image frames with consecutive frame numbers, determine the start frame number and the end frame number of the consecutive frame numbers, form a historical image region from the candidate closed-loop image frames between the start frame number and the end frame number, and determine that historical image region as the closed-loop region of the real-scene image frame.
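The strategy for a candidate set with at least two frames can be sketched as follows. The text does not say what to do when several separate runs of consecutive frame numbers exist, so this sketch assumes the longest run is chosen (the earliest one on ties); that tie-breaking rule and the function name `loop_region` are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def loop_region(candidates: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame numbers of the closed-loop region, or None.

    None means all candidate frame numbers are discrete (no two are
    consecutive), so no closed-loop region is reported.
    """
    frames = sorted(set(candidates))
    best: Optional[Tuple[int, int]] = None
    start = 0  # index where the current run of consecutive numbers begins
    for i in range(1, len(frames) + 1):
        # A run ends at the end of the list or where the numbering jumps.
        if i == len(frames) or frames[i] != frames[i - 1] + 1:
            length = i - start
            if length >= 2 and (best is None or length > best[1] - best[0] + 1):
                best = (frames[start], frames[i - 1])
            start = i
    return best


print(loop_region([3, 7, 12]))       # None: all frame numbers are discrete
print(loop_region([3, 4, 5, 9]))     # (3, 5)
print(loop_region([1, 2, 5, 6, 7]))  # (5, 7): longest run under the assumed rule
```

Requiring consecutive frame numbers filters out isolated matches, which are more likely to be accidental look-alikes than a genuinely revisited place.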

Embodiment 5 of the present invention also provides a computer device. FIG. 5b is a schematic diagram of the hardware structure of this computer device. As shown in FIG. 5b, the computer device provided by Embodiment 5 of the present invention includes a camera 501 for capturing image frames, and further includes a processor 502 and a storage device 503. The computer device may have one or more processors; FIG. 5b takes one processor 502 as an example. The camera in the computer device may be connected to the processor and to the storage device through a bus or by other means, and the processor and the storage device are likewise connected through a bus or by other means; FIG. 5b takes connection through a bus as an example. It can be understood that the processor 502 in the computer device can control the operation of the camera 501.

The storage device 503 in the computer device, as a computer-readable storage medium, can be used to store one or more programs. The programs may be software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the closed-loop detection device provided by the embodiment of the present invention (for example, the modules shown in FIG. 5a: the feature extraction module 51, the image selection module 52 and the detection determination module 53). By running the software programs, instructions and modules stored in the storage device 503, the processor 502 executes the various functional applications and data processing of the computer device, that is, it implements the closed-loop detection method of the above method embodiments.

The storage device 503 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the device, and the like. In addition, the storage device 503 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 503 may further include memory located remotely from the processor 502, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Moreover, when the one or more programs included in the above computer device are executed by the one or more processors 502, one of the programs may perform the following operations:

inputting the currently captured real-scene image frame into the preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the network model construction method provided in Embodiment 1 of the present invention; determining, according to the set image frame selection rule, at least one image frame to be matched and the corresponding historical image features; and determining the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image features and each historical image feature.

In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the closed-loop detection method provided in Embodiment 3 of the present invention, which includes: inputting the currently captured real-scene image frame into the preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the network model construction method provided in Embodiment 1 of the present invention; determining, according to the set image frame selection rule, at least one image frame to be matched and the corresponding historical image features; and determining the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image features and each historical image feature.

From the above description of the implementations, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for constructing a network model, characterized by comprising:

constructing an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer;

iteratively training the initial network model according to acquired training learning information, to obtain a target network model having a standard weight data set.
2. The method according to claim 1, characterized in that the convolutional layers comprise 5 layers, namely the 1st convolutional layer, the 2nd convolutional layer, the 3rd convolutional layer, the 4th convolutional layer and the 5th convolutional layer;

the pooling layers comprise 2 layers, namely the 1st pooling layer and the 2nd pooling layer;

the fully connected layers comprise 2 layers, namely the 1st fully connected layer and the 2nd fully connected layer;

the topological connection order is expressed as: input layer -- 1st convolutional layer -- 1st pooling layer -- 2nd convolutional layer -- 2nd pooling layer -- 3rd convolutional layer -- 4th convolutional layer -- 5th convolutional layer -- 1st fully connected layer -- 2nd fully connected layer -- output layer.
3. The method according to claim 2, characterized in that the i-th convolutional layer comprises an i_1 convolutional layer and an i_2 convolutional layer, wherein i takes the values 3, 4 and 5, and the convolution computations of the i_1 convolutional layer and the i_2 convolutional layer use shortcut connections;

the i_j convolutional layer further comprises: an i_j_1 convolutional layer, an i_j_2 convolutional layer and an i_j_3 convolutional layer, wherein j takes the values 1 and 2.
4. The method according to any one of claims 1-3, characterized in that the training learning information includes at least one of: an input image sample set, an activation function, bias data, the initial weight data of the convolution kernels in each convolutional layer and a convolution function, the initial weight data of the pooling windows in each pooling layer and a pooling function, the initial weight data of the neurons in each fully connected layer, and an output classification function;

the standard weight data set includes at least one of: the standard weight data that, after iterative training, correspond to the convolution kernels in each convolutional layer, the pooling windows in each pooling layer, and the neurons in each fully connected layer.
5. A closed-loop detection method, characterized by comprising:

inputting the currently captured real-scene image frame into a preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the construction method according to any one of claims 1-4;

determining, according to a set image frame selection rule, at least one image frame to be matched for the real-scene image frame, and obtaining the historical image features of each image frame to be matched;

determining the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image features and each historical image feature.
6. The method according to claim 5, characterized in that determining, according to the set image frame selection rule, at least one image frame to be matched for the real-scene image frame, and obtaining the historical image features of each image frame to be matched, comprises:

obtaining the set number of interval frames and the frame number of the real-scene image frame, and determining the difference between the frame number and the number of interval frames as the target frame number;

taking the historical image frames in the constructed historical information library whose frame numbers are less than or equal to the target frame number as the image frames to be matched;

obtaining the historical image features of each image frame to be matched, as determined by the target network model.
7. The method according to claim 5, characterized in that determining the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image features and each historical image feature comprises:

calculating the similarity value between the actual image features and each historical image feature;

determining each image frame to be matched whose similarity value is greater than the set similarity threshold as a candidate closed-loop image frame, and adding the candidate closed-loop image frame to a set candidate closed-loop set;

if the candidate closed-loop set contains only one candidate closed-loop image frame, determining the candidate closed-loop image frame as the closed-loop region of the real-scene image frame;

if the candidate closed-loop set contains at least two candidate closed-loop image frames, obtaining the closed-loop region of the real-scene image frame based on a set closed-loop determination strategy.
8. The method according to claim 7, characterized in that obtaining the closed-loop region of the real-scene image frame based on the set closed-loop determination strategy comprises:

when the frame numbers of the candidate closed-loop image frames in the candidate closed-loop set are all discrete, determining that the real-scene image frame has no closed-loop region;

when the candidate closed-loop set contains candidate closed-loop image frames with consecutive frame numbers, determining the start frame number and the end frame number of the consecutive frame numbers, forming a historical image region from the candidate closed-loop image frames between the start frame number and the end frame number, and determining the historical image region as the closed-loop region of the real-scene image frame.
9. A device for constructing a network model, characterized by comprising:

an initial construction module, configured to construct an initial network model based on acquired topology information and configuration parameter information, wherein the topology information includes at least one of: the number of convolutional layers, the number of pooling layers, the number of fully connected layers, and the topological connection order between the layers; and the configuration parameter information includes at least one of: the convolution stride and the size and number of convolution kernels of each convolutional layer, the pooling stride and pooling window size of each pooling layer, and the number of neurons in each fully connected layer;

a target determination module, configured to iteratively train the initial network model according to acquired training learning information, to obtain a target network model having a standard weight data set.
10. A closed-loop detection device, characterized by comprising:

a feature extraction module, configured to input the currently captured real-scene image frame into a preset target network model to obtain the actual image features of the real-scene image frame, the target network model being determined by the construction device according to any one of claims 13-14;

an image selection module, configured to determine, according to a set image frame selection rule, at least one image frame to be matched for the real-scene image frame, and to obtain the historical image features of each image frame to be matched;

a detection determination module, configured to determine the closed-loop detection result of the real-scene image frame based on the similarity values between the actual image features and each historical image feature.
11. A computer device, characterized by comprising:

one or more processors;

a storage device, configured to store one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for constructing a network model according to any one of claims 1-4.
12. A computer device, comprising a camera for capturing image frames, characterized by further comprising:

one or more processors;

a storage device, configured to store one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the closed-loop detection method according to any one of claims 5-8.
13. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for constructing a network model according to any one of claims 1-4.
14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the closed-loop detection method according to any one of claims 5-8.