CN116385981B - A vehicle re-identification method and device guided by camera topology map - Google Patents

A vehicle re-identification method and device guided by camera topology map

Info

Publication number
CN116385981B
Authority
CN
China
Prior art keywords
camera
vehicle
representing
representation
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310260112.XA
Other languages
Chinese (zh)
Other versions
CN116385981A (en)
Inventor
李洪潮
孟庆洛
孙丽萍
罗永龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Normal University
Original Assignee
Anhui Normal University

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54: Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The present invention discloses a vehicle re-identification method and device guided by a camera topological graph. The method comprises: constructing a training set and obtaining vehicle feature representations; constructing a camera topological graph based on the vehicle feature representations; constructing, based on the camera topological graph, a topological relation between the feature representations of any two vehicles and inputting it into a graph convolutional network to obtain final aggregated features; fusing the final aggregated features with the vehicle feature representations and inputting the fused result into a fully connected layer for class prediction; constructing a target loss function and training the graph convolutional network until the target loss function value is minimized, thereby obtaining a trained graph convolutional network; and performing vehicle re-identification using the trained graph convolutional network. The invention has the advantage of improving the accuracy of re-identification.

Description

Vehicle re-identification method and device guided by camera topological graph
Technical Field
The invention relates to the field of computer vision, in particular to a vehicle re-identification method and device guided by a camera topological graph.
Background
Vehicle re-identification (Re-ID) aims to retrieve images of a vehicle of interest from gallery images captured by non-overlapping surveillance cameras. It is an active and challenging task that has attracted great interest because of its wide application in public security, smart cities and intelligent transportation. Despite significant progress, it still faces serious challenges such as intra-camera occlusion, cross-camera illumination changes and viewpoint changes, which limit its application in realistic complex scenes.
The prior art offers different approaches to the three challenges above. Representative methods fall into three types: 1) viewpoint-learning methods, which learn two metrics for similar and different viewpoints in two feature spaces, for example the viewpoint-aware network (VANet) for vehicle re-identification; 2) part-learning methods, for example the dual-path adaptive attention model for vehicle re-identification (AAVER), which captures key points related to vehicle parts; and 3) path-learning methods, which construct spatio-temporal constraints to optimize the matching results of vehicle re-identification, using spatio-temporal information as a physical constraint to reduce the complexity of the matching algorithm. However, these efforts focus mainly on mining information inside a single image and thus lack interaction between different images.
In recent years, graph convolutional networks (GCNs) have become prevalent. A graph convolutional network generalizes the capability of convolutional neural networks (CNNs) by performing convolution operations on graph-structured data. Graph convolutional network models are widely applied to computer vision tasks, such as: 1) pose estimation, where a semantic graph convolutional network (SemGCN) captures pose information such as local and global node relations; 2) action recognition, where an action-structure graph convolutional network (AS-GCN) extracts useful spatial and temporal information for action recognition; 3) person re-identification, where a similarity-guided graph neural network incorporates rich gallery similarity information into the training process; and 4) vehicle re-identification, where a parsing-guided cross-part reasoning network (PCRNet) learns discriminative feature representations by modeling the correlations among parts. Vehicle re-identification based on graph convolutional networks is becoming a research hotspot.
Chinese patent publication No. CN112396027A discloses a vehicle re-identification method based on a graph convolutional neural network. The method constructs a network model for vehicle re-identification, extracts global and local features of the vehicle image to be re-identified with a convolutional neural network, obtains structural features with a graph convolutional neural network, computes the loss function of the network model from the structural features, and trains the network model according to the loss function. By using the graph convolutional neural network to mine structural information among the local features and between the local and global features, it obtains a better and more comprehensive feature expression and improves the accuracy of vehicle re-identification. However, in a complex camera-system scene, that is, when images are collected by multiple different cameras, it considers neither the differences between images collected by different cameras nor the connections between adjacent cameras, so the extracted feature vectors cannot accurately express the vehicle information and the accuracy of vehicle re-identification is limited.
Disclosure of Invention
The technical problem to be solved by the invention is how to improve the accuracy of vehicle re-identification in scenes where images are collected by multiple different cameras.
The invention solves the above technical problem by the following technical means: a vehicle re-identification method guided by a camera topological graph, comprising the following steps:
Step one, constructing a training set and obtaining vehicle feature representations;
Step two, constructing a camera topological graph based on the vehicle feature representations;
Step three, constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, and inputting the topological relation into a graph convolutional network to obtain final aggregated features;
Step four, fusing the final aggregated features with the vehicle feature representations, and inputting the fused result into a fully connected layer for class prediction;
Step five, constructing a target loss function and training the graph convolutional network until the target loss function value is minimized, thereby obtaining a trained graph convolutional network;
Step six, performing vehicle re-identification using the trained graph convolutional network.
The invention has the following advantages. A training set is first constructed to obtain the vehicle feature representations; a camera topological graph is then constructed and the topological relation is input into a graph convolutional network to obtain aggregated features; the two kinds of features are fused, and the class prediction result is obtained from the fused features. The whole recognition process considers both the original visual features, namely the vehicle feature representations, and the aggregated features obtained from the camera topological graph. Therefore, when images are collected by multiple different cameras, the differences between images collected by different cameras and the connections between adjacent cameras can be represented, the resulting feature vectors express the vehicle information accurately, and the accuracy of vehicle re-identification is higher.
Further, the first step includes:
Constructing a training set $T = \{(x_i, y_i, c_i)\}_{i=1}^{N_T}$, where $x_i$ denotes the $i$-th image, $N_T$ denotes the total number of images in the training set, $y_i$ denotes its identity label, and $c_i$ denotes its camera label;
The training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations $\{h_1, h_2, \ldots, h_N\}$, where $h_N$ denotes the feature representation of the $N$-th vehicle.
Further, the second step includes:
According to the vehicle feature representations, different cameras are taken as nodes and edges are constructed according to various relations among the cameras, thereby constructing a camera topological graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_{C_T}\}$ denotes the camera nodes, $v_{C_T}$ denotes the $C_T$-th camera node, and $E$ is the edge set of the camera topological graph, $E = \{E_{system}, E_{position}, E_{orientation}, E_{individual}\}$. Here $E_{system}$, $E_{position}$, $E_{orientation}$ and $E_{individual}$ respectively denote the edge sets constructed from the camera system, position, direction and identity relations, and the camera topological graphs based on the camera system, position, direction and identity are respectively denoted $G_{system}$, $G_{position}$, $G_{orientation}$ and $G_{individual}$.
Further, the third step includes:
The topological relation $a_{ij}$ between the feature representations $h_i$ and $h_j$ of any two vehicles is expressed as $a_{ij} = e_{c_i c_j}$,
where $c_i$ and $c_j$ are the camera labels of the $i$-th and $j$-th samples, and $e_{c_i c_j}$ denotes the edge between these two camera labels in the camera topological graph $G$.
Further, the graph convolutional network in step three works as follows:
A mask matrix $\mathrm{Mask}$ is calculated by applying topk to $\mathrm{Sim}_{i,:}$, where topk denotes the top-k algorithm, $\mathrm{Sim}_{ij}$ denotes the feature similarity between the $i$-th image and the $j$-th image, ":" denotes all samples, and $\mathrm{Sim}_{i,:}$ denotes the comparison of the $i$-th sample with all samples;
Based on the mask matrix, the aggregated feature is obtained by the formula $h'_i = \sigma\left(\sum_j M h_j \, \mathrm{norm}(\mathrm{Mask} \odot A)_{ij}\right)$, where $\sigma$ denotes the ReLU activation function, $M$ denotes a learnable transformation matrix, norm denotes a normalization function, $\odot$ denotes the element-wise product, and $A$ is the adjacency matrix with entries $a_{ij}$;
The aggregated features are then weighted and updated to obtain the final aggregated features $h''_i$, using a learnable weight vector for each camera: row $d$ of $M$ is scaled by the $d$-th element of the camera's weight vector.
Further, the fourth step includes:
The vehicle feature representation and the final aggregated feature are concatenated by the formula $f_i = \mathrm{Concat}(h_i, h''_i)$ to obtain the final vehicle features $\{f_1, f_2, \ldots, f_N\}$, where $h_i$ denotes the feature representation of the $i$-th vehicle, $h''_i$ denotes the final aggregated feature of the $i$-th vehicle, and $f_N$ denotes the final vehicle feature of the $N$-th vehicle; $f_i$ is then fed into a fully connected layer to obtain the class prediction result.
Further, the fifth step includes:
A first loss function is constructed, in whose formula $y_i$ denotes the identity label of the $i$-th image, FC denotes the fully connected layer, $\|\cdot\|$ denotes the L2 norm distance, $f_{i,p}$ and $f_{i,n}$ denote the positive and negative features of the $i$-th image $x_i$ in each mini-batch, and $m$ denotes the triplet margin;
A second loss function is constructed, in whose formula $S_i$ denotes the number of positive samples of the $i$-th image and Softplus denotes a function that yields a non-negative value;
A target loss function is constructed from the above loss functions;
The parameters of the graph convolutional network are adjusted and the network is trained until the target loss function value is minimized, thereby obtaining the trained graph convolutional network.
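The exact loss formulas are not legible in this text, so the following is only a minimal sketch of two standard building blocks that the symbol definitions point to: a triplet margin term using the L2 distance, and the softplus function. The function names, the sample vectors and the margin value 0.3 are illustrative assumptions, not the patented loss.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_term(anchor, positive, negative, margin):
    """Standard triplet margin term: max(d(a, p) - d(a, n) + m, 0)."""
    return max(l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin, 0.0)

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical features: the anchor f_i, a positive f_{i,p}, a negative f_{i,n}.
f_i = [0.0, 0.0]
f_pos = [3.0, 4.0]        # distance 5 from the anchor
f_neg_far = [6.0, 8.0]    # distance 10: constraint already satisfied
f_neg_near = [0.0, 1.0]   # distance 1: constraint violated

easy = triplet_term(f_i, f_pos, f_neg_far, margin=0.3)
hard = triplet_term(f_i, f_pos, f_neg_near, margin=0.3)
```

In this sketch the term is zero when the negative is already farther from the anchor than the positive by at least the margin, and grows as the negative moves closer.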
The invention also provides a vehicle re-identification device guided by a camera topological graph, comprising:
a feature representation module, configured to construct a training set and obtain vehicle feature representations;
a topology construction module, configured to construct a camera topological graph based on the vehicle feature representations;
a feature aggregation module, configured to construct a topological relation between the feature representations of any two vehicles based on the camera topological graph, and to input the topological relation into a graph convolutional network to obtain final aggregated features;
a class prediction module, configured to fuse the final aggregated features with the vehicle feature representations and to input the fused result into a fully connected layer for class prediction;
a model training module, configured to construct a target loss function and to train the graph convolutional network until the target loss function value is minimized, thereby obtaining a trained graph convolutional network;
and a re-identification module, configured to perform vehicle re-identification using the trained graph convolutional network.
Further, the feature representation module is further configured to:
Construct a training set $T = \{(x_i, y_i, c_i)\}_{i=1}^{N_T}$, where $x_i$ denotes the $i$-th image, $N_T$ denotes the total number of images in the training set, $y_i$ denotes its identity label, and $c_i$ denotes its camera label;
Input the training set into the vehicle representation model ResNet-50 to extract the vehicle feature representations $\{h_1, h_2, \ldots, h_N\}$, where $h_N$ denotes the feature representation of the $N$-th vehicle.
Further, the topology construction module is further configured to:
According to the vehicle feature representations, take different cameras as nodes and construct edges according to various relations among the cameras, thereby constructing a camera topological graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_{C_T}\}$ denotes the camera nodes, $v_{C_T}$ denotes the $C_T$-th camera node, and $E$ is the edge set of the camera topological graph, $E = \{E_{system}, E_{position}, E_{orientation}, E_{individual}\}$. Here $E_{system}$, $E_{position}$, $E_{orientation}$ and $E_{individual}$ respectively denote the edge sets constructed from the camera system, position, direction and identity relations, and the camera topological graphs based on the camera system, position, direction and identity are respectively denoted $G_{system}$, $G_{position}$, $G_{orientation}$ and $G_{individual}$.
Further, the feature aggregation module is further configured to:
The topological relation $a_{ij}$ between the feature representations $h_i$ and $h_j$ of any two vehicles is expressed as $a_{ij} = e_{c_i c_j}$,
where $c_i$ and $c_j$ are the camera labels of the $i$-th and $j$-th samples, and $e_{c_i c_j}$ denotes the edge between these two camera labels in the camera topological graph $G$.
Further, the graph convolutional network in the feature aggregation module works as follows:
A mask matrix $\mathrm{Mask}$ is calculated by applying topk to $\mathrm{Sim}_{i,:}$, where topk denotes the top-k algorithm, $\mathrm{Sim}_{ij}$ denotes the feature similarity between the $i$-th image and the $j$-th image, ":" denotes all samples, and $\mathrm{Sim}_{i,:}$ denotes the comparison of the $i$-th sample with all samples;
Based on the mask matrix, the aggregated feature is obtained by the formula $h'_i = \sigma\left(\sum_j M h_j \, \mathrm{norm}(\mathrm{Mask} \odot A)_{ij}\right)$, where $\sigma$ denotes the ReLU activation function, $M$ denotes a learnable transformation matrix, norm denotes a normalization function, $\odot$ denotes the element-wise product, and $A$ is the adjacency matrix with entries $a_{ij}$;
The aggregated features are then weighted and updated to obtain the final aggregated features $h''_i$, using a learnable weight vector for each camera: row $d$ of $M$ is scaled by the $d$-th element of the camera's weight vector.
Further, the class prediction module is further configured to:
Concatenate the vehicle feature representation and the final aggregated feature by the formula $f_i = \mathrm{Concat}(h_i, h''_i)$ to obtain the final vehicle features $\{f_1, f_2, \ldots, f_N\}$, where $h_i$ denotes the feature representation of the $i$-th vehicle, $h''_i$ denotes the final aggregated feature of the $i$-th vehicle, and $f_N$ denotes the final vehicle feature of the $N$-th vehicle; then feed $f_i$ into a fully connected layer to obtain the class prediction result.
Further, the model training module is further configured to:
Construct a first loss function, in whose formula $y_i$ denotes the identity label of the $i$-th image, FC denotes the fully connected layer, $\|\cdot\|$ denotes the L2 norm distance, $f_{i,p}$ and $f_{i,n}$ denote the positive and negative features of the $i$-th image $x_i$ in each mini-batch, and $m$ denotes the triplet margin;
Construct a second loss function, in whose formula $S_i$ denotes the number of positive samples of the $i$-th image and Softplus denotes a function that yields a non-negative value;
Construct a target loss function from the above loss functions;
Adjust the parameters of the graph convolutional network and train the network until the target loss function value is minimized, thereby obtaining the trained graph convolutional network.
The invention has the following advantages. A training set is first constructed to obtain the vehicle feature representations; a camera topological graph is then constructed and the topological relation is input into a graph convolutional network to obtain aggregated features; the two kinds of features are fused, and the class prediction result is obtained from the fused features. The whole recognition process considers both the original visual features, namely the vehicle feature representations, and the aggregated features obtained from the camera topological graph. Therefore, when images are collected by multiple different cameras, the differences between images collected by different cameras and the connections between adjacent cameras can be represented, the resulting feature vectors express the vehicle information accurately, and the accuracy of vehicle re-identification is higher.
Drawings
FIG. 1 is a diagram of phenomena observed for a prior-art strong re-identification baseline model on the VeRi-776 dataset;
FIG. 2 is a schematic process diagram of a vehicle re-identification method guided by a camera topological graph according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of generating a camera topological graph from a real-world traffic scene in the vehicle re-identification method guided by a camera topological graph according to an embodiment of the present invention, where FIG. 3 (a) is a schematic diagram of a closed-circuit television camera system and FIG. 3 (b) is the corresponding camera topological graph;
FIG. 4 shows the camera topological graphs based on the camera system, position, direction and identity in the vehicle re-identification method guided by a camera topological graph according to an embodiment of the present invention, where FIG. 4 (a) is the camera topological graph based on the camera system, FIG. 4 (b) is the camera topological graph based on camera position, FIG. 4 (c) is the camera topological graph based on camera direction, and FIG. 4 (d) is the camera topological graph based on camera individuals.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions are described below clearly and completely in conjunction with the embodiments. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
FIG. 1 illustrates phenomena observed for a prior-art strong re-identification baseline model on the VeRi-776 dataset. Three phenomena were found. (1) The Rank-1 performance under the whole camera system is far higher than that under each individual camera, as shown in FIG. 1 (1), where the five-pointed star denotes the Rank-1 performance under the whole camera system. This shows that the Rank-1 performance of prior-art methods is overstated: such methods retrieve only easy positive samples under the whole camera system and do not accurately hit the positive samples under each camera. (2) The mAP under the whole camera system is far lower than that under each individual camera, as shown in FIG. 1 (2), where the five-pointed star denotes the mAP under the whole camera system. This indicates that the positive samples under each camera are more tightly clustered than the positive samples under the whole camera system. (3) Removing the top-ranked samples significantly reduces re-identification performance, as shown in FIG. 1 (3). This shows that the re-identification performance obtained by conventional methods is sub-optimal and susceptible to camera interference. Furthermore, the information available for each identity under a single camera is limited. If the information of a vehicle can be aggregated across the whole camera system, it becomes sufficient and robust.
Therefore, as shown in FIG. 2, the invention provides a vehicle re-identification method guided by a camera topological graph, which fully explores both easy positive samples and hard positive samples under the whole camera system. The method comprises:
S1, constructing a training set and obtaining vehicle feature representations. The specific process is as follows:
Constructing a training set $T = \{(x_i, y_i, c_i)\}_{i=1}^{N_T}$, where $x_i$ denotes the $i$-th image, $N_T$ denotes the total number of images in the training set, $y_i$ denotes its identity label, and $c_i$ denotes its camera label;
The training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations $\{h_1, h_2, \ldots, h_N\}$, where $h_N$ denotes the feature representation of the $N$-th vehicle.
S2, constructing a camera topological graph based on vehicle characteristic representation, wherein the specific process is as follows:
As shown in FIG. 3, according to the vehicle feature representations, different cameras are taken as nodes and edges are constructed according to various relations among the cameras, thereby constructing a camera topological graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_{C_T}\}$ denotes the camera nodes, $v_{C_T}$ denotes the $C_T$-th camera node, and $E$ is the edge set of the camera topological graph, $E = \{E_{system}, E_{position}, E_{orientation}, E_{individual}\}$. Here $E_{system}$, $E_{position}$, $E_{orientation}$ and $E_{individual}$ respectively denote the edge sets constructed from the camera system, position, direction and identity relations, and the camera topological graphs based on the camera system, position, direction and identity are respectively denoted $G_{system}$, $G_{position}$, $G_{orientation}$ and $G_{individual}$. In this embodiment the camera topological graph is constructed from a closed-circuit television camera system, where FIG. 3 (a) is a schematic diagram of the closed-circuit television camera system and FIG. 3 (b) is the corresponding camera topological graph.
FIG. 4 shows the camera topological graphs based on the camera system, position, direction and identity (individual). $G_{system}$ denotes the camera topological graph based on the camera system, in which, as the default setting, neighboring nodes are connected in sequence, as shown in FIG. 4 (a).
$G_{position}$ denotes the camera topological graph based on camera position. Cameras at consecutive intersections are first defined as spatially adjacent nodes. According to the camera positions in the closed-circuit television camera system (FIG. 3 (b)), camera5, camera7 and camera8 are regarded as adjacent nodes, and there is an edge between these adjacent nodes, as shown in FIG. 4 (b). The camera relationship of $G_{position}$ is easier than that of $G_{system}$ because it requires positive samples from neighboring cameras to present a consistent feature representation. Since a continuously moving vehicle can be captured by two adjacent cameras, $G_{position}$ complies with vehicle travel logic. $G_{position}$ enables interaction with the positive samples under adjacent cameras.
$G_{orientation}$ denotes the camera topological graph based on camera direction. The more consistent the camera directions, the more consistent the appearance of the positive samples. In FIG. 4 (c), the solid lines denote edges determined by the positional relationship between cameras and the dashed lines denote edges determined by the directional relationship between cameras. Camera3 and camera4 are adjacent cameras, but since their camera directions differ, there is no direction-based edge between them. The camera relationship of $G_{orientation}$ is easier than that of $G_{position}$ because it ignores irrelevant nodes based on camera direction. Notably, the invention also treats two cameras whose directions are orthogonal as neighboring cameras, such as camera5 and camera7 in FIG. 4 (c). $G_{orientation}$ enables interaction with the positive samples under cameras facing the same direction.
$G_{individual}$ denotes the camera topological graph based on camera individuals. A video sequence of the target vehicle may be captured under the same camera. As shown in FIG. 4 (d), every camera has an edge to itself. The camera relationship of $G_{individual}$ is the easiest because intra-class images captured under the same camera tend to have a large information overlap. $G_{individual}$ enables interaction with the positive samples under the same camera.
Learning the camera system, position, direction and identity relations helps narrow the range of feature interactions in both the feature learning stage and the evaluation stage. The four subgraphs are used to construct the camera topological graph. In the camera topological graph $G = (V, E)$, the edge between two cameras can be denoted $e_{ij}$, and the larger its value, the stronger the relationship between the cameras. In each of the four subgraphs, the value is 1 if an edge exists between two nodes and 0 otherwise. The final goal is to obtain hierarchically aggregated topological features through the four topological relations between cameras. Such topological features are complementary to the visual features and make the final feature more robust.
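The four 0/1 subgraphs described above can be sketched as adjacency matrices. This is a minimal illustration only: the four-camera layout and its edge lists are hypothetical and are not taken from FIG. 3 or FIG. 4, and summing the subgraphs so that a larger value means a stronger camera relation is an assumed combination rule that the text does not spell out.

```python
NUM_CAMERAS = 4  # hypothetical camera count for illustration

def adjacency(num_nodes, edges, self_loops=False):
    """Return a symmetric 0/1 adjacency matrix as a list of lists."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1
    if self_loops:
        for n in range(num_nodes):
            adj[n][n] = 1
    return adj

# Illustrative edge lists (assumptions, not the patent's figures).
g_system = adjacency(NUM_CAMERAS, [(0, 1), (1, 2), (2, 3)])   # nodes chained in sequence
g_position = adjacency(NUM_CAMERAS, [(0, 1), (2, 3)])         # spatially adjacent cameras
g_orientation = adjacency(NUM_CAMERAS, [(0, 1)])              # compatible directions
g_individual = adjacency(NUM_CAMERAS, [], self_loops=True)    # each camera with itself

def combine(*subgraphs):
    """Sum the subgraphs element-wise (assumed rule: larger value, stronger relation)."""
    n = len(subgraphs[0])
    return [[sum(g[i][j] for g in subgraphs) for j in range(n)] for i in range(n)]

camera_graph = combine(g_system, g_position, g_orientation, g_individual)
```

Under these assumptions, cameras 0 and 1 share three kinds of relation and receive the largest edge value, while cameras 0 and 3 share none.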
S3, constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, and inputting the topological relation into a graph convolutional network to obtain final aggregated features. The specific process is as follows:
To embed the topological relation into the feature representations, the topological relation between cameras is converted into a relation between sample pairs. Using the camera topological graph guided by the closed-circuit television camera system, an adjacency matrix $A$ is created between the visual features, that is, between the vehicle feature representations above. The topological relation $a_{ij}$ between the feature representations $h_i$ and $h_j$ of any two vehicles is expressed as $a_{ij} = e_{c_i c_j}$,
where $c_i$ and $c_j$ are the camera labels of the $i$-th and $j$-th samples, and $e_{c_i c_j}$ denotes the edge between these two camera labels in the camera topological graph $G$.
As the above formula shows, the feature relation between samples is represented by the camera relation between samples, because the stronger the camera relation between samples, the more the vehicle images overlap. However, this process involves many uncorrelated samples and adds a significant computational burden.
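The conversion from camera relations to sample-pair relations can be sketched as a lookup: entry (i, j) of the sample adjacency matrix is the camera-graph edge value between the cameras that captured samples i and j. The three-camera graph and the camera labels below are hypothetical values chosen for illustration.

```python
def sample_adjacency(camera_labels, camera_graph):
    """Build A with A[i][j] = camera_graph[c_i][c_j] for camera labels c_i, c_j."""
    return [[camera_graph[ci][cj] for cj in camera_labels]
            for ci in camera_labels]

# Hypothetical 3-camera graph: cameras 0-1 and 1-2 are related, 0-2 are not.
camera_graph = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
# Hypothetical camera labels of four samples.
camera_labels = [0, 0, 2, 1]

A = sample_adjacency(camera_labels, camera_graph)
```

Samples 0 and 1 come from the same camera and are connected; samples 0 and 2 come from unrelated cameras and are not.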
To discard uncorrelated samples and reduce the amount of computation, a mask matrix is introduced. If two vehicle images are visually adjacent in the feature space, they are likely to be correlated. To this end, a k-nearest-neighbor mask $\mathrm{Mask}$ is calculated from the visual similarity; it keeps the top k values of each row of the similarity matrix. Specifically, the mask matrix is calculated by applying topk to $\mathrm{Sim}_{i,:}$, where topk denotes the top-k algorithm, $\mathrm{Sim}_{ij}$ denotes the feature similarity between the $i$-th image and the $j$-th image, ":" denotes all samples, and $\mathrm{Sim}_{i,:}$ denotes the comparison of the $i$-th sample with all samples. The top-k algorithm is an existing algorithm that finds the K largest numbers in an unordered sequence of N numbers. In this embodiment, the K most similar samples are found by comparing the $i$-th sample with all samples; the details of the algorithm are not described here.
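A k-nearest-neighbor mask of this kind can be sketched as follows. Cosine similarity is an assumption here, since the text only speaks of feature similarity, and the feature vectors are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def topk_mask(features, k):
    """Return a 0/1 matrix with ones at the k most similar samples of each row."""
    n = len(features)
    sim = [[cosine_similarity(features[i], features[j]) for j in range(n)]
           for i in range(n)]
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k largest similarities in row i.
        top = sorted(range(n), key=lambda j: sim[i][j], reverse=True)[:k]
        for j in top:
            mask[i][j] = 1
    return mask

# Hypothetical 2-D features: samples 0 and 1 look alike, as do samples 2 and 3.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask = topk_mask(features, k=2)
```

With k = 2 each sample keeps itself and its visually closest neighbor, so the mask splits these four samples into two pairs.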
Based on the mask matrix, the aggregation features are obtained by the formula h'_i = σ(∑_j M h_j norm(Mask ⊙ A)_{ij}), where σ represents the ReLU activation function, M represents a learnable transformation matrix, norm represents a normalization function, and ⊙ represents the element-wise product. Applying the mask matrix Mask to the weighted transformation ensures that feature aggregation occurs only among neighboring cameras, which focuses the aggregation on more relevant images.
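The masked aggregation h'_i = σ(∑_j M h_j norm(Mask ⊙ A)_{ij}) can be illustrated with plain Python lists. The patent does not specify the normalization function, so row normalization is assumed here; the helper names and the toy inputs are hypothetical.

```python
def row_normalize(weights):
    """Divide each row by its sum (rows that sum to zero stay all-zero)."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def aggregate(features, adjacency, mask, transform):
    """Compute h'_i = ReLU(sum_j (M h_j) * norm(Mask * A)[i][j]) for every i."""
    n = len(features)
    weights = row_normalize(
        [[mask[i][j] * adjacency[i][j] for j in range(n)] for i in range(n)]
    )
    transformed = [matvec(transform, h_j) for h_j in features]  # M h_j
    out_dim = len(transform)
    aggregated = []
    for i in range(n):
        summed = [
            sum(weights[i][j] * transformed[j][d] for j in range(n))
            for d in range(out_dim)
        ]
        aggregated.append([max(0.0, v) for v in summed])  # ReLU
    return aggregated


features = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
adjacency = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask = [[1.0] * 3 for _ in range(3)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for the learnable M
h_prime = aggregate(features, adjacency, mask, identity)
```

In the toy input, samples 0 and 1 average each other's features because their cameras are connected, while sample 2 only sees itself and is clipped to zero by the ReLU.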
While this aggregation yields more robust features at reduced computational cost, it may introduce unwanted camera noise. To solve this problem, a learnable camera memory matrix is designed, and the memory matrix is used to weight the transformation matrix so as to store the transformation matrices for the different cameras. Specifically, the aggregation features are weighted and updated by a camera-specific formula to obtain the final aggregation features, wherein each camera has a learnable weight vector stored in the camera memory matrix, and the d-th row of M is scaled by the d-th element of that weight vector.
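The row-scaling step described above can be sketched as follows. Only the statement that the d-th row of M is scaled by the d-th element of a camera's weight vector comes from the text; the dictionary layout of the memory, the function name `camera_transform` and the numeric values are assumptions, and the full update formula is not reproduced here.

```python
def camera_transform(transform, camera_weights):
    """Scale row d of the shared matrix M by element d of the camera's weight vector."""
    return [
        [camera_weights[d] * value for value in row]
        for d, row in enumerate(transform)
    ]


# Hypothetical shared transformation matrix and per-camera memory.
M = [[1.0, 2.0], [3.0, 4.0]]
camera_memory = {
    "cam_a": [0.5, 2.0],
    "cam_b": [1.0, 1.0],
}
M_cam_a = camera_transform(M, camera_memory["cam_a"])
M_cam_b = camera_transform(M, camera_memory["cam_b"])
```

Storing one vector per camera instead of one full matrix per camera keeps the number of additional parameters proportional to the feature dimension.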
S4, fusing the final aggregation features with the vehicle feature representation, and inputting the fused result into a fully connected layer for class prediction, wherein the specific process is as follows:
In the graph convolutional network based on the camera topology, visual features are transformed into topological features, i.e. the final aggregation features, through the adjacency relations and the camera-specific transformation matrices. The camera-topology-based graph convolutional network learns cross-camera representations to obtain more discriminative vehicle features. It aggregates only manageable neighbor nodes and learns different weight matrices for different cameras. It thus retains the ability of a conventional graph convolutional network to exchange information between graph nodes while additionally learning the topological relations of different cameras. In addition, the vehicle feature representation is concatenated with the final aggregation feature by the formula f_i = Concat(h_i, h''_i) to obtain the final vehicle features {f_1, f_2, ..., f_N}, where h_i represents the feature representation of the i-th vehicle, h''_i represents the final aggregation feature of the i-th vehicle, and f_N represents the final vehicle feature of the N-th vehicle; f_i is then fed into the fully connected layer to obtain a class prediction result. In practical applications, as shown in fig. 2, the vehicle feature representation and the final aggregation feature may each first be passed through a hidden layer and then fused by the formula f_i = Concat(h_i, h''_i).
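The fusion f_i = Concat(h_i, h''_i) followed by a fully connected layer can be sketched as below. The layer weights, the two-class setup and the helper names are hypothetical and serve only to make the data flow concrete.

```python
def concat(h_i, h_final_i):
    """f_i = Concat(h_i, h''_i): join the visual and the aggregated feature."""
    return list(h_i) + list(h_final_i)


def fully_connected(f_i, weight, bias):
    """Linear layer: one logit per class."""
    return [
        sum(w * x for w, x in zip(row, f_i)) + b
        for row, b in zip(weight, bias)
    ]


def predict_class(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


h_i = [1.0, 0.0]        # visual feature of vehicle i (toy values)
h_final_i = [0.5, 0.5]  # final aggregation feature of vehicle i (toy values)
f_i = concat(h_i, h_final_i)

# Hypothetical 2-class layer over the 4-dimensional fused feature.
weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
bias = [0.0, 0.0]
logits = fully_connected(f_i, weight, bias)
predicted = predict_class(logits)
```

Concatenation keeps the original visual feature intact next to the topological one, so the classifier can weigh the two sources independently.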
S5, constructing a target loss function, training the graph convolutional network, and stopping training when the target loss function value reaches its minimum to obtain the trained graph convolutional network, wherein the specific process comprises the following steps:
A first loss function is constructed, in whose formula y_i represents the identity tag of the i-th image, FC represents the fully connected layer, ‖·‖ represents the L2 norm distance, f_{i,p} and f_{i,n} represent the hardest positive and hardest negative features of the i-th image x_i in each mini-batch, and m represents the triplet margin. Although the first loss function is widely used in the field of vehicle re-identification, it is limited in that it cannot take the topological relation among samples into account.
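The symbols listed for the first loss function (identity tag, fully connected layer, L2 distance, hardest positive and negative, margin m) match the common pairing of a cross-entropy term with a batch-hard triplet term. The sketch below shows that common pairing only; the unweighted sum of the two terms and all names and values are assumptions, not the patent's formula.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer label (log-sum-exp form)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def hard_triplet(features, labels, i, margin):
    """Batch-hard triplet term for anchor i: farthest positive vs. nearest negative."""
    pos = [l2_distance(features[i], features[j])
           for j in range(len(features)) if j != i and labels[j] == labels[i]]
    neg = [l2_distance(features[i], features[j])
           for j in range(len(features)) if labels[j] != labels[i]]
    if not pos or not neg:
        return 0.0  # no valid triplet for this anchor in the mini-batch
    return max(0.0, margin + max(pos) - min(neg))


# Toy mini-batch: samples 0 and 1 share an identity, sample 2 differs.
features = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
labels = [0, 0, 1]
triplet_0 = hard_triplet(features, labels, i=0, margin=0.3)
ce_0 = cross_entropy([0.0, 0.0], label=0)
first_loss_0 = ce_0 + triplet_0  # assumed unweighted sum
```

For anchor 0 the hardest positive lies at distance 5 and the hardest negative at distance 1, so the triplet term is large; neither term looks at which cameras produced the samples, which is the limitation the text points out.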
Therefore, the invention proposes a new topological cross-entropy loss based on the topological relation. It encourages positive samples to cluster from strong to weak, optimizes the topological relation among the positive samples, and allows the whole network to be trained end to end. The topological cross-entropy loss is also the key to aggregating vehicles under adjacent cameras, making the aggregation process more effective and efficient. Specifically, by the formula
a second loss function is constructed, wherein S_i represents the number of positive samples of the i-th picture, and Softplus represents a function for obtaining a non-negative probability;
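For reference, Softplus is conventionally defined as Softplus(x) = ln(1 + e^x), which is positive for every real x. The sketch below shows a numerically stable way to evaluate that standard definition; how the function enters the second loss is not reproduced here.

```python
import math


def softplus(x):
    """Numerically stable Softplus: log(1 + exp(x)) without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


value_at_zero = softplus(0.0)      # equals ln 2
value_large = softplus(1000.0)     # approaches x for large positive x
value_negative = softplus(-50.0)   # small but still positive
```

Rewriting the expression as max(x, 0) + log1p(exp(-|x|)) avoids computing exp of a large positive number.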
By the formula Constructing a target loss function;
The parameters of the graph convolutional network are adjusted and the network is trained; training stops when the target loss function value reaches its minimum, yielding the trained graph convolutional network.
S6, inputting vehicle images acquired in real time into ResNet-50 to obtain vehicle feature representations, constructing a camera topological graph, inputting the camera topological graph into the trained graph convolutional network, performing vehicle re-identification with the trained graph convolutional network, fusing the identification result with the vehicle feature representations, and inputting the fused result into the fully connected layer to obtain a class prediction result.
In this technical scheme, a training set is first constructed to obtain the vehicle feature representations, and a camera topological graph is additionally constructed so that the topological relation can be input into the graph convolutional network to obtain the aggregation features. The two kinds of features are then fused, and the class prediction result is obtained from the fused feature. The whole recognition process thus considers both the original visual feature, i.e. the vehicle feature representation, and the aggregation feature derived from the camera topological graph. Consequently, when images are collected by several different cameras, both the differences between images from different cameras and the connections between adjacent cameras are represented, the resulting feature vectors express the vehicle information accurately, and the accuracy of vehicle re-identification is higher.
Example 2
Based on embodiment 1, embodiment 2 of the present invention further provides a vehicle re-identification device guided by a camera topology map, the device comprising:
a feature representation module for constructing a training set and obtaining vehicle feature representations;
a topology construction module for constructing a camera topological graph based on the vehicle feature representations;
a feature aggregation module for constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, inputting the topological relation into the graph convolutional network, and obtaining final aggregation features;
a class prediction module for fusing the final aggregation features with the vehicle feature representations and inputting the fused result into the fully connected layer for class prediction;
a model training module for constructing a target loss function, training the graph convolutional network, and stopping training when the target loss function value reaches its minimum to obtain a trained graph convolutional network;
and a re-identification module for performing vehicle re-identification using the trained graph convolutional network.
Specifically, the feature representation module is further configured to:
A training set is constructed, where x_i represents the i-th image, N_T represents the total number of pictures in the training set, y_i represents its identity tag, and each image is also associated with its camera tag;
the training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations {h_1, h_2, ..., h_N}, where h_N represents the feature representation of the N-th vehicle.
Specifically, the topology construction module is further configured to:
According to the vehicle feature representation, different cameras are taken as nodes and edges are constructed according to the various relations among the cameras, thereby constructing a camera topological graph G = (V, E), where V represents the set of camera nodes, the last of which is the C_T-th camera node, and E is the edge set of the camera topological graph, E = {E_system, E_position, E_orientation, E_individual}. E_system, E_position, E_orientation and E_individual respectively represent the edge sets constructed from the camera system, position, orientation and identity relations, and the corresponding camera topological graphs are denoted G_system, G_position, G_orientation and G_individual.
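One possible in-memory layout for the camera topological graph G = (V, E) with its four edge sets is sketched below. The integer camera ids, the undirected 0/1 adjacency and the example edges are hypothetical; only the four relation names come from the text.

```python
RELATIONS = ("system", "position", "orientation", "individual")


def build_camera_graphs(num_cameras, edges_by_relation):
    """Return one undirected 0/1 adjacency matrix per relation (G_system, ...)."""
    graphs = {}
    for relation in RELATIONS:
        adjacency = [[0] * num_cameras for _ in range(num_cameras)]
        for a, b in edges_by_relation.get(relation, []):
            adjacency[a][b] = 1
            adjacency[b][a] = 1
        graphs[relation] = adjacency
    return graphs


# Hypothetical deployment with three cameras.
edges_by_relation = {
    "system": [(0, 1), (1, 2)],  # cameras belonging to the same system
    "position": [(0, 1)],        # cameras that are physically close
    "orientation": [(1, 2)],     # cameras facing the same direction
    "individual": [(0, 2)],      # cameras that observed the same identity
}
graphs = build_camera_graphs(3, edges_by_relation)
```

Keeping the relations in separate matrices makes it easy to use a single relation or to combine several of them when the sample-level adjacency is built.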
Specifically, the feature aggregation module is further configured to:
The topological relation a_ij of the feature representations h_i and h_j of any two vehicles is expressed as:
Here the edge term represents the edge between the i-th camera tag and the j-th camera tag in the camera topology graph G.
More specifically, the working process of the graph rolling network in the feature aggregation module is as follows:
A mask matrix is calculated by applying the topk operation to Sim_{i,:}, wherein topk represents the top-k algorithm, Sim_ij represents the feature similarity between the i-th image and the j-th image, the colon represents all samples, and Sim_{i,:} represents comparing sample i with all samples;
based on the mask matrix, aggregation features are obtained by the formula h'_i = σ(∑_j M h_j norm(Mask ⊙ A)_{ij}), wherein σ represents the ReLU activation function, M represents a learnable transformation matrix, norm represents a normalization function, and ⊙ represents the element-wise product;
the aggregation features are weighted and updated by a camera-specific formula to obtain the final aggregation features, wherein each camera has a learnable weight vector, and the d-th row of M is scaled by the d-th element of that weight vector.
Specifically, the class prediction module is further configured to:
The vehicle feature representation is concatenated with the final aggregation feature by the formula f_i = Concat(h_i, h''_i) to obtain the final vehicle features {f_1, f_2, ..., f_N}, where h_i represents the feature representation of the i-th vehicle, h''_i represents the final aggregation feature of the i-th vehicle, and f_N represents the final vehicle feature of the N-th vehicle; f_i is fed into the fully connected layer to obtain a class prediction result.
Specifically, the model training module is further configured to:
A first loss function is constructed, in whose formula y_i represents the identity tag of the i-th image, FC represents the fully connected layer, ‖·‖ represents the L2 norm distance, f_{i,p} and f_{i,n} represent the hardest positive and hardest negative features of the i-th image x_i in each mini-batch, and m represents the triplet margin;
By the formula
Constructing a second loss function, wherein S i represents the number of positive samples of the ith picture, and Softplus represents a function for acquiring non-negative probability;
By the formula Constructing a target loss function;
The parameters of the graph convolutional network are adjusted and the network is trained; training stops when the target loss function value reaches its minimum, yielding the trained graph convolutional network.
The foregoing embodiments are merely for illustrating the technical solution of the present invention, but not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiments or equivalents may be substituted for parts of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present invention in essence.

Claims (4)

1. A method of camera topology guided vehicle re-identification, the method comprising:
step one, constructing a training set and obtaining vehicle feature representations: a training set is constructed, wherein x_i represents the i-th image, N_T represents the total number of pictures in the training set, y_i represents its identity tag, and each image is also associated with its camera tag; the training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations {h_1, h_2, ..., h_N}, wherein h_N represents the feature representation of the N-th vehicle;
step two, constructing a camera topological graph based on the vehicle feature representation: according to the vehicle feature representation, different cameras are taken as nodes and edges are constructed according to the various relations among the cameras, thereby constructing a camera topological graph G = (V, E), wherein V represents the set of camera nodes, the last of which is the C_T-th camera node, E is the edge set of the camera topological graph, E = {E_system, E_position, E_orientation, E_individual}, whose members respectively represent the edge sets constructed from the camera system, position, orientation and identity relations, and the camera topological graphs based on camera system, position, orientation and identity are respectively denoted G_system, G_position, G_orientation and G_individual;
step three, constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, inputting the topological relation into a graph convolutional network, and obtaining final aggregation features, wherein the topological relation a_ij of the feature representations h_i and h_j of any two vehicles is expressed as:
wherein the edge term represents the edge between the i-th camera tag and the j-th camera tag in the camera topology graph G;
The working process of the graph convolution network is as follows:
a mask matrix is calculated by applying the topk operation to Sim_{i,:}, wherein topk represents the top-k algorithm, Sim_ij represents the feature similarity between the i-th image and the j-th image, the colon represents all samples, and Sim_{i,:} represents comparing sample i with all samples;
based on the mask matrix, aggregation features are obtained by the formula h'_i = σ(∑_j M h_j norm(Mask ⊙ A)_{ij}), wherein σ represents the ReLU activation function, M represents a learnable transformation matrix, norm represents a normalization function, and ⊙ represents the element-wise product;
the aggregation features are weighted and updated by a camera-specific formula to obtain the final aggregation features, wherein each camera has a learnable weight vector, and the d-th row of M is scaled by the d-th element of that weight vector;
step four, fusing the final aggregation features with the vehicle feature representation, and inputting the fused result into a fully connected layer for class prediction;
step five, constructing a target loss function, training the graph convolutional network, and stopping training when the target loss function value reaches its minimum to obtain a trained graph convolutional network;
And step six, carrying out vehicle re-identification by using the trained graph convolutional network.
2. The camera topology guided vehicle re-identification method of claim 1, wherein said step four comprises:
The vehicle feature representation is concatenated with the final aggregation feature by the formula f_i = Concat(h_i, h''_i) to obtain the final vehicle features {f_1, f_2, ..., f_N}, wherein h_i represents the feature representation of the i-th vehicle, h''_i represents the final aggregation feature of the i-th vehicle, and f_N represents the final vehicle feature of the N-th vehicle; f_i is fed into the fully connected layer to obtain a class prediction result.
3. The camera topology guided vehicle re-identification method of claim 1, wherein said step five comprises:
A first loss function is constructed, in whose formula y_i represents the identity tag of the i-th image, FC represents the fully connected layer, ‖·‖ represents the L2 norm distance, f_{i,p} and f_{i,n} represent the hardest positive and hardest negative features of the i-th image x_i in each mini-batch, and m represents the triplet margin;
By the formula
a second loss function is constructed, wherein S_i represents the number of positive samples of the i-th picture, and Softplus represents a function for obtaining a non-negative probability;
By the formula Constructing a target loss function;
The parameters of the graph convolutional network are adjusted and the network is trained; training stops when the target loss function value reaches its minimum, yielding the trained graph convolutional network.
4. A camera topology guided vehicle re-identification apparatus, the apparatus comprising:
a feature representation module for constructing a training set and obtaining vehicle feature representations: a training set is constructed, wherein x_i represents the i-th image, N_T represents the total number of pictures in the training set, y_i represents its identity tag, and each image is also associated with its camera tag; the training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations {h_1, h_2, ..., h_N}, wherein h_N represents the feature representation of the N-th vehicle;
a topology construction module for constructing a camera topological graph based on the vehicle feature representation: according to the vehicle feature representation, different cameras are taken as nodes and edges are constructed according to the various relations among the cameras, thereby constructing a camera topological graph G = (V, E), wherein V represents the set of camera nodes, the last of which is the C_T-th camera node, E is the edge set of the camera topological graph, E = {E_system, E_position, E_orientation, E_individual}, whose members respectively represent the edge sets constructed from the camera system, position, orientation and identity relations, and the camera topological graphs based on camera system, position, orientation and identity are respectively denoted G_system, G_position, G_orientation and G_individual;
a feature aggregation module for constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, inputting the topological relation into the graph convolutional network, and obtaining final aggregation features, wherein the topological relation a_ij of the feature representations h_i and h_j of any two vehicles is expressed as:
wherein the edge term represents the edge between the i-th camera tag and the j-th camera tag in the camera topology graph G;
The working process of the graph convolution network is as follows:
a mask matrix is calculated by applying the topk operation to Sim_{i,:}, wherein topk represents the top-k algorithm, Sim_ij represents the feature similarity between the i-th image and the j-th image, the colon represents all samples, and Sim_{i,:} represents comparing sample i with all samples;
based on the mask matrix, aggregation features are obtained by the formula h'_i = σ(∑_j M h_j norm(Mask ⊙ A)_{ij}), wherein σ represents the ReLU activation function, M represents a learnable transformation matrix, norm represents a normalization function, and ⊙ represents the element-wise product;
the aggregation features are weighted and updated by a camera-specific formula to obtain the final aggregation features, wherein each camera has a learnable weight vector, and the d-th row of M is scaled by the d-th element of that weight vector;
a class prediction module for fusing the final aggregation features with the vehicle feature representation and inputting the fused result into the fully connected layer for class prediction;
a model training module for constructing a target loss function, training the graph convolutional network, and stopping training when the target loss function value reaches its minimum to obtain a trained graph convolutional network;
and the re-identification module is used for carrying out vehicle re-identification by utilizing the trained graph convolutional network.
CN202310260112.XA 2023-03-14 2023-03-14 A vehicle re-identification method and device guided by camera topology map Active CN116385981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310260112.XA CN116385981B (en) 2023-03-14 2023-03-14 A vehicle re-identification method and device guided by camera topology map

Publications (2)

Publication Number Publication Date
CN116385981A CN116385981A (en) 2023-07-04
CN116385981B CN116385981B (en) 2025-09-30

Family

ID=86962611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310260112.XA Active CN116385981B (en) 2023-03-14 2023-03-14 A vehicle re-identification method and device guided by camera topology map

Country Status (1)

Country Link
CN (1) CN116385981B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118262385B (en) * 2024-05-30 2024-07-26 齐鲁工业大学(山东省科学院) Person Re-ID Method Based on Dispatching Sequence and Training Based on Camera Difference
CN118823327B (en) * 2024-07-22 2025-09-12 电子科技大学 A method for infrared small target detection based on edge topology guidance
CN119048843B (en) * 2024-10-28 2025-03-18 南昌大学 An AI-generated image detection method based on graph topology learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457596A (en) * 2022-09-06 2022-12-09 东南大学 Unsupervised pedestrian re-identification method based on camera perception map learning

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304795B (en) * 2018-01-29 2020-05-12 清华大学 Human Skeleton Behavior Recognition Method and Device Based on Deep Reinforcement Learning
CN112071075B (en) * 2020-06-28 2022-10-14 南京信息工程大学 Escaping vehicle weight identification method
US12367375B2 (en) * 2020-09-25 2025-07-22 Royal Bank Of Canada System and method for structure learning for graph neural networks
CN112149637B (en) * 2020-10-23 2024-09-13 北京百度网讯科技有限公司 Method and device for generating target re-identification model and for target re-identification
CN113255543B (en) * 2021-06-02 2023-04-07 西安电子科技大学 Facial Expression Recognition Method Based on Graph Convolutional Network
CN113255601B (en) * 2021-06-29 2021-11-12 深圳市安软科技股份有限公司 Training method and system for vehicle weight recognition model and related equipment
CN114330672B (en) * 2022-01-05 2024-06-14 安徽理工大学 Graph residual generation model, classification method, electronic device and storage medium for multi-information aggregation
CN114821084A (en) * 2022-03-21 2022-07-29 武汉轻工大学 Image classification method, device, equipment and storage medium
CN115761812A (en) * 2022-12-09 2023-03-07 北京信息科技大学 Shielded pedestrian re-identification method based on graph model and deformable convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant