CN116385981B

CN116385981B - A vehicle re-identification method and device guided by camera topology map

Info

Publication number: CN116385981B
Application number: CN202310260112.XA
Authority: CN
Inventors: 李洪潮; 孟庆洛; 孙丽萍; 罗永龙
Original assignee: Anhui Normal University
Current assignee: Anhui Normal University
Priority date: 2023-03-14
Filing date: 2023-03-14
Publication date: 2025-09-30
Anticipated expiration: 2043-03-14
Also published as: CN116385981A

Abstract

The present invention discloses a vehicle re-identification method and device guided by a camera topology map. The method comprises: constructing a training set to obtain vehicle feature representations; constructing a camera topology map based on the vehicle feature representations; constructing a topological relationship between the feature representations of any two vehicles based on the camera topology map and inputting the relationship into a graph convolutional network to obtain final aggregated features; fusing the final aggregated features with the vehicle feature representations, and inputting the fusion result into a fully connected layer for class prediction; constructing a target loss function and training the graph convolutional network until the target loss function value is minimized, thereby obtaining a trained graph convolutional network; and performing vehicle re-identification using the trained graph convolutional network. The present invention has the advantage of improving the accuracy of re-identification.

Description

Vehicle re-identification method and device guided by camera topological graph

Technical Field

The invention relates to the field of computer vision, in particular to a vehicle re-identification method and device guided by a camera topological graph.

Background

Vehicle re-identification (Re-ID) aims to retrieve images of a vehicle of interest from gallery images captured by non-overlapping surveillance cameras. It is an active and challenging task that has attracted great interest because of its wide application in public security, smart cities, and intelligent transportation. Despite significant progress, it still faces serious challenges such as intra-camera occlusion, cross-camera illumination changes, and viewpoint changes, which limit its application in complex real-world scenes.

The prior art addresses these three challenges in different ways. Representative methods fall into three types: 1) viewpoint-learning methods, for example the viewpoint-aware network (VANet), which learns two metrics, for similar and for different viewpoints, in two feature spaces; 2) part-learning methods, for example the dual-path adaptive attention model (AAVER), which captures key points and parts relevant to vehicle re-identification; and 3) path-learning methods, which construct spatio-temporal constraints to refine the matching results of vehicle re-identification and use spatio-temporal information as a physical constraint to reduce the complexity of the matching algorithm. However, these efforts focus mainly on mining information inside a single image and thus lack interaction between different images.

In recent years, graph convolutional networks (GCNs) have become prevalent. A graph convolutional network generalizes the capability of convolutional neural networks (CNNs) by performing convolution operations on graph-structured data. Graph convolutional network models are widely applied to computer vision tasks, such as: 1) pose estimation, where a semantic graph convolutional network (SemGCN) captures pose information such as local and global node relations; 2) action recognition, where an actional-structural graph convolutional network (AS-GCN) extracts useful spatial and temporal information; 3) pedestrian re-identification, where a similarity-guided graph neural network incorporates rich gallery-to-gallery similarity information into the training process; and 4) vehicle re-identification, where a parsing-guided cross-part reasoning network (PCRNet) learns discriminative feature representations by modeling the correlation among parts. Vehicle re-identification based on graph convolutional networks is becoming a research hotspot in the industry.

Chinese patent publication No. CN112396027A discloses a vehicle re-identification method based on a graph convolutional neural network. The method comprises: constructing a network model for vehicle re-identification; extracting global and local features of the vehicle image to be re-identified using a convolutional neural network; obtaining structural features using a graph convolutional neural network; calculating a loss function of the network model from the structural features; and training the network model according to the loss function. The graph convolutional neural network mines the structural information among the local features and between the local features and the global features, so that a better and more comprehensive feature expression is obtained and the accuracy of vehicle re-identification is improved. However, in a complex camera system scene, in which images are collected by a plurality of different cameras, that method considers neither the differences among images collected by different cameras nor the connections between adjacent cameras, so the extracted feature vectors cannot accurately express the vehicle information and the accuracy of vehicle re-identification is low.

Disclosure of Invention

The invention aims to solve the technical problem of improving the accuracy of vehicle re-identification in the scene of collecting images by a plurality of different cameras.

The invention solves the above technical problem by the following technical means: a vehicle re-identification method guided by a camera topological graph, comprising the following steps:

Step one, constructing a training set and acquiring vehicle feature representations;

Step two, constructing a camera topological graph based on the vehicle feature representations;

Step three, constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, and inputting the topological relation into a graph convolutional network to obtain final aggregated features;

Step four, fusing the final aggregated features with the vehicle feature representations, and inputting the fusion result into a fully connected layer for class prediction;

Step five, constructing a target loss function and training the graph convolutional network until the target loss function value is minimized, to obtain a trained graph convolutional network;

Step six, performing vehicle re-identification using the trained graph convolutional network.

The method has the following advantages. A training set is first constructed to obtain the vehicle feature representations, and a camera topological graph is constructed so that the topological relations can be input into a graph convolutional network to obtain aggregated features. The two kinds of features are then fused, and the class prediction result is obtained from the fused features. The whole recognition process thus considers both the original visual features, namely the vehicle feature representations, and the aggregated features obtained from the camera topological graph. Consequently, when images are collected by a plurality of different cameras, both the differences among the images collected by different cameras and the connections between adjacent cameras can be represented, the extracted feature vectors express the vehicle information more accurately, and the accuracy of vehicle re-identification is higher.

Further, the first step includes:

Constructing a training set, where $x_i$ represents the $i$-th image, $N^T$ represents the total number of images in the training set, $y_i$ represents the identity label of the $i$-th image, and each image additionally carries a camera label;

The training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations $\{h_1, h_2, \ldots, h_N\}$, where $h_N$ represents the feature representation of the $N$-th vehicle.

Further, the second step includes:

According to the vehicle feature representations, different cameras are taken as nodes and edges are constructed according to the various relations among the cameras, so that a camera topological graph $G = (V, E)$ is constructed, where $V$ represents the set of camera nodes, indexed up to the $C_T$-th camera node, and $E$ is the edge set of the camera topological graph, $E = \{E^{\text{system}}, E^{\text{position}}, E^{\text{orientation}}, E^{\text{individual}}\}$. Here $E^{\text{system}}$, $E^{\text{position}}$, $E^{\text{orientation}}$, and $E^{\text{individual}}$ respectively represent the edge sets constructed from the camera system, position, direction, and identity relations, and the corresponding camera topological graphs are respectively denoted $G^{\text{system}}$, $G^{\text{position}}$, $G^{\text{orientation}}$, and $G^{\text{individual}}$.

Further, the third step includes:

The topological relation $A_{ij}$ between the feature representations $h_i$ and $h_j$ of any two vehicles is given by the edge, in the camera topological graph $G$, between the camera label of the $i$-th image and the camera label of the $j$-th image.

Further, the working process of the graph convolutional network in step three is as follows:

A mask matrix $\mathrm{Mask}$ is calculated by applying $\operatorname{topk}$ to $\mathrm{Sim}_{i,:}$, where $\operatorname{topk}$ represents the top-$k$ algorithm, $\mathrm{Sim}_{ij}$ represents the feature similarity between the $i$-th image and the $j$-th image, the colon denotes all samples, and $\mathrm{Sim}_{i,:}$ represents the comparison of the $i$-th sample with all samples;

Based on the mask matrix, the aggregated feature is obtained by the formula $h'_i = \sigma\left(\sum_j M h_j \, \operatorname{norm}(\mathrm{Mask} \odot A)_{ij}\right)$, where $\sigma$ represents the ReLU activation function, $M$ represents a learnable transformation matrix, $\operatorname{norm}$ represents a normalization function, and $\odot$ represents the element-wise product;

The aggregated features are then weighted and updated to obtain the final aggregated features $h''_i$: each camera has a learnable weight vector, and row $d$ of $M$ is scaled by the $d$-th element of that weight vector.

Further, the fourth step includes:

The vehicle feature representation and the final aggregated feature are concatenated by the formula $f_i = \operatorname{Concat}(h_i, h''_i)$ to obtain the final vehicle features $\{f_1, f_2, \ldots, f_N\}$, where $h_i$ represents the feature representation of the $i$-th vehicle, $h''_i$ represents the final aggregated feature of the $i$-th vehicle, and $f_N$ represents the final vehicle feature of the $N$-th vehicle; $f_i$ is then fed into the fully connected layer to obtain a class prediction result.

Further, the fifth step includes:

A first loss function is constructed, where $y_i$ represents the identity label of the $i$-th image, $\mathrm{FC}$ represents the fully connected layer, $\|\cdot\|$ is the L2 norm distance, $f_{i,p}$ and $f_{i,n}$ represent the positive and negative features of the $i$-th image $x_i$ in each mini-batch, and $m$ represents the triplet margin;

A second loss function is constructed, where $S_i$ represents the number of positive samples of the $i$-th image and Softplus represents a function used to obtain a non-negative probability;

A target loss function is then constructed;

The parameters of the graph convolutional network are adjusted and the network is trained until the target loss function value is minimized, to obtain the trained graph convolutional network.

The invention also provides a vehicle re-identification device guided by a camera topological graph, which comprises:

A feature representation module, used for constructing a training set and acquiring vehicle feature representations;

A topology construction module, used for constructing a camera topological graph based on the vehicle feature representations;

A feature aggregation module, used for constructing a topological relation between the feature representations of any two vehicles based on the camera topological graph, and inputting the topological relation into a graph convolutional network to obtain final aggregated features;

A class prediction module, used for fusing the final aggregated features with the vehicle feature representations, and inputting the fusion result into a fully connected layer for class prediction;

A model training module, used for constructing a target loss function and training the graph convolutional network until the target loss function value is minimized, to obtain a trained graph convolutional network;

A re-identification module, used for performing vehicle re-identification using the trained graph convolutional network.

Further, the feature representation module, the topology construction module, the feature aggregation module (including the working process of its graph convolutional network), the class prediction module, and the model training module are further configured to carry out the specific processes of step one to step five of the method, respectively; in particular, the model training module constructs the target loss function used for training.

The invention has the following advantages. A training set is first constructed to obtain the vehicle feature representations, and a camera topological graph is constructed so that the topological relations can be input into a graph convolutional network to obtain aggregated features. The two kinds of features are then fused, and the class prediction result is obtained from the fused features. The whole recognition process thus considers both the original visual features, namely the vehicle feature representations, and the aggregated features obtained from the camera topological graph. Consequently, when images are collected by a plurality of different cameras, both the differences among the images collected by different cameras and the connections between adjacent cameras can be represented, the extracted feature vectors express the vehicle information more accurately, and the accuracy of vehicle re-identification is higher.

Drawings

FIG. 1 illustrates the phenomena observed for a prior-art strong re-identification baseline model on the VeRi-776 dataset;

fig. 2 is a schematic process diagram of a vehicle re-identification method guided by a camera topology according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of generating a camera topology diagram from a real-world traffic scene in a vehicle re-recognition method guided by a camera topology diagram according to an embodiment of the present invention, where fig. 3 (a) is a schematic diagram of a closed-circuit television camera system and fig. 3 (b) is a corresponding camera topology diagram;

Fig. 4 is a camera topology diagram based on a camera system, a position, a direction and an identity in the vehicle re-recognition method guided by a camera topology diagram according to the embodiment of the present invention, where fig. 4 (a) is a camera topology diagram based on a camera system, fig. 4 (b) is a camera topology diagram based on a camera position, fig. 4 (c) is a camera topology diagram based on a camera direction, and fig. 4 (d) is a camera topology diagram based on a camera individual.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

FIG. 1 illustrates the phenomena observed for a prior-art strong re-identification baseline model on the VeRi-776 dataset. Three phenomena were found. (1) The Rank-1 performance under the whole camera system is far higher than that under each individual camera, as shown in FIG. 1 (1), where the five-pointed star marks the Rank-1 performance under the whole camera system. This shows that the Rank-1 performance of the prior-art method is inflated: it only retrieves easy positive samples under the whole camera system and does not accurately hit the positive samples under each camera. (2) The mAP under the whole camera system is far lower than that under each individual camera, as shown in FIG. 1 (2), where the five-pointed star marks the mAP under the whole camera system. This indicates that the positive samples under each camera are more tightly clustered than the positive samples under the whole camera system. (3) Removing the top-ranked samples significantly reduces re-identification performance, as shown in FIG. 1 (3). This shows that the re-identification performance obtained by conventional methods is sub-optimal and susceptible to camera interference. Furthermore, the information available for each identity under a single camera is limited; if the information about a vehicle can be aggregated across the whole camera system, it becomes sufficient and robust.
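The two metrics discussed above can be illustrated with a minimal sketch. It assumes the standard definitions of Rank-1 and average precision over a ranked list of match flags; the function names and the toy ranked lists are hypothetical and are not taken from the patent.

```python
def rank1(match_flags):
    """Return 1.0 if the top-ranked gallery image is a true match, else 0.0."""
    return 1.0 if match_flags and match_flags[0] else 0.0


def average_precision(match_flags):
    """Average precision of one ranked list of booleans (True = same identity)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(match_flags, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


# Hypothetical ranked results for two queries (True marks a correct match).
queries = [
    [True, False, True, False],   # easy positive first, harder positive at rank 3
    [False, True, False, False],  # first correct match only at rank 2
]
rank1_score = sum(rank1(q) for q in queries) / len(queries)
mean_ap = sum(average_precision(q) for q in queries) / len(queries)
print(rank1_score, mean_ap)
```

Rank-1 only rewards the single best match, whereas mAP is lowered by every positive that is ranked late, which is why the two metrics can move in opposite directions as described for FIG. 1.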

Therefore, as shown in FIG. 2, the present invention provides a vehicle re-identification method guided by a camera topological graph, which fully exploits both easy positive samples and hard positive samples under the whole camera system. The method comprises:

S1, constructing a training set and acquiring vehicle feature representations. The specific process is as follows:

The training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations $\{h_1, h_2, \ldots, h_N\}$, where $h_N$ represents the feature representation of the $N$-th vehicle.

S2, constructing a camera topological graph based on the vehicle feature representations. The specific process is as follows:

As shown in FIG. 3, according to the vehicle feature representations, different cameras are taken as nodes and edges are constructed according to the various relations among the cameras, so that a camera topological graph $G = (V, E)$ is constructed, where $V$ represents the set of camera nodes, indexed up to the $C_T$-th camera node, and $E$ is the edge set of the camera topological graph, $E = \{E^{\text{system}}, E^{\text{position}}, E^{\text{orientation}}, E^{\text{individual}}\}$. Here $E^{\text{system}}$, $E^{\text{position}}$, $E^{\text{orientation}}$, and $E^{\text{individual}}$ respectively represent the edge sets constructed from the camera system, position, direction, and identity relations, and the corresponding camera topological graphs are respectively denoted $G^{\text{system}}$, $G^{\text{position}}$, $G^{\text{orientation}}$, and $G^{\text{individual}}$. In the present embodiment the camera topological graph is constructed from a closed-circuit television camera system, where FIG. 3 (a) is a schematic diagram of the closed-circuit television camera system and FIG. 3 (b) is the corresponding camera topological graph.

FIG. 4 shows the camera topological graphs based on the camera system, position, direction, and identity (individual). $G^{\text{system}}$ represents the camera topological graph based on the camera system; as the default setting, neighboring nodes are connected in turn, as shown in FIG. 4 (a).

$G^{\text{position}}$ represents the camera topological graph based on camera position. Cameras at consecutive intersections are first defined as spatially adjacent nodes. According to the camera positions in the closed-circuit television camera system (FIG. 3 (b)), camera5, camera7, and camera8 are regarded as adjacent nodes, and there is an edge between these adjacent nodes, as shown in FIG. 4 (b). The camera relationship of $G^{\text{position}}$ is easier than that of $G^{\text{system}}$ because it requires positive samples from neighboring cameras to present a consistent feature representation. Since a continuously moving vehicle can be captured by two adjacent cameras, $G^{\text{position}}$ conforms to the logic of vehicle travel. $G^{\text{position}}$ lets positive samples under adjacent cameras interact.

$G^{\text{orientation}}$ represents the camera topological graph based on camera direction. The more consistent the camera orientations, the more consistent the appearance of the positive samples. As shown in FIG. 4 (c), solid lines indicate edges determined by the positional relationship between cameras, and broken lines indicate edges determined by the directional relationship between cameras. Camera3 and camera4 are adjacent cameras, but because their camera directions differ, there is no direction-determined edge between them. The camera relationship of $G^{\text{orientation}}$ is easier than that of $G^{\text{position}}$ because it ignores irrelevant nodes based on camera direction. Notably, the present invention also defines two cameras whose directions are orthogonal as neighboring cameras, such as camera5 and camera7 in FIG. 4 (c). $G^{\text{orientation}}$ lets positive samples under cameras facing the same direction interact.

$G^{\text{individual}}$ represents the camera topological graph based on camera individuals. A video sequence of the target vehicle may be captured under the same camera. As shown in FIG. 4 (d), every camera has an edge to itself. The camera relationship of $G^{\text{individual}}$ is the easiest because intra-class images captured under the same camera tend to have a large information overlap. $G^{\text{individual}}$ lets positive samples under the same camera interact.

Learning the camera system, position, direction, and identity relationships helps narrow the range of feature interactions in both the feature learning phase and the evaluation phase. The four subgraphs are used to construct the camera topological graph. In the camera topological graph $G = (V, E)$, the edge between two cameras may be represented as $E_{ij}$; the larger its value, the stronger the relationship between the cameras. In the four subgraphs, the value is 1 if an edge exists between two nodes and 0 otherwise. The final goal is to obtain hierarchically aggregated topological features through the four topological relationships between cameras. Such topological features are complementary to the visual features, making the final feature more robust.
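As a minimal sketch of the four binary subgraphs, the following builds one 0/1 adjacency matrix per relation for a toy system of four cameras. The helper `adjacency`, the camera count, and all edge lists are hypothetical; in practice the edges would come from the actual site layout shown in FIG. 3 and FIG. 4.

```python
def adjacency(num_cameras, edges, self_loops=False):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    matrix = [[0] * num_cameras for _ in range(num_cameras)]
    for a, b in edges:
        matrix[a][b] = 1
        matrix[b][a] = 1
    if self_loops:
        for c in range(num_cameras):
            matrix[c][c] = 1
    return matrix


NUM_CAMERAS = 4  # hypothetical toy system, cameras indexed 0..3

# Hypothetical edge lists (assumptions for illustration only).
system_edges = [(0, 1), (1, 2), (2, 3)]  # neighbouring nodes connected in turn
position_edges = [(0, 1), (2, 3)]        # cameras at consecutive intersections
orientation_edges = [(0, 1)]             # adjacent cameras with compatible directions

topology = {
    "system": adjacency(NUM_CAMERAS, system_edges),
    "position": adjacency(NUM_CAMERAS, position_edges),
    "orientation": adjacency(NUM_CAMERAS, orientation_edges),
    "individual": adjacency(NUM_CAMERAS, [], self_loops=True),  # each camera linked to itself
}

for name, matrix in topology.items():
    print(name, matrix)
```

Keeping the four relations as separate matrices leaves open how they are combined hierarchically, which the text does not specify at this point.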

S3, constructing a topological relation between feature representations of any two vehicles based on a camera topological graph, inputting the topological relation into a graph convolution network, and obtaining a final aggregation feature, wherein the specific process comprises the following steps:

To embed the topological relation into the feature representation, the topological relation between cameras is converted into a relation between sample pairs. Using the camera topological graph guided by the closed-circuit television camera system, an adjacency matrix $A$ is created between the visual features, that is, between the vehicle feature representations described above. The topological relation $A_{ij}$ between the feature representations $h_i$ and $h_j$ of any two vehicles is given by the edge, in the camera topological graph $G$, between the camera label of the $i$-th image and the camera label of the $j$-th image.

As can be seen from this definition, the feature relation between samples is represented by the camera relation between samples. This is because the stronger the camera relation between samples, the greater the overlap between the vehicle images. However, this process involves many uncorrelated samples and adds a significant computational burden.
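The conversion from camera-level edges to a sample-level adjacency matrix can be sketched as a simple lookup. The function name `sample_adjacency`, the three-camera edge matrix, and the camera labels below are hypothetical values chosen for illustration.

```python
def sample_adjacency(camera_labels, camera_edges):
    """A[i][j] is the edge value between the cameras that captured samples i and j."""
    return [[camera_edges[ci][cj] for cj in camera_labels] for ci in camera_labels]


# Hypothetical camera-level edge matrix for three cameras (1 = edge, 0 = no edge).
camera_edges = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
# Hypothetical camera label of each of four samples.
camera_labels = [0, 2, 1, 0]

A = sample_adjacency(camera_labels, camera_edges)
for row in A:
    print(row)
```

Because the matrix has one row and one column per sample, its size grows quadratically with the batch, which is the computational burden noted above.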

To discard uncorrelated samples and reduce the amount of computation, a mask matrix is introduced. If two vehicle images are visually adjacent in the feature space, they are likely to be correlated. To this end, a k-nearest-neighbor mask $\mathrm{Mask}$ is calculated from the visual similarity, keeping the top $k$ values of each row of the similarity matrix. Specifically, the mask matrix is calculated by applying $\operatorname{topk}$ to $\mathrm{Sim}_{i,:}$, where $\operatorname{topk}$ represents the top-$k$ algorithm, $\mathrm{Sim}_{ij}$ represents the feature similarity between the $i$-th image and the $j$-th image, the colon denotes all samples, and $\mathrm{Sim}_{i,:}$ represents the comparison of the $i$-th sample with all samples. The top-$k$ algorithm is an existing algorithm that finds the $K$ largest numbers in an unordered sequence of $N$ numbers; in this embodiment it finds the $K$ most similar samples by comparing $\mathrm{Sim}_{i,:}$ across all samples, and its details are not described here.
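A k-nearest-neighbor mask of this kind can be sketched as follows. Several details are assumptions rather than statements of the patent: cosine similarity is used as the visual similarity, the mask is binary, and each sample counts as its own nearest neighbor. The names `cosine_similarity` and `topk_mask` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topk_mask(features, k):
    """Mask[i][j] = 1 if sample j is among the k samples most similar to i, else 0."""
    n = len(features)
    sim = [[cosine_similarity(features[i], features[j]) for j in range(n)]
           for i in range(n)]
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k largest entries in row i of the similarity matrix.
        top = sorted(range(n), key=lambda j: sim[i][j], reverse=True)[:k]
        for j in top:
            mask[i][j] = 1
    return mask


# Hypothetical 2-D features: two samples near the x-axis, two near the y-axis.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask = topk_mask(features, k=2)
for row in mask:
    print(row)
```

With $k = 2$, each sample keeps only itself and its single closest neighbor, so the two visual clusters do not exchange information.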

Based on the mask matrix, the aggregated feature is obtained by the formula $h'_i = \sigma\left(\sum_j M h_j \, \operatorname{norm}(\mathrm{Mask} \odot A)_{ij}\right)$, where $\sigma$ represents the ReLU activation function, $M$ represents a learnable transformation matrix, $\operatorname{norm}$ represents a normalization function, and $\odot$ represents the element-wise product. Applying the mask matrix $\mathrm{Mask}$ to the weighted transformation restricts feature aggregation to neighboring cameras, which focuses the aggregation on more relevant images.
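The aggregation formula can be traced on a toy example. The sketch below assumes that $\operatorname{norm}$ is a row normalization (each row of $\mathrm{Mask} \odot A$ divided by its sum), which the text does not state; the helper names and all numeric values are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def row_normalize(matrix):
    """Divide each row by its sum (assumed form of norm); all-zero rows stay zero."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matvec(M, h):
    return [sum(m * x for m, x in zip(row, h)) for row in M]


def aggregate(H, A, mask, M):
    """h'_i = ReLU( sum_j  norm(Mask * A)_ij  *  (M h_j) )."""
    masked = [[a * m for a, m in zip(row_a, row_m)] for row_a, row_m in zip(A, mask)]
    weights = row_normalize(masked)
    transformed = [matvec(M, h) for h in H]
    dim = len(M)
    out = []
    for i in range(len(H)):
        acc = [0.0] * dim
        for j, w in enumerate(weights[i]):
            for d in range(dim):
                acc[d] += w * transformed[j][d]
        out.append([relu(v) for v in acc])
    return out


# Hypothetical inputs: three samples with 2-D features, identity transformation.
H = [[1.0, 2.0], [3.0, 4.0], [-5.0, 6.0]]
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
mask = [[1, 1, 0], [1, 1, 0], [0, 1, 1]]
M = [[1.0, 0.0], [0.0, 1.0]]

H_agg = aggregate(H, A, mask, M)
print(H_agg)
```

Sample 1 is connected to sample 2 in `A`, but the mask removes that pair, so sample 1 aggregates only from samples 0 and 1.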

Although the above formula yields more robust aggregated features while reducing computational complexity, this aggregation process may introduce unwanted camera noise. To solve this problem, a learnable camera memory matrix is designed; the memory matrix is used to weight the transformation matrix $M$, so as to store the transformation matrices of the different cameras. Specifically, the aggregated features are weighted and updated to obtain the final aggregated features $h''_i$: the memory matrix holds a learnable weight vector for each camera, and row $d$ of $M$ is scaled by the $d$-th element of that weight vector.
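The row-scaling operation described above can be sketched in isolation. The dictionary `memory`, its weight values, and the function name `camera_transform` are hypothetical, and the sketch does not assume which camera's weight vector (that of the $i$-th or of the $j$-th image) enters the final aggregation.

```python
def camera_transform(M, camera_weights):
    """Scale row d of M by the d-th element of one camera's weight vector."""
    return [[w * m for m in row] for w, row in zip(camera_weights, M)]


# Hypothetical memory: camera id -> learnable weight vector (one entry per row of M).
memory = {
    0: [1.0, 0.5],
    1: [2.0, 0.0],
}
M = [[1.0, 2.0], [3.0, 4.0]]  # shared learnable transformation matrix

M_cam0 = camera_transform(M, memory[0])
M_cam1 = camera_transform(M, memory[1])
print(M_cam0)
print(M_cam1)
```

Storing one weight vector per camera, rather than a full matrix per camera, keeps the number of camera-specific parameters proportional to the feature dimension.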

S4, fusing the final aggregation characteristics and the vehicle characteristic representation, and inputting the fused result into a full-connection layer for class prediction, wherein the specific process is as follows:

In the graph convolutional network based on the camera topological graph, visual features are transformed into topological features, that is, the final aggregated features, through the adjacency relations and camera-specific transformation matrices. The network learns cross-camera representations to obtain more discriminative vehicle features. It aggregates only manageable neighbor nodes and learns different weight matrices for different cameras. It thus retains the ability of a traditional graph convolutional network to let graph nodes interact, while introducing the learning of different camera topological relations. In addition, the vehicle feature representation and the final aggregated feature are concatenated by the formula $f_i = \operatorname{Concat}(h_i, h''_i)$ to obtain the final vehicle features $\{f_1, f_2, \ldots, f_N\}$, where $h_i$ represents the feature representation of the $i$-th vehicle, $h''_i$ represents the final aggregated feature of the $i$-th vehicle, and $f_N$ represents the final vehicle feature of the $N$-th vehicle; $f_i$ is then fed into the fully connected layer to obtain a class prediction result. In practical applications, as shown in FIG. 2, the vehicle feature representation and the final aggregated feature may each first be passed through a hidden layer, and the two outputs are then fused by the formula $f_i = \operatorname{Concat}(h_i, h''_i)$.
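The fusion and class prediction step can be sketched as a concatenation followed by a single linear layer. The weights `W`, bias `b`, feature values, and the softmax used to turn logits into probabilities are hypothetical choices for illustration, not parameters given in the patent.

```python
import math


def fuse(h, h_agg):
    """f_i = Concat(h_i, h''_i)."""
    return list(h) + list(h_agg)


def fully_connected(f, W, b):
    """One linear layer: one logit per identity class."""
    return [sum(w * x for w, x in zip(row, f)) + bias for row, bias in zip(W, b)]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


h_i = [1.0, 0.0]      # hypothetical visual feature
h_agg_i = [0.5, 0.5]  # hypothetical final aggregated feature
f_i = fuse(h_i, h_agg_i)

W = [[1.0, 0.0, 0.0, 0.0],   # hypothetical weights for class 0
     [0.0, 0.0, 1.0, 0.0]]   # hypothetical weights for class 1
b = [0.0, 0.0]

logits = fully_connected(f_i, W, b)
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(f_i, logits, predicted)
```

Concatenation doubles the feature dimension, so the fully connected layer needs one weight per entry of both the visual and the topological feature.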

S5, constructing a target loss function and training the graph convolutional network until the target loss function value is minimized, to obtain the trained graph convolutional network. The specific process is as follows:

A first loss function is constructed, where $y_i$ represents the identity label of the $i$-th image, $\mathrm{FC}$ represents the fully connected layer, $\|\cdot\|$ represents the L2 norm distance, $f_{i,p}$ and $f_{i,n}$ represent the hardest positive and negative features of the $i$-th image $x_i$ in each mini-batch, and $m$ represents the triplet margin. Although the first loss function is widely used in the field of vehicle re-identification, it is limited in that it cannot take the topological relation among samples into account.
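The ingredients named for the first loss function (identity labels, a fully connected layer, an L2 distance, hardest positives and negatives, and a margin) match the common combination of a cross-entropy term and a batch-hard triplet term. The sketch below assumes that combination and simply sums the two batch averages; the exact formula is not reproduced in this text, and all names, the margin value of 0.3, and the toy data are hypothetical.

```python
import math


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cross_entropy(logits, label):
    """Negative log-softmax of the true class, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def hard_triplet(features, labels, i, margin):
    """Hinge on (farthest positive - nearest negative + margin) for anchor i.

    Assumes every identity appears at least twice in the batch and that the
    batch contains at least two identities.
    """
    pos = [l2(features[i], f) for j, f in enumerate(features)
           if labels[j] == labels[i] and j != i]
    neg = [l2(features[i], f) for j, f in enumerate(features)
           if labels[j] != labels[i]]
    return max(0.0, max(pos) - min(neg) + margin)


def first_loss(logits, features, labels, margin=0.3):
    """Assumed combination: mean cross-entropy plus mean batch-hard triplet loss."""
    n = len(labels)
    ce = sum(cross_entropy(logits[i], labels[i]) for i in range(n)) / n
    tri = sum(hard_triplet(features, labels, i, margin) for i in range(n)) / n
    return ce + tri


# Hypothetical mini-batch: two identities, two images each.
features = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [6.0, 8.0]]
labels = [0, 0, 1, 1]
logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
print(first_loss(logits, features, labels))
```

Neither term looks at which cameras the images came from, which is the limitation the text attributes to the first loss function.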

Therefore, the invention proposes a new topological cross-entropy loss, based on the topological relations, as the second loss function. It encourages positive samples to cluster from strong to weak, optimizes the topological relation among the positive samples, and allows the whole network to be trained end to end. The topological cross-entropy loss is also the key to aggregating vehicles under adjacent cameras, making the aggregation process more effective and efficient.

A target loss function is then constructed;

S6, images of vehicles are input into ResNet-50 to acquire vehicle feature representations in real time; a camera topological graph is constructed and input into the trained graph convolutional network; vehicle re-identification is performed using the trained graph convolutional network; the result is fused with the vehicle feature representations; and the fusion result is input into the fully connected layer to obtain the predicted class.
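Since re-identification is described in the background as retrieving a vehicle of interest from gallery images, one way to use the fused features at this stage is to rank the gallery by distance to the query. The following is only an assumed sketch of that retrieval view: the patent text itself states class prediction through the fully connected layer, and the function name and feature values here are hypothetical.

```python
import math


def rank_gallery(query, gallery):
    """Return gallery indices sorted from nearest to farthest (Euclidean distance)."""
    distances = [math.dist(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=distances.__getitem__)


# Hypothetical fused features of one query image and three gallery images.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.5]]

ranking = rank_gallery(query, gallery)
print(ranking)
```

The first index of the ranking is the candidate that a Rank-1 evaluation would check against the true identity.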

According to the above technical scheme, a training set is first constructed to obtain the vehicle feature representations, and a camera topological graph is constructed so that the topological relations can be input into a graph convolutional network to obtain aggregated features. The two kinds of features are then fused, and the class prediction result is obtained from the fused features. The whole recognition process thus considers both the original visual features, namely the vehicle feature representations, and the aggregated features obtained from the camera topological graph. Consequently, when images are collected by a plurality of different cameras, both the differences among the images collected by different cameras and the connections between adjacent cameras can be represented, the extracted feature vectors express the vehicle information more accurately, and the accuracy of vehicle re-identification is higher.

Example 2

Based on embodiment 1, embodiment 2 of the present invention further provides a vehicle re-identification device guided by a camera topological graph. The device comprises a feature representation module, a topology construction module, a feature aggregation module, a class prediction module, a model training module, and a re-identification module, which respectively carry out S1 to S6 of embodiment 1.

Specifically, the feature representation module, the topology construction module, the feature aggregation module, the class prediction module, and the model training module are further configured to carry out the specific processes of S1 to S5, respectively.

More specifically, in the working process of the graph convolutional network in the feature aggregation module, a mask matrix is calculated by applying $\operatorname{topk}$ to $\mathrm{Sim}_{i,:}$, where $\operatorname{topk}$ represents the top-$k$ algorithm, $\mathrm{Sim}_{ij}$ represents the feature similarity between the $i$-th image and the $j$-th image, the colon denotes all samples, and $\mathrm{Sim}_{i,:}$ represents the comparison of the $i$-th sample with all samples;

The model training module is further configured to construct the target loss function;

The foregoing embodiments are merely for illustrating the technical solution of the present invention, but not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiments or equivalents may be substituted for parts of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present invention in essence.

Claims

1. A method of camera topology guided vehicle re-identification, the method comprising:

step one, constructing a training set, obtaining vehicle characteristic representation, and constructing the training set , wherein,The image of the i-th sheet is represented,The total number of pictures representing the training set,Which is indicative of the identity tag thereof,Representing its camera tag, and the training set is input into the vehicle representation model ResNet-50 to extract a vehicle feature representation, the vehicle feature representation being:, a feature representation representing an nth vehicle;

Step two: constructing a camera topology graph based on the vehicle feature representations, wherein, according to the vehicle feature representations, different cameras are taken as nodes and edges are constructed according to the various relationships among the cameras, thereby constructing the camera topology graph G from the set of camera nodes and the set of edges; the set of edges comprises the edge sets constructed according to the camera system, location, orientation, and identity relationships, respectively, so that the camera topology graph is represented in terms of the edge sets based on camera system, location, orientation, and identity;

Step three: constructing a topological relation between the feature representations of any two vehicles based on the camera topology graph, inputting the topological relation into a graph convolutional network, and obtaining final aggregated features, wherein the topological relation between the feature representations of two vehicles is expressed as:

wherein the expression uses the edge between the i-th camera label and the j-th camera label in the camera topology graph G;

The working process of the graph convolution network is as follows:

A mask matrix is calculated by a formula in which topk denotes the top-k algorithm, Sim_{i,j} denotes the feature similarity between the i-th image and the j-th image, ":" denotes all samples, and Sim_{i,:} denotes the comparison of Sim_{i,j} with all samples;

Based on the mask matrix, an aggregated feature is obtained by a formula involving the ReLU activation function, a learnable transformation matrix, a normalization function, and the element-wise product;

The aggregated feature is weighted and updated by a formula to obtain the final aggregated feature, wherein each camera has a learnable weight vector, and the d-th row of the aggregated feature is scaled by the d-th element of that weight vector;
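The aggregation and camera-wise weighting recited above can be illustrated with a generic masked graph-convolution step in Python. This is a sketch under stated assumptions rather than the claimed formula: the order of operations (mask and similarity multiplied element-wise, row normalization, multiplication by the features and a transformation matrix, ReLU, then per-camera scaling), the row-sum normalization, the omission of the camera-edge relation term, and all names (`aggregate`, `cam_scale`, `cams`) are assumptions made for illustration.

```python
def relu(x):
    """ReLU activation for a single value."""
    return x if x > 0.0 else 0.0


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_normalize(a):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in a:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


def aggregate(features, sim, mask, weight, cam_scale, cams):
    """Masked aggregation followed by per-camera, per-dimension scaling."""
    # The element-wise product keeps only the similarities selected by the mask.
    masked = [[m * s for m, s in zip(mr, sr)] for mr, sr in zip(mask, sim)]
    adj = row_normalize(masked)
    hidden = matmul(matmul(adj, features), weight)
    hidden = [[relu(v) for v in row] for row in hidden]
    # Dimension d of each sample is scaled by the d-th weight of its camera.
    return [[cam_scale[c][d] * v for d, v in enumerate(row)]
            for row, c in zip(hidden, cams)]


# Toy inputs: two images from two hypothetical cameras "c1" and "c2".
features = [[1.0, 0.0], [0.0, 1.0]]
sim = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1, 1], [0, 1]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for the learnable matrix
cam_scale = {"c1": [2.0, 2.0], "c2": [1.0, 0.5]}
final = aggregate(features, sim, mask, weight, cam_scale, cams=["c1", "c2"])
```

In a trained model the transformation matrix and the per-camera weight vectors would be learned parameters; fixed values are used here only to keep the example self-contained.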

2. The camera topology guided vehicle re-identification method of claim 1, wherein said step four comprises:

Connecting, by a formula, the vehicle feature representation with the final aggregated feature to obtain the final vehicle feature, wherein the formula involves the feature representation of the i-th vehicle, the final aggregated feature of the i-th vehicle, and the final vehicle features up to that of the N-th vehicle; and inputting the final vehicle features into the fully connected layer to obtain a class prediction result.
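The fusion and class prediction recited in claim 2 can be sketched in Python as a concatenation followed by a single linear layer. The helper names, the toy weights and bias, and the use of argmax to pick the predicted class are assumptions made for illustration; the layer dimensions are not specified in this text.

```python
def concat(vehicle_feature, aggregated_feature):
    """Connect the vehicle feature with its final aggregated feature."""
    return list(vehicle_feature) + list(aggregated_feature)


def fully_connected(x, weights, bias):
    """Linear layer: one row of weights and one bias value per class."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_class(logits):
    """Index of the largest logit (assumed decision rule)."""
    return max(range(len(logits)), key=lambda c: logits[c])


# Toy example with a 2-d vehicle feature, a 2-d aggregated feature, 3 classes.
f_i = [1.0, 2.0]
g_i = [0.5, 0.0]
fused = concat(f_i, g_i)
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
]
bias = [0.0, 0.5, 0.0]
logits = fully_connected(fused, weights, bias)
predicted = predict_class(logits)
```

Concatenation preserves both the appearance feature and the topology-guided aggregated feature, at the cost of doubling the input width of the fully connected layer.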

3. The camera topology guided vehicle re-identification method of claim 1, wherein said step five comprises:

Constructing a first loss function by a formula, wherein the formula involves the fully connected layer, the L2 norm distance, and the hardest positive and hardest negative features of the i-th image in each mini-batch, and m denotes the triplet margin;

Constructing a second loss function by a formula, wherein the formula involves the number of positive samples of the i-th image and a function for obtaining a non-negative probability;

Constructing a target loss function by a formula;

4. A camera topology guided vehicle re-identification apparatus, the apparatus comprising:

a feature representation module, configured to construct a training set and obtain vehicle feature representations, wherein each sample in the training set consists of the i-th image, its identity label, and its camera label, together with the total number of images in the training set; the training set is input into the vehicle representation model ResNet-50 to extract the vehicle feature representations, which run up to the feature representation of the N-th vehicle;

a topology construction module, configured to construct a camera topology graph based on the vehicle feature representations, wherein, according to the vehicle feature representations, different cameras are taken as nodes and edges are constructed according to the various relationships among the cameras, thereby constructing the camera topology graph from the set of camera nodes and the set of edges; the set of edges comprises the edge sets constructed according to the camera system, location, orientation, and identity relationships, respectively, so that the camera topology graph is represented in terms of the edge sets based on camera system, location, orientation, and identity;

a feature aggregation module, configured to construct a topological relation between the feature representations of any two vehicles based on the camera topology graph, input the topological relation into the graph convolutional network, and obtain final aggregated features, wherein the topological relation between the feature representations of two vehicles is expressed as:

The working process of the graph convolution network is as follows: