Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention. The embodiment is applicable to processing a collected road map. The method may be executed by an image processing apparatus, which may be composed of hardware and/or software and may generally be integrated in a device with an image processing function, such as a server, a mobile terminal, or a server cluster. As shown in fig. 1, the method specifically includes the following steps:
S110, detecting the target object in the original road map to obtain a target object detection result.
In this embodiment, the original road map may be an image whose resolution is greater than 640 × 640, for example 1920 × 1080. Illustratively, fig. 2a is an exemplary diagram of an original road map in the present embodiment.
The target object may be an object unrelated to the map information, such as: invalid shadows, on-road cars, pedestrians, bicycles, etc. In this embodiment, the objects related to the map information may include: lane lines, arrows, traffic signs, kerbs, road boundaries, traffic lights, and other elements that may affect traffic.
The target object detection result may include a category of the target object and position information of the target object detection frame. The target object detection frame may be a rectangular frame, and the position information of the target object detection frame may be coordinate information of four vertices of the rectangular frame.
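As an illustration only, a detection result of this kind could be held in a small record such as the one sketched below. The class name `Detection`, its fields, and the `bounds` helper are hypothetical and do not come from the embodiment; the sketch merely assumes an axis-aligned rectangular frame given by four (x, y) vertices.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass
class Detection:
    """Hypothetical container for one target object detection result."""
    category: str           # category of the target object, e.g. "car"
    vertices: List[Point]   # four vertices of the rectangular detection frame

    def bounds(self) -> Tuple[int, int, int, int]:
        """Return (x_min, y_min, x_max, y_max) of the detection frame."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return min(xs), min(ys), max(xs), max(ys)


det = Detection("car", [(100, 200), (300, 200), (300, 350), (100, 350)])
print(det.bounds())  # (100, 200, 300, 350)
```

Reducing the four vertices to two corner coordinates is one convenient form for the later masking step, although other representations would serve equally well.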
Optionally, the target object in the original road map may be detected to obtain the target object detection result as follows: inputting the original road map into a target object detection model, and outputting the target object detection result.
The target object detection model is obtained by training on sample images labeled with target objects, and the target objects are objects irrelevant to map information. The target object detection model may be a YoloV5 model or an R-FCN model, which is not limited herein. Specifically, a sample image is obtained first, then the target object in the sample image is labeled, then the target object detection model is trained based on the sample image labeled with the target object, and finally the original road map is input into the trained target object detection model, which outputs the target object detection result. For example, fig. 2b is an exemplary diagram of the detection result of the target object, and as shown in fig. 2b, the target object that is not related to the map information in the original road map is framed by the detection frame.
S120, segmenting the target object in the original road map based on the target object detection result to obtain a road segmentation map.
The target object in the original road map is segmented, which can be understood as the target object is cut out from the original road map.
Specifically, the target object in the original road map is segmented based on the target object detection result, and the way of obtaining the road segmentation map may be: acquiring a target object mask image according to a target object detection result and an original road image; and fusing the target object mask image and the original road image to obtain a road segmentation image.
The target object mask map can be understood as a binary map with the same size as the original road map, for example: black and white. Obtaining the mask map of the target object according to the detection result of the target object and the original road map can be understood as follows: the image inside the target object detection frame is replaced with one color (e.g., white) and the image outside the target object detection frame is replaced with another color (e.g., black). For example, fig. 2c is an exemplary diagram of a mask diagram of the target object in the present embodiment, and as shown in fig. 2c, the color of the region where the target object is located is white, and the color of the other regions is black.
In this embodiment, the target object mask map may be obtained according to the target object detection result and the original road map as follows: adjusting the pixel values of the pixel points inside the target object detection frame in the original road map to a first set value, and adjusting the pixel values of the pixel points outside the target object detection frame in the original road map to a second set value, to obtain the target object mask map.
The first set value may be 0, and the second set value may be 1, that is, the target object mask map is a binary map in which the pixel value of each pixel point is 0 or 1. Specifically, the method for obtaining the road segmentation map by fusing the target object mask map and the original road map may be as follows: multiplying the pixel values of corresponding pixel points in the target object mask map and the original road map to obtain new pixel values, thereby obtaining the road segmentation map. In this example, for the pixel points inside the target object detection frame, since their pixel values in the target object mask map are 0, the product with the corresponding pixel values in the original road map is 0; for the pixel points outside the target object detection frame, since their pixel values in the target object mask map are 1, the product equals the pixel values in the original road map. In the resulting road segmentation map, the pixel values of the pixel points inside the target object detection frame are 0, and the pixel values of the pixel points outside the target object detection frame are consistent with those in the original road map, so that the target object is segmented out of the original road map. For example, fig. 2d is an exemplary diagram of a road segmentation map in this embodiment, and as shown in fig. 2d, the area where the target object is located is black, and other areas remain unchanged.
S130, repairing the road segmentation map to obtain a road repair map.
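The masking and fusion steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the helper names `make_mask` and `fuse` are hypothetical, detection frames are assumed to be axis-aligned and given as half-open (x_min, y_min, x_max, y_max) tuples, and the image is assumed to be an H × W × 3 array.

```python
import numpy as np


def make_mask(shape, boxes, inside=0, outside=1):
    """Build a binary mask: `inside` within each detection frame, `outside` elsewhere."""
    mask = np.full(shape, outside, dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        mask[y_min:y_max, x_min:x_max] = inside   # first set value inside the frame
    return mask


def fuse(image, mask):
    """Multiply image and mask pixel by pixel (the mask is broadcast over channels)."""
    return image * mask[..., np.newaxis]


image = np.full((4, 6, 3), 200, dtype=np.uint8)    # toy 4x6 RGB "original road map"
mask = make_mask(image.shape[:2], [(1, 1, 3, 3)])  # one frame covering rows/cols 1..2
segmented = fuse(image, mask)

print(segmented[1, 1])  # [0 0 0]       -> inside the frame, blacked out
print(segmented[0, 0])  # [200 200 200] -> outside the frame, unchanged
```

Because the mask only contains 0 and 1, the multiplication cannot overflow the `uint8` range, so no type conversion is needed in this sketch.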
Here, the repairing of the road segmentation map may be understood as a process of repairing a region where the segmented target object is located, or may be understood as a process of filling up a region where the segmented target object is located.
In this embodiment, the road segmentation map may be repaired to obtain the road repair map as follows: inputting the road segmentation map into the image restoration model, and outputting the road repair map. For example, fig. 2e is an exemplary diagram of the road repair map in the embodiment, and as shown in fig. 2e, the region where the segmented target object is located is repaired.
The image restoration model includes at least one down-sampling module, at least one feature extraction module, and at least one up-sampling module, where the feature extraction module is a Fast Fourier Convolution (FFC) module. Exemplarily, fig. 3 is an exemplary diagram of the image restoration model in the present embodiment. As shown in fig. 3, the image restoration model includes three down-sampling modules, two FFC modules, and three up-sampling modules, and the input of the first FFC module is skip-connected to the output of the last FFC module.
The down-sampling module is used for sampling the road segmentation map into a map with low resolution, and the down-sampling process can retain general information of the image, such as: color, overall style, or subject matter, etc. The principle of the upsampling module may employ interpolation or transposed convolution, etc.
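To make the change of resolution concrete, the sketch below halves and then doubles the size of a single-channel array. It is only an assumption-laden stand-in: the embodiment does not state how down-sampling is performed, so 2 × 2 average pooling is used here purely for illustration, and nearest-neighbour repetition stands in for the interpolation option of the up-sampling module. The function names are hypothetical, and the height and width are assumed to be even.

```python
import numpy as np


def downsample2x(x):
    """Halve height and width by averaging each 2x2 block (illustrative stand-in)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample2x(x):
    """Double height and width by nearest-neighbour interpolation."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


x = np.arange(16, dtype=float).reshape(4, 4)
low = downsample2x(x)        # shape (2, 2), keeps only coarse information
restored = upsample2x(low)   # shape (4, 4), fine detail is not recovered

print(low)
# [[ 2.5  4.5]
#  [10.5 12.5]]
```

The restored array has the original size but only the block averages, which illustrates why the low-resolution representation retains general information rather than detail.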
In this embodiment, the principle of the FFC module may be: dividing the road segmentation map into local information and global information based on channels, then extracting local features from the local information and global features from the global information, then performing cross fusion on the local features and the global features, and finally performing feature splicing based on the channels to obtain a final feature extraction result.
Fig. 4 is an exemplary diagram of the FFC module in the present embodiment. As shown in fig. 4, data is first input to the channel information splitting unit, which splits the input data into local information and global information based on channels. Then, the local information is input into two convolutional layers (Conv3 x 3) in parallel to perform local feature extraction, and the global information is input into one convolutional layer (Conv3 x 3) and one global feature extraction unit in parallel to perform global feature extraction. Then, the local features and the global features are subjected to cross fusion, the local features and the global features after the cross fusion are input into an activation layer (BN-RELU) to be activated, and finally the activated local features and the activated global features are input into a feature splicing layer to be spliced. The global feature extraction unit includes, in order of data flow: a convolution + activation layer (Conv-BN-ReLU), a Fourier transform layer (Real FFT2d), a convolution + activation layer (Conv-BN-ReLU), an inverse Fourier transform layer (Inv Real FFT2d), and a convolution layer (Conv 1), where the output of the first convolution + activation layer is skip-connected to the input of the convolution layer (Conv 1).
S140, performing style conversion on the road repair map to obtain a target style road map.
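The data flow of the global feature extraction unit can be sketched with NumPy as below. This is a simplified illustration, not the embodiment's implementation: batch normalization is omitted, every convolution is approximated by a channel-mixing (pointwise) operation with random weights, the skip connection is assumed to be an element-wise addition, and the local branch and the cross fusion are left out. All function and parameter names are hypothetical.

```python
import numpy as np


def pointwise(x, weight):
    """Pointwise convolution: mix channels with `weight` of shape (out_ch, in_ch)."""
    return np.einsum("oc,chw->ohw", weight, x)


def relu(x):
    return np.maximum(x, 0.0)


def global_unit(x, w_in, w_freq, w_out):
    """Sketch of the global feature extraction unit for x of shape (C, H, W)."""
    _, h, w = x.shape
    y = relu(pointwise(x, w_in))                       # Conv-(BN)-ReLU, BN omitted
    spec = np.fft.rfft2(y, axes=(-2, -1))              # Real FFT2d over height and width
    stacked = np.concatenate([spec.real, spec.imag])   # real and imaginary parts as channels
    stacked = relu(pointwise(stacked, w_freq))         # Conv-(BN)-ReLU in the frequency domain
    real, imag = np.split(stacked, 2)
    z = np.fft.irfft2(real + 1j * imag, s=(h, w), axes=(-2, -1))  # Inv Real FFT2d
    return pointwise(y + z, w_out)                     # assumed additive skip, then final conv


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))
local_part, global_part = np.split(features, 2)        # channel-based split (4 + 4 channels)
c = global_part.shape[0]
out = global_unit(
    global_part,
    w_in=rng.standard_normal((c, c)),
    w_freq=rng.standard_normal((2 * c, 2 * c)),
    w_out=rng.standard_normal((c, c)),
)
fused = np.concatenate([local_part, out])              # channel-wise feature splicing
print(out.shape, fused.shape)  # (4, 16, 16) (8, 16, 16)
```

Since each frequency coefficient depends on every spatial position, a pointwise operation in the frequency domain has an image-wide receptive field, which is why this branch is treated as the source of global features.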
The style conversion of the road repairing map can be understood as converting the road repairing map into an image with a set style. For example, fig. 5 is an exemplary diagram of a target style road map in the present embodiment.
In this embodiment, style conversion may be performed on the road repair map to obtain the target style road map as follows: inputting the road repair map into a set stylized model, and outputting the target style road map.
The set stylized model may be obtained by training a generative adversarial network (GAN), and the generative adversarial network may be a CycleGAN network or a Pix2Pix network, which is not limited herein.
The set stylized model may be trained as follows: acquiring a training sample set; inputting the training sample set into a first generator, outputting a first generated sample atlas, inputting the first generated sample atlas into a second generator, and outputting a second generated sample atlas; inputting the training sample set into the second generator, outputting a third generated sample atlas, inputting the third generated sample atlas into the first generator, and outputting a fourth generated sample atlas; and training the first generator and the second generator based on the first generated sample atlas, the second generated sample atlas, the third generated sample atlas and the fourth generated sample atlas, and determining the trained first generator as the set stylized model.
The training sample set comprises a road map sample set and a corresponding stylized road map sample set. Specifically, the road map sample set may be represented as X, the stylized road map sample set as Y, the first generator as G1, and the second generator as G2. Inputting the training sample set into the first generator and outputting the first generated sample atlas may be expressed as: G1(X) = Y1 or G1(Y) = Y2, that is, Y1 and Y2 constitute the first generated sample atlas. Inputting the first generated sample atlas into the second generator and outputting the second generated sample atlas may be expressed as: G2(Y1 or Y2) = X1. Inputting the training sample set into the second generator and outputting the third generated sample atlas may be expressed as: G2(Y) = X2 or G2(X) = X3, that is, X2 and X3 constitute the third generated sample atlas. Inputting the third generated sample atlas into the first generator and outputting the fourth generated sample atlas may be expressed as: G1(X2 or X3) = Y3.
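The order in which the two generators are applied can be traced with a toy example. The sketch below is purely illustrative: the generators `g1` and `g2` are hypothetical stand-ins that add or subtract a constant, and scalars replace images, so that only the data flow between the four generated sample atlases is shown.

```python
def g1(samples):
    """Hypothetical first generator: road map domain -> stylized domain (toy: add 10)."""
    return [s + 10 for s in samples]


def g2(samples):
    """Hypothetical second generator: stylized domain -> road map domain (toy: subtract 10)."""
    return [s - 10 for s in samples]


X = [1.0, 2.0, 3.0]       # road map sample set (toy scalars instead of images)
Y = [11.0, 12.0, 13.0]    # corresponding stylized road map sample set

Y1, Y2 = g1(X), g1(Y)     # first generated sample atlas
X1 = g2(Y1 + Y2)          # second generated sample atlas, from the first atlas
X2, X3 = g2(Y), g2(X)     # third generated sample atlas
Y3 = g1(X2 + X3)          # fourth generated sample atlas, from the third atlas

print(X1[:3])  # [1.0, 2.0, 3.0]    -> G2(G1(X)) reproduces X for these toy generators
print(Y3[:3])  # [11.0, 12.0, 13.0] -> G1(G2(Y)) reproduces Y for these toy generators
```

With real networks the round trips only approximate the inputs, and the gap between them is what the reconstruction-type losses penalize.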
In this embodiment, the process of training the first generator and the second generator based on the first generated sample atlas, the second generated sample atlas, the third generated sample atlas and the fourth generated sample atlas, and determining the trained first generator as the set stylized model may be: inputting the first generated sample atlas and the stylized road map sample set into a first discriminator, outputting a first discrimination result, and determining a first loss function based on the first discrimination result; inputting the second generated sample atlas and the road map sample set into a second discriminator, outputting a second discrimination result, and determining a second loss function based on the second discrimination result; determining a third loss function based on the first generated sample atlas and the stylized road map sample set; determining a fourth loss function based on the second generated sample atlas and the road map sample set; inputting the third generated sample atlas and the road map sample set into the second discriminator, outputting a third discrimination result, and determining a fifth loss function based on the third discrimination result; inputting the fourth generated sample atlas and the stylized road map sample set into the first discriminator, outputting a fourth discrimination result, and determining a sixth loss function based on the fourth discrimination result; determining a seventh loss function based on the third generated sample atlas and the road map sample set; determining an eighth loss function based on the fourth generated sample atlas and the stylized road map sample set; and finally, training the first generator, the first discriminator, the second generator and the second discriminator based on the first loss function, the second loss function, the third loss function, the fourth loss function, the fifth loss function, the sixth loss function, the seventh loss function and the eighth loss function.
Specifically, the first loss function is a loss between the first discrimination result and the first real result, the second loss function is a loss between the second discrimination result and the second real result, the third loss function is a loss between the first generated sample atlas Y1 or Y2 and the stylized road map sample set Y, the fourth loss function is a loss between the second generated sample atlas X1 and the road map sample set X, the fifth loss function is a loss between the third discrimination result and the third real result, the sixth loss function is a loss between the fourth discrimination result and the fourth real result, the seventh loss function is a loss between the third generated sample atlas X2 or X3 and the road map sample set X, and the eighth loss function is a loss between the fourth generated sample atlas Y3 and the stylized road map sample set Y. In this embodiment, each loss function may be computed with a mean square error (MSE) loss function.
According to the technical scheme of the embodiment, the target object in the original road map is detected to obtain a target object detection result; the target object in the original road map is segmented based on the target object detection result to obtain a road segmentation map; the road segmentation map is repaired to obtain a road repair map; and style conversion is performed on the road repair map to obtain a target style road map. According to the image processing method provided by the embodiment of the invention, the original road map is converted into the target style road map, so that subsequent quality inspection is carried out on the basis of the target style road map, which can improve the efficiency of quality inspection.
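As a small worked example of the mean square error, the sketch below evaluates one discriminator-based loss and one sample-based loss on toy numbers. The values, the variable names, and the choice of 1.0 as the "real result" label are assumptions made for illustration; in practice the inputs would be image tensors and discriminator outputs.

```python
def mse(predicted, target):
    """Mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy values standing in for images and discriminator outputs.
Y = [0.0, 1.0, 2.0]                 # stylized road map samples
Y1 = [0.5, 1.5, 2.5]                # samples produced by the first generator
first_result = [0.5, 0.5, 0.5]      # first discrimination result for Y1
first_real = [1.0, 1.0, 1.0]        # assumed "first real result" label

first_loss = mse(first_result, first_real)   # discrimination result vs. real result
third_loss = mse(Y1, Y)                      # generated samples vs. stylized samples

print(first_loss, third_loss)  # 0.25 0.25
```

The remaining loss functions follow the same two patterns, with the inputs swapped for the corresponding atlas, sample set, or discrimination result.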
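The four steps S110 to S140 can be viewed as a simple composition of stages. The sketch below shows only that composition: `process_road_map` and the four stand-in stage functions are hypothetical, and strings replace images so that the order of the calls is visible; real stages would wrap the detection model, the mask fusion, the image restoration model, and the set stylized model.

```python
def process_road_map(original, detect, segment, repair, stylize):
    """Chain the four stages; each stage is supplied by the caller."""
    detections = detect(original)               # S110: target object detection
    segmented = segment(original, detections)   # S120: segment out target objects
    repaired = repair(segmented)                # S130: repair the segmented regions
    return stylize(repaired)                    # S140: style conversion


trace = []  # records the order in which the stages run


def detect(image):
    trace.append("S110")
    return [(1, 1, 3, 3)]                       # one toy detection frame


def segment(image, detections):
    trace.append("S120")
    return f"{image}-segmented"


def repair(image):
    trace.append("S130")
    return f"{image}-repaired"


def stylize(image):
    trace.append("S140")
    return f"{image}-styled"


result = process_road_map("road", detect, segment, repair, stylize)
print(result)  # road-segmented-repaired-styled
print(trace)   # ['S110', 'S120', 'S130', 'S140']
```

Passing the stages in as callables keeps the composition independent of which detection, restoration, or stylization model is actually used.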
Example two
Fig. 6 is a schematic structural diagram of an image processing apparatus according to a second embodiment of the present invention, and as shown in fig. 6, the apparatus includes:
the target object detection module 610 is configured to detect a target object in an original road map to obtain a target object detection result;
a road segmentation map obtaining module 620, configured to segment a target object in an original road map based on a target object detection result to obtain a road segmentation map;
a road repair map obtaining module 630, configured to repair the road segmentation map to obtain a road repair map;
and the stylizing module 640 is used for performing style conversion on the road repairing map to obtain a target style road map.
Optionally, the target object detecting module 610 is further configured to:
inputting the original road map into a target object detection model, and outputting a target object detection result; the target object detection model is obtained by training on sample images labeled with target objects, and the target objects are objects irrelevant to map information.
Optionally, the road segmentation map obtaining module 620 is further configured to:
acquiring a target object mask image according to a target object detection result and an original road image;
and fusing the target object mask image and the original road image to obtain a road segmentation image.
Optionally, the target object detection result is position information of the target object detection frame; the road segmentation map obtaining module 620 is further configured to:
adjusting the pixel values of the pixel points inside the target object detection frame in the original road map to a first set value, and adjusting the pixel values of the pixel points outside the target object detection frame in the original road map to a second set value, to obtain a target object mask map.
Optionally, the road repair map obtaining module 630 is further configured to:
inputting the road segmentation graph into an image restoration model, and outputting a road restoration graph; wherein the image restoration model comprises: at least one down-sampling module, at least one feature extraction module, and at least one up-sampling module; the characteristic extraction module is a Fast Fourier Convolution (FFC) module.
Optionally, the stylization module 640 is further configured to:
inputting the road repair map into a set stylized model, and outputting a target style road map.
Optionally, the set stylized model is trained as follows:
acquiring a training sample set; the training sample set comprises a road map sample set and a corresponding stylized road map sample set;
inputting a training sample set into a first generator, outputting a first generated sample atlas, inputting the first generated sample atlas into a second generator, and outputting a second generated sample atlas;
inputting the training sample set into the second generator, outputting a third generated sample atlas, inputting the third generated sample atlas into the first generator, and outputting a fourth generated sample atlas;
and training the first generator and the second generator based on the first generation sample atlas, the second generation sample atlas, the third generation sample atlas and the fourth generation sample atlas, and determining the trained first generator as a set stylized model.
The device can execute the methods provided by all the embodiments of the invention, and has corresponding functional modules and beneficial effects for executing the methods. For details not described in detail in this embodiment, reference may be made to the methods provided in all the foregoing embodiments of the present invention.
Example three
FIG. 7 illustrates a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM)12, a Random Access Memory (RAM)13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM)12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as an image processing method.
In some embodiments, the image processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the image processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.