CN112930543A - Neural network processing device, neural network processing method, and neural network processing program - Google Patents

Publication number
CN112930543A
Authority
CN
China
Legal status
Pending
Application number
CN201980066531.1A
Other languages
Chinese (zh)
Inventor
山田贵登
安东尼奥·托马斯·内瓦多比尔切斯
Current Assignee
Maxell Ltd
Original Assignee
Lippmade Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional neural network (CNN) processing device (1) includes: an input buffer (10) that stores an input signal A to be supplied to the CNN; a weight buffer (11) that stores weights U; a convolution operation unit (12) that performs a convolution operation including a product-sum operation of the input signal A and the weights U; a storage unit (16) that stores a table (160) in which the input and the output of a conversion-quantization process are associated with each other, the conversion-quantization process receiving the operation result of the convolution operation as its input, converting the input value based on a predetermined condition, quantizing the converted data by reducing its bit precision, and outputting the result; and a processing unit (14) that refers to the table (160) to acquire the output of the conversion-quantization process corresponding to the operation result of the convolution operation.

Description

Neural network processing device, neural network processing method, and neural network processing program
Technical Field
The present invention relates to a neural network processing device, a neural network processing method, and a neural network processing program.
Background
In recent years, convolutional neural networks (CNNs) have received attention as deep neural networks used to classify images into a plurality of classes. A CNN is characterized by including convolutional layers in the deep neural network. A convolutional layer applies a filter to the input data. More specifically, in the convolutional layer, the window of the filter is slid by a predetermined step (stride), and a product-sum operation is performed that multiplies the elements of the filter by the corresponding elements of the input data and obtains the sum of the products.
Fig. 13 is a view showing a procedure of signal processing of general CNN. The CNN includes an input layer, an intermediate layer, and an output layer (see, for example, non-patent document 1 and non-patent document 2). In the intermediate layer, a convolution operation of multiplying the input layer by a weight is performed.
As shown in Fig. 13, in the intermediate layer, the result of the convolution operation undergoes, as necessary, processing by an activation function such as ReLU (rectified linear unit) or normalization such as BN (batch normalization) (hereinafter sometimes collectively referred to as "conversion"). In some cases, a pooling process is also performed.
Features of the input signal extracted via the convolution operation are input to a classifier formed of a fully connected layer, and a classification result is output from an output layer. As described above, one of the features of a neural network such as CNN is to repeatedly perform a product-sum operation and a conversion operation.
The input values of the input data and the weights used in a CNN sometimes include fractional parts (digits after the decimal point). In the product-sum operation of a neural network such as a CNN, arithmetic processing is performed while preserving the number of bits of the operation result, as indicated by "input signal", "weight", and "convolution operation" in Fig. 13. As described above, in a neural network such as a CNN, the intermediate layers, which are formed of a plurality of layers, require many arithmetic operations on many input values, each having many bits.
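To make the bit-width growth concrete, the following is a minimal sketch, not taken from the patent, of an upper bound on the accumulator width needed for a product-sum of unsigned integers. The function name `accumulator_bits` and the 8-bit example are assumptions chosen for illustration.

```python
def accumulator_bits(input_bits: int, weight_bits: int, num_terms: int) -> int:
    """Upper bound on the bits needed to sum `num_terms` unsigned products.

    Hypothetical helper for illustration; unsigned operands are assumed.
    """
    # One product of an a-bit value and a b-bit value fits in a + b bits.
    product_bits = input_bits + weight_bits
    # Adding n such products needs at most ceil(log2(n)) extra carry bits;
    # (n - 1).bit_length() computes that without floating point.
    carry_bits = (num_terms - 1).bit_length()
    return product_bits + carry_bits


# A 3x3 filter over one channel with 8-bit inputs and 8-bit weights.
print(accumulator_bits(8, 8, 9))
```

Under these assumptions, even 8-bit operands lead to an intermediate result that is considerably wider than either input, which is the cost the later quantization step is meant to cut.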
Documents of the related art
Non-patent document
Non-patent document 1: K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. of CVPR, 2016. (ResNet)
Non-patent document 2: Hideki Aso et al., "Deep Learning," Kindaikagaku-sha, November 2015.
Disclosure of Invention
Problems to be solved by the invention
However, when a neural network such as a CNN is implemented on embedded hardware such as an FPGA (field programmable gate array) or a microcomputer, computational resources are limited. For this reason, the processing speed drops when many arithmetic operations must be performed on many input values, each having many bits.
The present invention has been made to solve the above-described problems, and an object thereof is to provide a neural network processing apparatus and a neural network processing method capable of suppressing a decrease in the processing speed of a neural network even if embedded hardware is used.
Means for solving the problems
In order to solve the above problem, there is provided a neural network processing apparatus including: a first memory configured to store an input signal provided to a neural network; a second memory configured to store weights of the neural network; an operation unit configured to perform a convolution operation of the neural network including a product-sum operation of the input signal and the weight; a third memory configured to store a table configured to associate an input of the conversion-quantization process and an output thereof with each other, wherein the input is an operation result of a convolution operation of the operation unit, and the output is a result of the conversion-quantization process of converting an input value based on a predetermined condition and quantizing the converted value by reducing a bit precision of the converted data; and a processing unit configured to acquire an output of the conversion-quantization process corresponding to an operation result of the operation unit by referring to the table.
In order to solve the above problem, there is also provided a neural network processing method, including: a first step of storing an input signal supplied to a neural network in a first memory; a second step of storing weights of the neural network in a second memory; a third step of performing a convolution operation of the neural network including a product-sum operation of the input signal and the weight; a fourth step of storing a table in the third memory, the table being configured to associate an input and an output of the conversion-quantization process with each other, wherein the input is an operation result of the convolution operation in the third step, and the output is a result of the conversion-quantization process of converting an input value based on a predetermined condition and quantizing the converted value by reducing the bit precision of the converted data; and a fifth step of acquiring an output of the conversion-quantization process corresponding to the operation result in the third step by referring to the table.
In order to solve the above problem, there is also provided a neural network processing program configured to perform: a first step of storing an input signal supplied to a neural network in a first memory; a second step of storing weights of the neural network in a second memory; a third step of performing a convolution operation of the neural network including a product-sum operation of the input signal and the weight; a fourth step of storing a table in the third memory, the table being configured to associate an input and an output of the conversion-quantization process with each other, wherein the input is an operation result of the convolution operation in the third step, and the output is a result of the conversion-quantization process of converting an input value based on a predetermined condition and quantizing the converted value by reducing the bit precision of the converted data; and a fifth step of acquiring an output of the conversion-quantization process corresponding to the operation result in the third step by referring to the table.
Effects of the invention
According to the present invention, the output of the conversion-quantization process corresponding to the operation result of the convolution operation is acquired by referring to a table configured to associate the input and the output of the conversion-quantization process with each other, where the conversion-quantization process converts the operation result of the convolution operation based on a predetermined condition and quantizes the converted value by reducing the bit precision of the converted data. Thus, even if embedded hardware is used, a reduction in the processing speed of the neural network can be suppressed.
Drawings
Fig. 1 is a block diagram for explaining an outline of functions of a CNN processing apparatus according to an embodiment of the present invention;
Fig. 2 is a block diagram showing a hardware arrangement of a CNN processing apparatus according to an embodiment of the present invention;
Fig. 3 is a view for explaining an outline of a procedure of a CNN processing method according to an embodiment of the present invention;
Fig. 4 is a block diagram for explaining the function of a processing unit according to the first embodiment;
Fig. 5 is a view for explaining the arrangement of tables according to the first embodiment;
Fig. 6 is a view for explaining the function of a processing unit according to the first embodiment;
Fig. 7 is a block diagram for explaining the function of a processing unit according to the second embodiment;
Fig. 8 is a view for explaining the arrangement of tables according to the second embodiment;
Fig. 9 is a view for explaining a procedure of a CNN processing method according to the second embodiment;
Fig. 10 is a block diagram for explaining the function of a processing unit according to the third embodiment;
Fig. 11 is a view for explaining the arrangement of tables according to the third embodiment;
Fig. 12 is a view for explaining a procedure of a CNN processing method according to the third embodiment; and
Fig. 13 is a view for explaining arithmetic processing of a conventional CNN.
Detailed Description
A preferred embodiment of the present invention will now be described in detail with reference to fig. 1 to 12.
[ overview of CNN processing apparatus ]
The neural network processing apparatus according to the present invention is a CNN processing apparatus 1 that uses a CNN as a neural network.
The CNN processing apparatus 1 according to the embodiment is an arithmetic processing apparatus that performs a product-sum operation of the input signal supplied to the CNN and the weights of the CNN, outputs the operation result, and further converts the result of the product-sum operation by applying ReLU to it. This arithmetic processing includes a product-sum operation (hereinafter sometimes also referred to as a "convolution operation") of a convolutional layer in the intermediate layer of the CNN, and a conversion operation of converting the operation result of the convolution operation based on a predetermined condition. Note that, as an example of "conversion", applying ReLU to the operation result of the convolution operation will be described below.
The CNN processing apparatus 1 performs convolution operation of the input signal and the weight, and obtains the output of one convolution layer by applying ReLU to the operation result.
For convenience of description, it is assumed that an operation result calculated by applying ReLU to a result of a product-sum operation of convolution layers is used as an input signal of a next convolution layer. The CNN processing apparatus 1 repeatedly performs the product-sum operation and the conversion operation of the input signal and the weight, thereby performing the product-sum operation and the conversion processing as many times as the number of convolution layers of the CNN model set in advance.
[ functional blocks of CNN processing apparatus ]
The CNN processing apparatus 1 includes an input buffer (first memory) 10, a weight buffer (second memory) 11, a convolution operation unit (operation unit) 12, an operation result buffer 13, a processing unit 14, an output buffer 15, and a storage unit (third memory) 16.
The input buffer 10 is a memory that stores the input signal supplied to the CNN. More specifically, the input buffer 10 is implemented by a main storage device 103 to be described later, and image data or the like supplied from the outside is stored in the input buffer 10. The input signal supplied to the input buffer 10 may be image data that has undergone preprocessing in advance. Examples of preprocessing are monochrome conversion, contrast adjustment, and brightness adjustment. The input signal may be reduced so that it has a bit depth set according to the CNN model set in advance in the CNN processing apparatus 1.
As for the values of the input signal supplied to the input buffer 10, for example, values including a decimal point and represented by an array of floating point numbers of 32-bit or 16-bit precision, or values obtained by reducing these values to a preset number-of-bits expression are used.
The weight buffer 11 is a memory that stores the weight of CNN. More specifically, the weight buffer 11 is realized by a main storage device 103 to be described later, and the weight parameter of the CNN stored in the storage unit 16 or a server (not shown) installed outside the CNN processing apparatus 1 is loaded into the weight buffer 11. In this embodiment, as for the value of the weight, a value including a decimal point and represented by an array of floating point numbers of 32-bit or 16-bit precision, or a value obtained by reducing these values to a preset number-of-bits expression is used.
The convolution operation unit 12 performs convolution operation of CNN including product-sum operation of the input signal stored in the input buffer 10 and the weight stored in the weight buffer 11. More specifically, the convolution operation unit 12 reads out the input signal and the weight from the input buffer 10 and the weight buffer 11, respectively, and performs convolution operation according to the convolution layer forming the CNN model set in advance in the CNN processing apparatus 1. The operation result output from the convolution operation unit 12 is supplied to the operation result buffer 13.
The operation result buffer 13 buffers the result of the convolution operation by the convolution operation unit 12.
The processing unit 14 refers to the table 160 stored in the storage unit 16, and outputs a result of performing conversion and quantization processing (hereinafter sometimes also referred to as "conversion-quantization processing") on the result of the convolution operation read out from the operation result buffer 13. More specifically, the processing unit 14 reads out the convolution operation result stored in the operation result buffer 13, and acquires and outputs a value corresponding to the input of the conversion-quantization process by referring to the table 160 storing the input/output relationship of the conversion-quantization process.
The conversion of the result of the convolution operation includes, for example, application of an activation function such as ReLU or normalization using BN or the like, and refers to converting the operation result of the convolution operation based on a predetermined condition. The activation function determines what value the result of the convolution operation is converted into.
The ReLU applied to the result of the convolution operation is a ramp function that converts the result into 0 if it is negative, or into a linearly transformed value if it is positive. As described above, the input/output relationship of processing such as ReLU that converts the operation result of the convolution operation is determined in advance.
On the other hand, to reduce the operation load, the value obtained by converting the operation result of the convolution operation through ReLU or the like is quantized by reducing its bit precision. Quantization of data includes, for example, generally known fractional processing such as rounding off, rounding up, rounding down, and rounding to the nearest even number, and refers to imposing a restriction by, for example, converting the value obtained by applying ReLU to the result of the convolution operation (for example, a value including a decimal point) into an integer.
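The two operations described above, conversion by ReLU followed by quantization through fractional processing, can be sketched as below. This is an illustrative sketch only: the function names, the mode names, and the choice of integers as the quantized representation are assumptions, not the patent's implementation.

```python
import math


def relu(x: float) -> float:
    """Ramp function: 0 for negative inputs, the input itself otherwise."""
    return x if x > 0 else 0.0


# Fractional-processing modes named in the text, mapped to Python operations.
# The dictionary keys are hypothetical labels chosen for this sketch.
ROUNDERS = {
    "round_half_up": lambda v: math.floor(v + 0.5),  # valid since v >= 0 after ReLU
    "round_up": math.ceil,
    "round_down": math.floor,
    "round_half_even": round,  # Python's built-in round() rounds halves to even
}


def convert_and_quantize(x: float, mode: str = "round_half_even") -> int:
    """Apply ReLU, then reduce the result to an integer with the chosen mode."""
    return int(ROUNDERS[mode](relu(x)))


print(convert_and_quantize(-1.7))                  # negative input
print(convert_and_quantize(2.5, "round_half_up"))
print(convert_and_quantize(2.5, "round_half_even"))
```

Because both steps are pure functions of the input, their composition has a fixed input/output relationship, which is what allows it to be precomputed.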
The processing unit 14 refers to a table 160 in which, for example, the input of the ReLU (i.e., the result of the product-sum operation) and the value obtained by further quantizing the output of the ReLU are stored in association with each other. Therefore, the processing unit 14 can perform the conversion processing of the operation result of the convolution operation by the ReLU together with the quantization processing.
That is, by referring to the table 160, the processing unit 14 acquires in a single step the value that would otherwise be obtained by two arithmetic processes, namely the conversion of the result of the convolution operation by ReLU or the like and the quantization of the converted value.
The output buffer 15 temporarily stores the output acquired by the processing unit 14.
The storage unit 16 includes a table 160. The storage unit 16 stores the output from the processing unit 14, which is temporarily stored in the output buffer 15.
The table 160 stores inputs and outputs of the conversion-quantization process in association with each other. More specifically, the table 160 stores data in which an input of a predetermined conversion process such as ReLU is associated with an output obtained by quantizing the value converted by the ReLU using preset fractional processing.
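One way such a table could be precomputed is sketched below. The sketch assumes, beyond what the text states, that the convolution result is a signed fixed-point code with `frac_bits` fractional bits, that quantization rounds down, and that the output saturates at `out_bits` bits; `build_table` and all parameter names are hypothetical.

```python
def build_table(x_min: int, x_max: int, frac_bits: int, out_bits: int) -> dict:
    """Precompute quantized ReLU outputs for every fixed-point input code.

    Hypothetical layout: key = raw input code, value = quantized output.
    """
    y_max = (1 << out_bits) - 1              # largest representable output
    table = {}
    for x_raw in range(x_min, x_max + 1):
        converted = max(0, x_raw)            # ReLU on the raw code
        quantized = converted >> frac_bits   # drop fractional bits (round down)
        table[x_raw] = min(y_max, quantized) # saturate to out_bits
    return table


# Inputs with 2 fractional bits (steps of 0.25), outputs limited to 3 bits.
table = build_table(-64, 64, frac_bits=2, out_bits=3)
print(table[6])    # code 6 represents 1.5
print(table[-5])   # negative input
print(table[40])   # code 40 represents 10.0, above the 3-bit range
```

At run time the conversion and the quantization then cost a single lookup, `table[x_raw]`, instead of two separate computations.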
[ hardware arrangement of CNN processing apparatus ]
An example of the hardware arrangement of the CNN processing apparatus 1 having the above-described functions will be described next with reference to the block diagram of fig. 2.
As shown in fig. 2, the CNN processing apparatus 1 may be realized by, for example, a computer including: a processor 102, a main storage device 103, a communication interface 104, a secondary storage device 105, and an input/output device 106, which are connected via a bus 101, and a program configured to control these hardware resources.
Programs to be used by the processor 102 to perform various controls and operations are stored in advance in the main storage device 103. The functions of the CNN processing apparatus 1 including the convolution operation unit 12 and the processing unit 14 shown in fig. 1 are realized by the processor 102 and the main storage device 103.
The input buffer 10, the weight buffer 11, the operation result buffer 13, and the output buffer 15 described with reference to fig. 1 are implemented by the main storage device 103.
The communication interface 104 is an interface circuit configured to perform communication with each external electronic device via the communication network NW. Input signals such as image data and weights to be used by the CNN processing apparatus 1 may be received from an external server or the like via the communication interface 104.
The secondary storage device 105 is formed by a readable/writable storage medium and a drive device configured to read/write various kinds of information such as programs and data from/to the storage medium. In the secondary storage device 105, a hard disk or a semiconductor memory such as a flash memory may be used as the storage medium.
The secondary storage device 105 includes: a storage area configured to store input data and weights acquired from the outside, and a program storage area configured to store a program used by the CNN processing apparatus 1 to perform arithmetic processing such as convolution operation of the CNN. The storage unit 16 described with reference to fig. 1 is implemented by the secondary storage device 105. Also, for example, secondary storage device 105 may include a backup area configured to backup the above-described data or programs.
The input/output device 106 is formed by an I/O terminal that inputs or outputs a signal from or to an external device. A display device (not shown) may be provided to display the operation result output by the CNN processing apparatus 1 via the input/output device 106.
The program stored in the program storage area of the secondary storage device 105 may be a program configured to execute processes temporally consecutively in accordance with the order of the CNN processing method to be described in this specification, or may be a program configured to execute processes in parallel or at necessary timing such as the timing of calling. The program may be processed by one computer, or may be processed distributively by a plurality of computers.
[ CNN processing method ]
An outline of the operation of the CNN processing apparatus 1 having the above-described arrangement will be described next with reference to Fig. 3. First, the input buffer 10 and the weight buffer 11 temporarily store the input signal A and the weight U, respectively, supplied from a server or the like installed outside the CNN processing apparatus 1 (step S1 and step S2).
The input signal A is vectorized input image data and has dimensions in the vertical direction and the horizontal direction. The value of the input signal A is represented by a multi-bit value including, for example, a decimal point. On the other hand, the weight U is an element of the kernel represented by a matrix, and is a parameter that is adjusted, updated, and finally decided by learning of the CNN. The weight U has dimensions in the vertical direction and the horizontal direction, and each element is represented by a multi-bit value including, for example, a decimal point.
Next, the convolution operation unit 12 reads out the input signal A and the weight U from the input buffer 10 and the weight buffer 11, respectively, and performs the convolution operation (step S3). More specifically, the convolution operation unit 12 multiplies a vector of the input signal A by a matrix of the weights U.
More specifically, the convolution operation unit 12 slides the window of the preset filter of the CNN by a predetermined step. The convolution operation unit 12 multiplies the element of the weight U by the corresponding element of the input signal a at each position of the filter, and obtains the sum of the products.
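The sliding-window product-sum described above can be sketched as follows. This is an illustrative sketch with assumptions not stated in the text: no padding is used, the kernel is applied without flipping (as is conventional in CNNs), and the function name `conv2d` and the sample values of `A` and `U` are made up for the example.

```python
def conv2d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` and return the product-sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(signal) - kh) // stride + 1
    out_w = (len(signal[0]) - kw) // stride + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride   # window position
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    # Multiply each weight by the corresponding input element.
                    acc += kernel[di][dj] * signal[top + di][left + dj]
            row.append(acc)                      # sum of the products
        result.append(row)
    return result


A = [[1, 2, 0],
     [0, 1, 3],
     [2, 1, 1]]          # input signal (sample values)
U = [[1, 0],
     [-1, 1]]            # weights (sample values)
X = conv2d(A, U)
print(X)  # [[2, 4], [-1, 1]]
```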
The convolution operation unit 12 stores the operation result X of the convolution operation of the product-sum operation in the corresponding position of the operation result buffer 13 (step S4).
After that, the processing unit 14 reads out the result X of the convolution operation from the operation result buffer 13, and acquires the output Y of the conversion-quantization process of the operation result X by referring to the table 160 in the storage unit 16 (step S5). The acquired output Y is temporarily stored in the output buffer 15, read out by the processor 102, and output (step S6).
Note that well-known pooling processing may be performed as necessary on the output Y acquired by the processing unit 14 (see non-patent document 2). The output Y obtained in step S6 is input to a fully connected layer (not shown) forming a subsequent classifier, and the image data of the input signal A is classified.
As described above, the CNN processing apparatus 1 according to the present invention stores the table 160 configured to associate the input of the conversion process such as the ReLU and the output obtained by quantizing the value converted by the ReLU with each other in the storage unit 16. The CNN processing apparatus 1 acquires an output of the conversion-quantization process corresponding to the operation result of the convolution operation by referring to the table 160. Therefore, the calculation load of the CNN processing apparatus 1 can be reduced as compared with the case where the conversion processing such as ReLU and the quantization processing for the converted value are separately executed. Also, as a result, the signal processing of the CNN can be speeded up.
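Steps S1 to S6 can be put together in a short sketch. To keep it small, the sketch assumes a one-dimensional integer signal and a table keyed directly by the integer product-sum result; the function names, the table contents (ReLU, halving, saturation at 3), and the sample data are all hypothetical.

```python
def product_sum(window, weights):
    """Multiply corresponding elements and add the products."""
    return sum(a * u for a, u in zip(window, weights))


def process_layer(signal, weights, table, stride=1):
    input_buffer = list(signal)        # S1: store the input signal
    weight_buffer = list(weights)      # S2: store the weights
    k = len(weight_buffer)
    # S3 and S4: convolution operation, results kept in a result buffer.
    result_buffer = [
        product_sum(input_buffer[i:i + k], weight_buffer)
        for i in range(0, len(input_buffer) - k + 1, stride)
    ]
    # S5: one table lookup replaces the separate conversion and quantization.
    output_buffer = [table[x] for x in result_buffer]
    return output_buffer               # S6: output


# Hypothetical table: ReLU, then halve with rounding down, saturate at 3.
table = {x: min(3, max(0, x) // 2) for x in range(-20, 21)}
Y = process_layer([1, 3, 2, 0, 4], [2, -1, 1], table)
print(Y)  # [0, 2, 3]
```

The table must cover every value the product-sum can produce; here the range of -20 to 20 was chosen by hand to fit the sample data.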
[ first embodiment ]
As a detailed example of the CNN processing apparatus 1 having the above-described arrangement, the CNN processing apparatus 1 according to the first embodiment of the present invention will be described next. Fig. 4 is a block diagram showing the functional arrangement of the processing unit 14 of the CNN processing apparatus 1. The remaining components of the CNN processing apparatus 1 are the same as those described with reference to fig. 1. Fig. 5 is a view for explaining a data structure of the table 160. Fig. 6 is a view for explaining the conversion-quantization process of the processing unit 14.
[ functional blocks of processing unit ]
The processing unit 14 includes an input determination unit 140 and an output acquisition unit 141. The input determination unit 140 reads out the operation result of the convolution operation from the operation result buffer 13, compares the operation result with each preset input section of the conversion-quantization process, and determines an input section including the operation result of the convolution operation (i.e., the value of the input of the conversion-quantization process).
The table 160 stored in the storage unit 16 stores data in which each of the input sections, obtained by dividing the input of the conversion-quantization process into a plurality of continuous sections, is associated with a value obtained by quantizing the value converted by the ReLU.
More specifically, as shown in fig. 5, the table 160 stores data in which, for example, the input of the conversion-quantization process is divided into five sections, and each input section and the output of the conversion-quantization process are associated with each other. For example, if the operation result X of the convolution operation is "1", the input determination unit 140 determines that the operation result X corresponds to the input section 1 ≦ X < 2 by comparison with each input section.
The output acquisition unit 141 acquires the output Y of the conversion-quantization process corresponding to the input section according to the determination result of the input determination unit 140 by referring to the table 160 stored in the storage unit 16.
More specifically, as shown in Fig. 6, the output acquisition unit 141 acquires the output Y of the conversion-quantization process corresponding to one of the five input sections determined by the input determination unit 140. In the example of the conversion-quantization process shown in Fig. 6, two arithmetic processes (i.e., a conversion process by the ReLU and a quantization process by preset fractional processing) are executed together.
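The split between the input determination unit 140 and the output acquisition unit 141 can be sketched with a sorted list of section boundaries. The text gives only the section 1 ≦ X < 2 explicitly; the other boundaries and all output values below are assumptions made for illustration (ReLU followed by rounding down, saturating at 4), not the contents of Fig. 5.

```python
import bisect

# Hypothetical five-section table:
#   X < 1 -> 0, 1 <= X < 2 -> 1, 2 <= X < 3 -> 2, 3 <= X < 4 -> 3, X >= 4 -> 4
BOUNDARIES = [1, 2, 3, 4]      # lower bounds of sections 1..4
OUTPUTS = [0, 1, 2, 3, 4]      # quantized output for each section


def determine_section(x: float) -> int:
    """Role of the input determination unit: find the section containing x."""
    # bisect_right counts the boundaries that are <= x, so a value equal to a
    # boundary falls into the section that starts at that boundary.
    return bisect.bisect_right(BOUNDARIES, x)


def acquire_output(section: int) -> int:
    """Role of the output acquisition unit: read the output for that section."""
    return OUTPUTS[section]


for x in (-3.0, 1, 1.99, 2.5, 7.2):
    print(x, acquire_output(determine_section(x)))
```

Storing one entry per section rather than one per possible input keeps the table small, at the cost of a few comparisons (a binary search here) per lookup.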
[ CNN treatment method ]
The operation of the CNN processing apparatus 1 including the above-described input determination unit 140 according to this embodiment will be described next with reference to fig. 3. Note that, in the CNN processing method according to this embodiment, steps S1 to S4 are the same as the processing described in the outline of the CNN processing method.
First, the input buffer 10 and the weight buffer 11 temporarily store the input signal a and the weight U, respectively, supplied from a server or the like installed outside the CNN processing apparatus 1 (step S1 and step S2).
Next, the convolution operation unit 12 reads out the input signal a and the weight U from the input buffer 10 and the weight buffer 11, respectively, and performs convolution operation (step S3). More specifically, the convolution operation unit 12 multiplies a vector of the input signal a by a matrix of weights U.
Next, the convolution operation unit 12 stores the operation result X of the convolution operation, obtained by the product-sum operation, in the corresponding position of the operation result buffer 13 (step S4).
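As a rough illustration of the product-sum operation in steps S3 and S4, the following sketch multiplies a vector by a matrix. The function name `product_sum` and the list-based layout (one row of the weight matrix per input element) are assumptions made for this example, not details taken from the embodiment.

```python
def product_sum(signal, weights):
    """Multiply an input vector by a weight matrix.

    signal  : list of numbers (the input signal vector)
    weights : list of rows, one row per input element (assumed layout)
    Returns one product-sum result X per matrix column.
    """
    if len(signal) != len(weights):
        raise ValueError("signal length must match the number of weight rows")
    columns = len(weights[0])
    return [
        sum(signal[i] * weights[i][j] for i in range(len(signal)))
        for j in range(columns)
    ]
```

Each element of the returned list corresponds to one operation result X that would be written to the operation result buffer 13.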
After that, the processing unit 14 reads out the result X of the convolution operation from the operation result buffer 13, and acquires the output Y obtained by the conversion-quantization process of the operation result X by referring to the table 160 in the storage unit 16 (step S5). More specifically, for the result X of the convolution operation (i.e., the input X of the conversion-quantization process), the input determination unit 140 performs comparison of values for each preset input section of the conversion-quantization process, and determines an input section including the value of the operation result X. After that, the output acquisition unit 141 acquires the output Y of the conversion-quantization process corresponding to the input section determined by the input determination unit 140 by referring to the table 160.
The acquired output Y is temporarily stored in the output buffer 15, read out by the processor 102, and output (step S6).
As described above, according to the CNN processing device 1 of the first embodiment, it is determined which of a plurality of consecutive input sections of the conversion-quantization process includes the operation result of the convolution operation (i.e., the value of the input of the conversion-quantization process), and the output of the conversion-quantization process is acquired by referring to the table 160 based on the determination result.
Therefore, since the conversion processing such as the ReLU and the quantization processing of the operation result of the convolution operation can be performed by determining the input section and referring to the table 160, it is possible to reduce the operation load of the CNN and suppress the reduction of the processing speed even if embedded hardware is used.
In addition, a table 160 representing the input/output relationship of the conversion-quantization process is stored in a storage area of hardware such as the secondary storage device 105. For this reason, by replacing the values in the table 160 according to the form of the neural network, the neural network having a desired processing function can be implemented more flexibly by hardware.
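One way to picture the replacement of table values is to precompute the table offline from whichever conversion and quantization functions the network uses. The sketch below is an assumption-laden illustration: `build_table`, `quantize_2bit`, the section boundaries, and the binary quantizer are all hypothetical, and the sketch assumes the composed function is constant inside each section.

```python
import math


def relu(x):
    return max(0.0, x)


def quantize_2bit(value):
    # Assumed quantizer: drop the fraction and saturate to the range 0..3.
    return max(0, min(3, math.floor(value)))


def build_table(boundaries, conversion, quantizer):
    """Precompute (lower bound, output) rows for sections split at `boundaries`."""
    lower_bounds = [-math.inf] + list(boundaries)
    rows = []
    for i, low in enumerate(lower_bounds):
        # Sample one point per section; valid only if the output is constant there.
        sample = boundaries[0] - 1 if i == 0 else low
        rows.append((low, quantizer(conversion(sample))))
    return rows


def lookup(table, x):
    """Return the output of the last row whose lower bound does not exceed x."""
    output = table[0][1]
    for low, y in table:
        if x >= low:
            output = y
    return output


relu_table = build_table([0, 1, 2, 3], relu, quantize_2bit)
binary_table = build_table([0, 1, 2, 3], relu, lambda v: 1 if v >= 2 else 0)
```

The same `lookup` routine serves both tables; only the stored values differ, which is the flexibility the paragraph above describes.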
[ second embodiment ]
Next, a second embodiment of the present invention will be described. Note that in the following description, the same reference numerals as in the above-described first embodiment denote the same components, and the description thereof will be omitted.
In the first embodiment, a case has been described in which the processing unit 14 includes the input determination unit 140 and, in the conversion-quantization process, the input determination unit 140 compares the operation result of the convolution operation with a plurality of continuous input sections. In the second embodiment, by contrast, the processing unit 14 includes a threshold processing unit (first threshold processing unit) 142 that performs threshold processing for the input of the conversion-quantization process. The following description focuses on the components that differ from the first embodiment.
[ function blocks of processing units ]
The processing unit 14 includes an output acquisition unit 141 and a threshold processing unit 142.
The threshold processing unit 142 reads out the operation result of the convolution operation from the operation result buffer 13, and compares the operation result with a threshold value set in advance for the input of the conversion-quantization process.
As shown in fig. 8, the table 160A in the storage unit 16 according to the embodiment stores data in which, for example, five threshold values are set for the input of the conversion-quantization process and each threshold value and the output of the conversion-quantization process are associated with each other.
For example, the threshold processing unit 142 determines whether the operation result X of the convolution operation is smaller than a set threshold or equal to or larger than that threshold. More specifically, if the operation result X of the convolution operation is "1", the threshold processing unit 142 outputs a comparison result indicating that the operation result X ("1") is smaller than the set threshold "2" and equal to or larger than the threshold "1".
Based on the comparison result of the threshold processing unit 142, the output acquisition unit 141 refers to the table 160A and acquires the output Y of the conversion-quantization process corresponding to the threshold indicated by the comparison result. The output acquired by the output acquisition unit 141 is temporarily stored in the output buffer 15.
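The threshold comparison and table reference of this embodiment can be sketched as below. The five thresholds and their outputs in `TABLE_160A` are hypothetical (fig. 8 is not reproduced here), as is the default output used when X lies below every threshold. The sketch follows the "equal to or larger than" comparison of the example with X = "1".

```python
from bisect import bisect_right

# Hypothetical stand-in for table 160A: threshold -> output Y (values assumed).
TABLE_160A = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
THRESHOLDS = sorted(TABLE_160A)
OUTPUT_BELOW_ALL = 0  # assumed output when X is below every threshold


def threshold_processing(x):
    """Role of threshold processing unit 142: largest threshold not exceeding x."""
    position = bisect_right(THRESHOLDS, x)
    return THRESHOLDS[position - 1] if position else None


def acquire_output(threshold):
    """Role of output acquisition unit 141: read Y for the reported threshold."""
    if threshold is None:
        return OUTPUT_BELOW_ALL
    return TABLE_160A[threshold]


def convert_quantize(x):
    return acquire_output(threshold_processing(x))
```

For X = 1 the sketch reports the threshold 1, because X is equal to or larger than 1 and smaller than 2, and the output is then read from the table.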
[ CNN processing method ]
The operation of the CNN processing apparatus 1 including the above-described threshold processing unit 142 according to this embodiment will be described next with reference to fig. 9. Note that, in the CNN processing method according to this embodiment, steps S1 to S4 are the same as the processing described in the outline of the CNN processing method shown in fig. 3.
First, the input buffer 10 and the weight buffer 11 temporarily store the input signal a and the weight U, respectively, supplied from a server or the like installed outside the CNN processing apparatus 1 (step S1 and step S2).
Next, the convolution operation unit 12 reads out the input signal a and the weight U from the input buffer 10 and the weight buffer 11, respectively, and performs convolution operation (step S3). More specifically, the convolution operation unit 12 multiplies a vector of the input signal a by a matrix of weights U.
Next, the convolution operation unit 12 stores the operation result X of the convolution operation, obtained by the product-sum operation, in the corresponding position of the operation result buffer 13 (step S4).
After that, the processing unit 14 reads out the result X of the convolution operation from the operation result buffer 13, and acquires the output Y obtained by the conversion-quantization process of the operation result X by referring to the table 160A in the storage unit 16 (step S5A). More specifically, the threshold processing unit 142 compares the result X of the convolution operation with a threshold value set in advance for the input of the conversion-quantization process, and outputs a threshold value smaller than the operation result X. After that, the output acquisition unit 141 acquires the output Y corresponding to the input threshold value output by the threshold processing unit 142 by referring to the table 160A.
The acquired output Y is temporarily stored in the output buffer 15, read out by the processor 102, and output (step S6).
As described above, according to the CNN processing apparatus 1 of the second embodiment, the threshold value set in advance for the input of the conversion-quantization process and the table 160A configured to associate the threshold value and the output of the conversion-quantization process with each other are stored in the storage unit 16. In addition, the output acquisition unit 141 acquires the output of the conversion-quantization processing based on the comparison result of the threshold value and the operation result of the convolution operation by referring to the table 160A.
For this reason, conversion processing by ReLU or the like for the operation result of the convolution operation and quantization processing of the operation result of the convolution operation can be performed by threshold processing. Therefore, if the output of the conversion-quantization process monotonically increases or monotonically decreases, the output of the conversion-quantization process can be uniquely decided by comparison with a threshold value.
In particular, when the CNN processing apparatus 1 is realized by predetermined hardware, the comparison using the input section requires sequential processing, but the comparison between the input and the threshold value may be completed at once. Therefore, according to the CNN processing apparatus 1, even if embedded hardware is used, the arithmetic processing of CNN can be performed at a higher speed.
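The difference between sequential section comparison and simultaneous threshold comparison can be illustrated with a comparator-bank sketch. The thresholds and outputs are hypothetical, and Python still evaluates the comparisons one after another; the point of the sketch is only that each comparison depends on nothing but X and its own threshold, so hardware could evaluate them all in the same cycle.

```python
# Hypothetical ascending thresholds; OUTPUTS is indexed by how many were passed.
THRESHOLDS = (1, 2, 3, 4, 5)
OUTPUTS = (0, 1, 2, 3, 4, 5)


def comparator_bank(x):
    """One independent comparison per threshold (a thermometer code)."""
    return tuple(x >= t for t in THRESHOLDS)


def convert_quantize(x):
    """Count the passed thresholds and use the count as the table index."""
    return OUTPUTS[sum(comparator_bank(x))]
```

Counting the passed thresholds selects a unique output only because the assumed input/output relationship is monotonic, which matches the condition stated for this embodiment.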
[ third embodiment ]
Next, a third embodiment of the present invention will be described. Note that in the following description, the same reference numerals as in the above-described first and second embodiments denote the same components, and the description thereof will be omitted.
In the second embodiment, a case has been described in which, if the output of the conversion-quantization process monotonically increases or monotonically decreases, the threshold processing unit 142 performs comparison with a threshold value that is set in advance for the input of the conversion-quantization process. In the third embodiment, by contrast, based on the division information for identifying the input section in which the output of the conversion-quantization process monotonically increases and the input section in which the output of the conversion-quantization process monotonically decreases, the threshold processing is performed only in the input section to which the value of the input of the conversion-quantization process belongs. The following description focuses on the components that differ from the first embodiment and the second embodiment.
[ function blocks of processing units ]
The processing unit 14 includes an input determination unit 140, an output acquisition unit 141, and a threshold processing unit 142 (second threshold processing unit).
The input determination unit 140 determines an input section of the conversion-quantization process to which the operation result X of the convolution operation of the operation unit belongs, based on division information for identifying an input section in which the output of the conversion-quantization process monotonically increases and an input section in which the output of the conversion-quantization process monotonically decreases.
The threshold processing unit 142 compares the operation result X of the convolution operation unit 12 with a plurality of thresholds set in advance for each input of the conversion-quantization process in the input section determined by the input determination unit 140, and outputs a threshold corresponding to the operation result X.
The output acquisition unit 141 acquires the output Y of the conversion-quantization process corresponding to the threshold value output by the threshold value processing unit 142 by referring to the table 160B stored in the storage unit 16.
The storage unit 16 stores the table 160B. As shown in fig. 11, the table 160B stores data in which division information for identifying an input section in which the output of the conversion-quantization process monotonically increases and an input section in which the output of the conversion-quantization process monotonically decreases, a plurality of threshold values set in advance for each input of the conversion-quantization process, and outputs of the conversion-quantization process corresponding to the plurality of threshold values are associated with each other, respectively.
The division information includes, for example, information representing the vertex at which the output switches between monotonic increase and monotonic decrease. This applies in a case in which the output, obtained by further quantizing the operation result of the convolution operation that has undergone conversion processing such as an activation function, is formed of a monotonically increasing section and a monotonically decreasing section, like a quadratic function.
As shown in the example of fig. 11, in the input/output relationship of the conversion-quantization process, the output Y switches between monotonic increase and monotonic decrease at the input X = 6.
In the example of fig. 11, for example, it is assumed that the operation result X (input X) of the convolution operation is smaller than "6" (X < 6). In this case, the input determination unit 140 determines that the input X of the conversion-quantization process belongs to an input section in which the output monotonically increases.
[ CNN processing method ]
The operation of the CNN processing apparatus 1 having the above-described arrangement according to this embodiment will be described next with reference to fig. 12. Note that, in the CNN processing method according to this embodiment, steps S1 to S4 are the same as the processing described in the outline of the CNN processing method shown in fig. 3.
First, the input buffer 10 and the weight buffer 11 temporarily store the input signal a and the weight U, respectively, supplied from a server or the like installed outside the CNN processing apparatus 1 (step S1 and step S2).
Next, the convolution operation unit 12 reads out the input signal a and the weight U from the input buffer 10 and the weight buffer 11, respectively, and performs convolution operation (step S3). More specifically, the convolution operation unit 12 multiplies a vector of the input signal a by a matrix of weights U.
Next, the convolution operation unit 12 stores the operation result X of the convolution operation, obtained by the product-sum operation, in the corresponding position of the operation result buffer 13 (step S4).
After that, the processing unit 14 reads out the result X of the convolution operation from the operation result buffer 13, and acquires the output Y obtained by the conversion-quantization process of the operation result X by referring to the table 160B in the storage unit 16 (step S5B).
More specifically, the input determination unit 140 determines an input section to which the input X of the conversion-quantization process (i.e., the operation result X of the convolution operation) belongs, based on the division information for identifying the input section in which the output of the conversion-quantization process monotonically increases and the input section in which the output of the conversion-quantization process monotonically decreases.
After that, the threshold processing unit 142 compares the operation result X (input X) of the convolution operation with the threshold value set in advance for the input X of the conversion-quantization process in the input section determined by the input determination unit 140, and outputs the threshold value according to the comparison result. After that, the output acquisition unit 141 acquires the output Y of the conversion-quantization process corresponding to the threshold value output by the threshold value processing unit 142 by referring to the table 160B.
The acquired output Y is temporarily stored in the output buffer 15, read out by the processor 102, and output (step S6).
As described above, according to the CNN processing apparatus 1 of the third embodiment, the section to which the input X of the conversion-quantization process belongs is determined based on the division information for identifying the input/output section in which the output Y of the conversion-quantization process monotonically increases and the input/output section in which the output of the conversion-quantization process monotonically decreases. Threshold processing for the input X of the conversion-quantization process is performed in the determined input section, and the output Y of the conversion-quantization process is acquired by referring to the table 160B.
For this reason, even if the input/output relationship of the conversion-quantization process is not monotonically increasing or monotonically decreasing, the threshold process is performed for each of the sections that monotonically increase and monotonically decrease. Therefore, the arithmetic processing of CNN can be performed at higher speed.
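The two-stage procedure of this embodiment (section determination from the division information, then threshold processing inside that section) can be sketched as follows. Only the division point X = 6 comes from the fig. 11 example; the thresholds and outputs in `TABLE_160B` and all function names are hypothetical.

```python
from bisect import bisect_right
import math

DIVISION_POINT = 6  # the switch point given in the fig. 11 example

# Hypothetical stand-in for table 160B: per section, ascending thresholds
# (lower bounds) and the output Y associated with each threshold.
TABLE_160B = {
    "increasing": ([-math.inf, 0, 2, 4], [0, 1, 2, 3]),
    "decreasing": ([6, 8, 10, 12], [3, 2, 1, 0]),
}


def determine_input_section(x):
    """Role of input determination unit 140: pick the monotonic section."""
    return "increasing" if x < DIVISION_POINT else "decreasing"


def threshold_processing(x, section):
    """Role of threshold processing unit 142: largest threshold not exceeding x."""
    thresholds, _ = TABLE_160B[section]
    return thresholds[bisect_right(thresholds, x) - 1]


def acquire_output(threshold, section):
    """Role of output acquisition unit 141: read Y for the reported threshold."""
    thresholds, outputs = TABLE_160B[section]
    return outputs[thresholds.index(threshold)]


def convert_quantize(x):
    section = determine_input_section(x)
    return acquire_output(threshold_processing(x, section), section)
```

Because each section is monotonic on its own, the threshold search inside a section is the same as in the second embodiment; the division information only selects which threshold list to search.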
Embodiments of a neural network processing apparatus, a neural network processing method, and a neural network processing program according to the present invention have been described above. However, the present invention is not limited to the above-described embodiments, and various changes and modifications conceived by those skilled in the art may be made without departing from the scope of the appended claims.
For example, in the above-described embodiments, the CNN has been described as an example of a neural network. However, the neural network employed by the neural network processing device is not limited to the CNN.
It is noted that the various functional blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented with a general purpose processor, a GPU, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or a combination of some of the above devices designed to perform the functions described herein.
A microprocessor may be used as a general purpose processor. Alternatively, a conventional processor, controller, microcontroller, or state machine may also be used. A processor may also be implemented as a combination of, for example, a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other configuration of computing device.
Description of reference numerals
1: CNN processing apparatus; 10: input buffer; 11: weight buffer; 12: convolution operation unit; 13: operation result buffer; 14: processing unit; 15: output buffer; 16: storage unit; 101: bus; 102: processor; 103: main storage device; 104: communication interface; 105: auxiliary storage device; 106: input/output device; 160: table; NW: communication network; U: weight; a: input signal.

Claims (8)

1. A neural network processing device, comprising:
a first memory configured to store an input signal supplied to a neural network;
a second memory configured to store weights of the neural network;
an operation unit configured to perform a convolution operation of the neural network including a product-sum operation of the input signal and the weights;
a third memory configured to store a table configured to associate an input and an output of a conversion-quantization process with each other, wherein the input is an operation result of the convolution operation by the operation unit, and the output is a result of the conversion-quantization process of converting the input value based on a predetermined condition and quantizing the converted value by reducing the bit precision of the converted data; and
a processing unit configured to acquire the output of the conversion-quantization process corresponding to the operation result of the operation unit by referring to the table.

2. The neural network processing device according to claim 1, wherein
the table associates a plurality of input intervals, obtained by dividing the input of the conversion-quantization process into a plurality of continuous intervals, and the output of the conversion-quantization process with each other, and
the processing unit includes:
an input determination unit configured to compare the operation result of the convolution operation of the operation unit with the plurality of input intervals and determine the input interval including the operation result; and
an output acquisition unit configured to acquire the output of the conversion-quantization process corresponding to the input interval according to the determination result of the input determination unit by referring to the table.

3. The neural network processing device according to claim 1, wherein
the table associates a plurality of thresholds set in advance for the input of the conversion-quantization process and the output of the conversion-quantization process with each other, and
the processing unit includes:
a first threshold processing unit configured to compare the operation result of the convolution operation of the operation unit with the plurality of thresholds and output a threshold corresponding to the operation result; and
an output acquisition unit configured to acquire the output of the conversion-quantization process corresponding to the threshold output by the first threshold processing unit by referring to the table.

4. The neural network processing device according to claim 2, wherein
the table associates, with each other, division information for identifying an input interval in which the output of the conversion-quantization process monotonically increases and an input interval in which the output of the conversion-quantization process monotonically decreases, a plurality of thresholds set in advance for the input of the conversion-quantization process, and the output of the conversion-quantization process corresponding to each of the plurality of thresholds,
the input determination unit determines, based on the division information, the input interval of the conversion-quantization process to which the operation result of the convolution operation of the operation unit belongs, and
the processing unit includes:
a second threshold processing unit configured to compare, in the input interval determined by the input determination unit, the operation result of the convolution operation of the operation unit with the plurality of thresholds and output a threshold corresponding to the operation result; and
an output acquisition unit configured to acquire the output of the conversion-quantization process corresponding to the threshold output by the second threshold processing unit by referring to the table.

5. The neural network processing device according to any one of claims 1 to 4, wherein
the neural network is a multi-layer neural network including at least one intermediate layer.

6. The neural network processing device according to any one of claims 1 to 5, wherein
the process, included in the conversion-quantization process, of converting the operation result of the convolution operation of the operation unit based on the predetermined condition includes at least one of: determining the operation result of an activation function; and normalizing the operation result.

7. A neural network processing method, comprising:
a first step of storing an input signal supplied to a neural network in a first memory;
a second step of storing weights of the neural network in a second memory;
a third step of performing a convolution operation of the neural network including a product-sum operation of the input signal and the weights;
a fourth step of storing a table in a third memory, the table being configured to associate an input and an output of a conversion-quantization process with each other, wherein the input is an operation result of the convolution operation in the third step, and the output is a result of the conversion-quantization process of converting the input value based on a predetermined condition and quantizing the converted value by reducing the bit precision of the converted data; and
a fifth step of acquiring the output of the conversion-quantization process corresponding to the operation result in the third step by referring to the table.

8. A neural network processing program configured to cause a computer to execute:
a first step of storing an input signal supplied to a neural network in a first memory;
a second step of storing weights of the neural network in a second memory;
a third step of performing a convolution operation of the neural network including a product-sum operation of the input signal and the weights;
a fourth step of storing a table in a third memory, the table being configured to associate an input and an output of a conversion-quantization process with each other, wherein the input is an operation result of the convolution operation in the third step, and the output is a result of the conversion-quantization process of converting the input value based on a predetermined condition and quantizing the converted value by reducing the bit precision of the converted data; and
a fifth step of acquiring the output of the conversion-quantization process corresponding to the operation result in the third step by referring to the table.
CN201980066531.1A 2018-10-10 2019-09-10 Neural network processing device, neural network processing method, and neural network processing program Pending CN112930543A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2018192021 2018-10-10
JP2018-192021 2018-10-10
PCT/JP2019/035492 WO2020075433A1 (en) 2018-10-10 2019-09-10 Neural network processing device, neural network processing method, and neural network processing program

Publications (1)

Publication Number Publication Date
CN112930543A true CN112930543A (en) 2021-06-08

Family

ID=70163789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980066531.1A Pending CN112930543A (en) 2018-10-10 2019-09-10 Neural network processing device, neural network processing method, and neural network processing program

Country Status (4)

Country Link
US (1) US12430533B2 (en)
JP (2) JP6886747B2 (en)
CN (1) CN112930543A (en)
WO (1) WO2020075433A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330659A (en) * 2021-12-29 2022-04-12 辽宁工程技术大学 BP neural network parameter optimization method based on improved ASO algorithm
CN115512729A (en) * 2021-08-27 2022-12-23 台湾积体电路制造股份有限公司 Memory device and method of operation thereof

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102774094B1 (en) 2019-06-05 2025-03-04 삼성전자주식회사 Electronic apparatus and method of performing operations thereof
US11360822B2 (en) * 2019-09-12 2022-06-14 Bank Of America Corporation Intelligent resource allocation agent for cluster computing
US11562235B2 (en) * 2020-02-21 2023-01-24 International Business Machines Corporation Activation function computation for neural networks
KR102861538B1 (en) 2020-05-15 2025-09-18 삼성전자주식회사 Electronic apparatus and method for controlling thereof
JP7671754B2 (en) * 2020-06-30 2025-05-02 マクセル株式会社 Neural Network Generator
CN113485762B (en) * 2020-09-19 2024-07-26 广东高云半导体科技股份有限公司 Method and apparatus for improving system performance by offloading computing tasks using configurable devices

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2810202B2 (en) * 1990-04-25 1998-10-15 株式会社日立製作所 Information processing device using neural network
JPH06259585A (en) 1993-03-10 1994-09-16 Toyota Central Res & Dev Lab Inc Neural network device
JPH07248841A (en) 1994-03-09 1995-09-26 Mitsubishi Electric Corp Non-linear function generator and format converter
US5847952A (en) * 1996-06-28 1998-12-08 Honeywell Inc. Nonlinear-approximator-based automatic tuner
JP3438537B2 (en) * 1997-07-18 2003-08-18 株式会社デンソー Neural network computing device
US6389404B1 (en) * 1998-12-30 2002-05-14 Irvine Sensors Corporation Neural processing module with input architectures that make maximal use of a weighted synapse array
US7088860B2 (en) * 2001-03-28 2006-08-08 Canon Kabushiki Kaisha Dynamically reconfigurable signal processing circuit, pattern recognition apparatus, and image processing apparatus
JP2002342308A (en) * 2001-05-21 2002-11-29 Ricoh Co Ltd Arithmetic device, program, and function value calculation method
JP2006154992A (en) 2004-11-26 2006-06-15 Akita Prefecture Neuro-processor
JP4880316B2 (en) 2006-02-08 2012-02-22 株式会社エヌ・ティ・ティ・ドコモ Wireless communication apparatus and wireless communication method
JP5146186B2 (en) 2008-08-06 2013-02-20 三菱電機株式会社 Ultra-high sensitivity imaging device
US10262259B2 (en) 2015-05-08 2019-04-16 Qualcomm Incorporated Bit width selection for fixed point neural networks
JP6183980B1 (en) 2016-12-02 2017-08-23 国立大学法人東京工業大学 Neural network circuit device, neural network, neural network processing method, and neural network execution program
US10997492B2 (en) 2017-01-20 2021-05-04 Nvidia Corporation Automated methods for conversions to a lower precision data format
JP2018135069A (en) 2017-02-23 2018-08-30 パナソニックIPマネジメント株式会社 Information processing system, information processing method, and program
JP6823495B2 (en) 2017-02-27 2021-02-03 株式会社日立製作所 Information processing device and image recognition device
JP6936592B2 (en) 2017-03-03 2021-09-15 キヤノン株式会社 Arithmetic processing unit and its control method
US10740432B1 (en) * 2018-12-13 2020-08-11 Amazon Technologies, Inc. Hardware implementation of mathematical functions
US11475285B2 (en) * 2019-01-28 2022-10-18 Samsung Electronics Co., Ltd. Neural network accelerator and operating method thereof
CN113570033B (en) * 2021-06-18 2023-04-07 北京百度网讯科技有限公司 Neural network processing unit, neural network processing method and device
JP7801882B2 (en) * 2021-11-22 2026-01-19 ルネサスエレクトロニクス株式会社 Semiconductor Devices

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006301894A (en) * 2005-04-20 2006-11-02 Nec Electronics Corp Multiprocessor system and message transfer method for multiprocessor system
JP2010134697A (en) * 2008-12-04 2010-06-17 Canon Inc Convolution operation circuit, hierarchical convolution operation circuit, and object recognition device
US20160179434A1 (en) * 2014-12-19 2016-06-23 Intel Corporation Storage device and method for performing convolution operations
CN107636697A (en) * 2015-05-08 2018-01-26 高通股份有限公司 Fixed-point neural network based on floating-point neural network quantization
JP2017174039A (en) * 2016-03-23 2017-09-28 富士フイルム株式会社 Image classification device, method, and program
US20180032866A1 (en) * 2016-07-28 2018-02-01 Samsung Electronics Co., Ltd. Neural network method and apparatus
US20180046903A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Deep processing unit (dpu) for implementing an artificial neural network (ann)
CN108345939A (en) * 2017-01-25 2018-07-31 微软技术许可有限责任公司 Neural network based on fixed-point calculation
WO2018140294A1 (en) * 2017-01-25 2018-08-02 Microsoft Technology Licensing, Llc Neural network based on fixed-point operations
CN108230277A (en) * 2018-02-09 2018-06-29 中国人民解放军战略支援部队信息工程大学 Dual-energy CT image decomposition method based on convolutional neural networks
CN108364061A (en) * 2018-02-13 2018-08-03 北京旷视科技有限公司 Arithmetic unit, operation execution device, and operation execution method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIRAKA YURI, ISHIZAKI STIFFNESS: "The concept of a distributed neural network by Nakashima Yasuhiko", GENERAL INCORPORATED ELECTRONIC INFORMATION COMMUNICATION ENGINEERS, vol. 117, no. 40, 24 May 2017 (2017-05-24), pages 66 - 67 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512729A (en) * 2021-08-27 2022-12-23 台湾积体电路制造股份有限公司 Memory device and method of operation thereof
CN114330659A (en) * 2021-12-29 2022-04-12 辽宁工程技术大学 BP neural network parameter optimization method based on improved ASO algorithm

Also Published As

Publication number Publication date
US12430533B2 (en) 2025-09-30
JP7568198B2 (en) 2024-10-16
US20210232894A1 (en) 2021-07-29
JP6886747B2 (en) 2021-06-16
JP2021108230A (en) 2021-07-29
JPWO2020075433A1 (en) 2021-02-15
WO2020075433A1 (en) 2020-04-16

Similar Documents

Publication Publication Date Title
CN112930543A (en) Neural network processing device, neural network processing method, and neural network processing program
US20210263995A1 (en) Reduced dot product computation circuit
US11093168B2 (en) Processing of neural networks on electronic devices
US11961267B2 (en) Color conversion between color spaces using reduced dimension embeddings
CN112085175A (en) Data processing method and device based on neural network calculation
JP7040771B2 (en) Neural network processing equipment, communication equipment, neural network processing methods, and programs
CN113157987B (en) Data preprocessing method for machine learning algorithm and related equipment
CN118170347B (en) Precision conversion device, data processing method, processor, and electronic device
CN111160523A (en) Dynamic quantization method, system and medium based on characteristic value region
US12282842B2 (en) Neural network processing apparatus, neural network processing method, and neural network processing program
US12100196B2 (en) Method and machine learning system to perform quantization of neural network
US20220129736A1 (en) Mixed-precision quantization method for neural network
CN112668455B (en) Facial age recognition method, device, terminal equipment and storage medium
CN113822413A (en) Method and system for selecting digital format of deep neural network based on network sensitivity and quantization error
CN115099402A (en) Quantitative evaluation method and computing device
CN111385601A (en) Video auditing method and system
CN113808011A (en) Feature fusion based style migration method and device and related components thereof
CN118228776A (en) Data processing method, storage medium, electronic device and program product
CN114065913B (en) Model quantization method, device and terminal equipment
TWI837298B (en) Neural network processing device, neural network processing method, and neural network processing program
TWI708196B (en) Method and processor for decompression of model parameters using functions based upon cumulative count distributions
CN113438482A (en) Region of interest based video coding
TW202131237A (en) Structure conversion device, structure conversion method, and structure conversion program
CN112102942A (en) Skeletal development grade detection method and terminal equipment
CN114819149B (en) Data processing method, device and medium based on transforming neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20250206

Address after: Kyoto Prefecture

Applicant after: MAXELL, Ltd.

Country or region after: Japan

Address before: Tokyo, Japan

Applicant before: Lippmade Co.,Ltd.

Country or region before: Japan