CN111062475B - Method and apparatus for quantizing parameters of a neural network

Info

Publication number: CN111062475B
Application number: CN201910822654.5A
Authority: CN
Inventors: 朴炫宣; 李俊行; 姜信行
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2018-10-17
Filing date: 2019-08-30
Publication date: 2025-05-02
Anticipated expiration: 2039-08-30
Also published as: KR102917770B1; CN111062475A; EP3640858B1; US12026611B2; US20200125947A1; JP2020064635A; KR20200043169A; EP3640858A1; JP7117280B2

Abstract

A method of quantizing parameters of a neural network includes: calculating, for each parameter, a bit shift value indicating the degree to which the parameter falls outside the bit range of a fixed-point format used to quantize the parameters; updating the fixed-point format based on the calculated bit shift values of the parameters; and quantizing parameters updated in a learning or inference process according to the updated fixed-point format.

Description

Method and apparatus for quantizing parameters of a neural network

Cross Reference to Related Applications

The present application claims priority from Korean Patent Application No. 10-2018-0123977, filed with the Korean Intellectual Property Office on October 17, 2018, the entire disclosure of which is incorporated herein by reference.

Technical Field

The following description relates to methods and apparatuses for quantizing parameters of a neural network.

Background

A neural network refers to a computational architecture that models a biological brain. With recent progress in neural network technology, a great deal of research has been conducted on analyzing input data and extracting useful information by using neural network devices in various electronic systems.

The neural network device performs a large number of operations on the input data. There is a need for a technique to efficiently handle such neural network operations.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In one general aspect, a method of quantizing parameters of a neural network includes calculating, for each parameter, a bit shift value indicating the degree to which the parameter falls outside the bit range of a fixed-point format used to quantize the parameters, updating the fixed-point format using the calculated bit shift values of the parameters, and quantizing parameters updated in a learning or inference process according to the updated fixed-point format.

Calculating the bit shift value may include detecting, for each parameter, the most significant bit having a value of "1", and determining, for each parameter, the difference in the number of bits between the detected most significant bit and the most significant bit of the integer part of the fixed-point format as the bit shift value.

Detecting the most significant bit may include, for each parameter, searching the bits within a particular range based on the most significant bit of the integer part of the fixed-point format and detecting the most significant bit having a value of "1".

Updating the fixed-point format may include determining a number of occurrences of overflow and a maximum bit shift value according to the calculated bit shift value, and updating the fixed-point format based on the number of occurrences of overflow and the maximum bit shift value.

Updating the fixed-point format may include reducing the fractional length of the fixed-point format by the maximum bit shift value when the number of occurrences of overflow is greater than a predetermined value.

The predetermined value may be based on the number of parameters.

The parameters quantized according to the updated fixed-point format may be parameters updated during a (t+1)-th learning or inference process, the parameters for which the bit shift values are calculated may be parameters updated during a t-th learning or inference process, the fixed-point format may be a fixed-point format updated based on parameters updated during a (t-1)-th learning or inference process, and t may be a natural number greater than or equal to 2.

Calculating the bit-shift value may include calculating a bit-shift value for each parameter during quantization of the parameter according to the fixed-point format.

The parameters may be weights or activations on the same layer in the neural network.

A computer-readable recording medium may store a program for causing a computer to execute the method.

In another general aspect, an apparatus for quantizing parameters of a neural network includes a memory storing at least one program, and a processor configured to, by executing the at least one program, calculate, for each parameter, a bit shift value indicating the degree to which the parameter falls outside the bit range of a fixed-point format used to quantize the parameters, update the fixed-point format using the calculated bit shift values of the parameters, and quantize parameters updated in a learning or inference process according to the updated fixed-point format.

The processor may detect, for each parameter, the most significant bit having a value of "1", and determine, for each parameter, the difference in the number of bits between the detected most significant bit and the most significant bit of the integer part of the fixed-point format as the bit shift value.

The processor may search the bits within a particular range based on the most significant bit of the integer part of the fixed-point format and detect the most significant bit having a value of "1".

The processor may determine the number of occurrences of overflow and a maximum bit shift value from the calculated bit shift value, and update the fixed point format using the number of occurrences of overflow and the maximum bit shift value.

The processor may update the fixed point format by decreasing the fractional length of the fixed point format by a maximum bit shift value in case that the number of occurrences of overflow is greater than a predetermined value.

The processor may calculate a bit shift value for each parameter during quantization of the parameter according to the fixed point format.

In another general aspect, a method includes calculating, for each parameter updated in a t-th learning or inference process of a neural network, a bit shift value based on a fixed-point format used to quantize the parameter, determining a number of occurrences of overflow and a maximum bit shift value from the calculated bit shift values, updating the fixed-point format based on the number of occurrences of overflow and the maximum bit shift value, and quantizing the parameters in a (t+1)-th learning or inference process of the neural network based on the updated fixed-point format, wherein t is a natural number greater than or equal to 2.

Determining the number of occurrences of overflow may include determining whether a bit shift value of each parameter is greater than 0 and increasing the number of occurrences of overflow by 1 for each bit shift value greater than 0.

Determining the maximum bit-shift value may include comparing the calculated bit-shift values of the parameters updated in the t-th learning or inference process with each other and determining the maximum value of the bit-shift values updated in the t-th learning or inference process as the maximum bit-shift value.

In another general aspect, a computer-readable recording medium has one or more programs recorded thereon, the one or more programs including instructions for performing a method of quantizing parameters of a neural network.

Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.

Drawings

Fig. 1 is a diagram for explaining a neural network learning device and a neural network inference device according to an example.

Fig. 2 is a diagram illustrating an example of a neural network according to an example.

Fig. 3A is a diagram illustrating floating point format parameters according to an example, fig. 3B is a diagram illustrating fixed point format parameters according to an example, and fig. 3C is a diagram illustrating fixed point format parameters according to another example.

Fig. 4 is a block diagram of a hardware configuration of a neural network device according to an example.

Fig. 5 illustrates an example of a processor operating in an iterative learning or inference process.

Fig. 6 shows an example in which a processor uses bit shift values of parameters to update a fixed-point format.

Fig. 7 shows an example in which a processor calculates a bit shift value of a parameter.

Fig. 8 shows an example in which a processor detects the most significant bit having a value of "1" in a parameter.

Fig. 9 shows an example in which the processor determines the maximum bit shift value and the number of occurrences of overflow.

Fig. 10 is a diagram illustrating an algorithm for a processor to update a fractional length of a fixed point format according to an example.

Fig. 11 is a diagram illustrating an algorithm for a processor to update a fractional length of a fixed point format according to another example.

Fig. 12 is a block diagram of an electronic system according to an example.

Fig. 13 is a diagram for explaining a method of operating a neural network device according to an example.

Throughout the drawings and detailed description, unless otherwise described or provided, like reference numerals will be understood to refer to like elements, features and structures. The figures may not be drawn to scale and the relative sizes, proportions, and depictions of elements in the figures may be exaggerated for clarity, illustration, and convenience.

Detailed Description

Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the embodiments presented may take different forms and should not be construed as limited to the descriptions set forth herein. Accordingly, the embodiments are described below merely by referring to the drawings to explain various aspects. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. Expressions such as "at least one of," when preceding a list of elements, modify the entire list of elements rather than the individual elements in the list.

The terms "comprising" or "including," and the like, as used herein, should not be construed as necessarily including all of the various elements or operations described in the specification; some of the elements or operations may be omitted, or additional elements or operations may be included.

Furthermore, as used herein, various elements may be described using terms including ordinal numbers such as "first" or "second," but the elements should not be limited to the terms. These terms are only used to distinguish one element from another element.

Examples relate to a method and apparatus for processing parameters of a neural network, and detailed descriptions thereof will be omitted with respect to matters widely known to those of ordinary skill in the art.

Fig. 1 is a diagram for explaining a neural network learning device 10 and a neural network inference device 20 according to an example.

Referring to fig. 1, the neural network learning device 10 may correspond to a computing device having various processing functions, such as generating a neural network, training (learning) a neural network, quantizing a floating-point format neural network into a fixed-point format neural network, quantizing a fixed-point format neural network into another fixed-point format neural network, or retraining a neural network. For example, the neural network learning device 10 may be implemented as various types of devices, such as a personal computer (PC), a server device, a mobile device, and the like. Here, parameter quantization may denote conversion of a floating-point format parameter into a fixed-point format parameter, or conversion of a fixed-point format parameter having a specific bit width into another fixed-point format parameter having another bit width.

The neural network learning device 10 can generate the trained neural network 11 by repeatedly training (learning) a given initial neural network. In this state, the initial neural network may have floating point format parameters, for example, parameters of 32-bit floating point precision, in terms of ensuring processing accuracy of the neural network. The parameters may include various types of data input to or output from the neural network, such as input/output activation of the neural network, weights, biases, and the like. As the neural network is trained repeatedly, floating point parameters of the neural network may be adjusted to output a more accurate output relative to a given input.

The neural network learning device 10 may process parameters according to a fixed-point format in the course of repeatedly learning (training) the initial neural network. Specifically, the neural network learning device 10 may process parameters according to an 8-bit or 16-bit fixed-point format to train the neural network within an allowable accuracy loss while sufficiently reducing the number of operations. Thus, the neural network learning device 10 may be implemented in a smart phone, a tablet computer, or a wearable device having relatively low processing power, for on-device learning.

The neural network learning device 10 may send the trained neural network 11 to a hardware accelerator (e.g., the neural network inference device 20). The neural network inference device 20 may be included in a mobile device, an embedded device, or the like. The neural network inference device 20 is dedicated hardware for driving the quantized neural network 21. Since the neural network inference device 20 is implemented with relatively low power or low performance, it may be better suited to fixed-point operations than to floating-point operations. The neural network inference device 20 may correspond to, but is not limited to, a Tensor Processing Unit (TPU), a neural engine, or the like, which is a dedicated module for driving a neural network.

The neural network inference device 20 for driving the quantized neural network 21 may be implemented in a separate device independent of the neural network learning device 10. However, the present disclosure is not limited thereto, and the neural network inference device 20 may be implemented in the same device as the neural network learning device 10.

The neural network inference device 20 deploying the quantized neural network 21 may be included in, for example, an autonomous vehicle, a robot, a smart phone, a tablet device, an augmented reality (AR) device, an Internet of Things (IoT) device, or the like, which performs voice recognition, image recognition, or the like by using the neural network, but the present disclosure is not limited thereto.

Fig. 2 is a diagram showing an example of the neural network 2.

Referring to fig. 2, the neural network 2 may have a structure including an input layer, hidden layers, and an output layer, may perform an operation based on received input data (e.g., I₁ and I₂), and may generate output data (e.g., O₁ and O₂) based on a result of the operation.

The neural network 2 may be a deep neural network (DNN) or an n-layer neural network including one or more hidden layers. For example, as shown in fig. 2, the neural network 2 may be a DNN including an input layer (layer 1), two hidden layers (layer 2 and layer 3), and an output layer (layer 4). DNNs may include, but are not limited to, convolutional neural networks (CNNs), recurrent neural networks (RNNs), deep belief networks, restricted Boltzmann machines, and the like.

Although the neural network 2 is shown as including four layers, this is merely exemplary, and the neural network 2 may include more or fewer layers or more or fewer nodes. Further, the neural network 2 may include layers having various structures different from the structure shown in fig. 2. For example, as a DNN, the neural network 2 may include a convolutional layer, a pooling layer, and a fully-connected layer.

Each layer included in the neural network 2 may include a plurality of artificial nodes called neurons, processing elements (PEs), units, or similar terms. For example, as shown in fig. 2, layer 1 may include two nodes and layer 2 may include three nodes. However, this is merely exemplary, and each layer included in the neural network 2 may include various numbers of nodes.

Nodes included in the various layers of the neural network 2 may be connected to each other in order to process data. For example, one node may receive data from other nodes and process the data, and may output the operation result to the other nodes.

The output value of each node may be referred to as an activation. An activation may be the output of one node and an input value of the nodes included in the next layer. Each node may determine its own activation based on the weights and the activations received from the nodes included in the previous layer. A weight is a parameter used to calculate the activation in each node, and may be a value assigned to a connection between nodes.

Each node may be processed by a computational unit that receives an input and outputs an activation, and the input and output of each node may be mapped to each other. When σ denotes an activation function, w_{jk}^{i} denotes the weight from the k-th node included in the (i-1)-th layer to the j-th node included in the i-th layer, b_{j}^{i} denotes the bias of the j-th node included in the i-th layer, and a_{j}^{i} denotes the activation of the j-th node of the i-th layer, the activation a_{j}^{i} may be calculated by using Equation 1 below.

Equation 1

a_{j}^{i} = σ(Σ_{k} (w_{jk}^{i} × a_{k}^{i-1}) + b_{j}^{i})

As shown in fig. 2, the activation of the first node in the second layer (layer 2) may be expressed as a_{1}^{2}. In addition, according to Equation 1, a_{1}^{2} may have the value a_{1}^{2} = σ(w_{1,1}^{2} × a_{1}^{1} + w_{1,2}^{2} × a_{2}^{1} + b_{1}^{2}). However, Equation 1 is merely an example for describing the activations, weights, and biases used to process data in the neural network 2, and the present disclosure is not limited thereto. The activation may be a value obtained by passing, through a rectified linear unit (ReLU), a value obtained by applying an activation function to a weighted sum of the activations received from the previous layer.
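
As an illustration only, Equation 1 can be sketched in Python for the two-node layer 1 and three-node layer 2 of fig. 2. The function names, the choice of a logistic activation function, and all numeric values below are assumptions made for this sketch and do not appear in the disclosure.

```python
import math

def sigmoid(x):
    """Logistic function, used here only as one possible activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_activations(weights, biases, prev_activations, sigma=sigmoid):
    """Apply Equation 1 to every node j of layer i.

    weights[j][k] is the weight from node k of layer i-1 to node j of layer i,
    biases[j] is the bias of node j, and prev_activations[k] is a_k^{i-1}.
    """
    return [
        sigma(sum(w * a for w, a in zip(row, prev_activations)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical values: two nodes in layer 1, three nodes in layer 2.
a1 = [0.5, -1.0]
w2 = [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6]]
b2 = [0.0, 0.1, -0.1]
a2 = layer_activations(w2, b2, a1)
```

Here a2[0] plays the role of a_{1}^{2}, computed from the two activations of layer 1, the two weights into the first node of layer 2, and that node's bias.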

As described above, in the neural network 2, a large number of data sets are exchanged between a plurality of interconnected nodes and these data sets undergo many calculation processes in each layer.

Fig. 3A is a diagram illustrating floating point format parameters 30 according to an example. Fig. 3B is a diagram illustrating fixed point format parameters 35 according to an example. Fig. 3C is a diagram illustrating fixed point format parameters according to another example.

Referring to fig. 3A, the floating-point format parameter 30 may include a sign bit 310, an exponent portion 320, a mantissa portion 330, and a bias 340. A floating-point representation divides a number into a portion indicating the significant digits (i.e., the mantissa) and a portion indicating the position of the decimal point.

The mantissa portion 330 may correspond to the portion indicating the significant digits. The exponent portion 320 may correspond to the portion indicating the position of the decimal point. The sign bit 310 may determine the sign of the floating-point format parameter 30. The bias 340 may be a value that is added to or subtracted from the exponent portion 320 and that is determined so that negative exponents can be represented. The floating-point format parameter 30 may include the sign bit 310, bits corresponding to the exponent portion 320, and bits corresponding to the mantissa portion 330. The bias 340 may be predetermined with respect to the floating-point format parameter 30 and stored separately.

When the sign bit 310 represents a sign, the exponent portion 320 represents an exponent, the mantissa portion 330 represents a mantissa, and the bias 340 represents a bias, the floating-point format parameter 30 may have a value according to Equation 2 below.

Equation 2

Floating point value=(-1)^sign·2^{exponent-bias}·mantissa
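
A minimal Python sketch of Equation 2 follows. The function name and the sample field values are hypothetical; the bias of 127 is borrowed from the common 32-bit floating-point convention purely for illustration and is not stated in the disclosure.

```python
def floating_point_value(sign, exponent, bias, mantissa):
    """Evaluate Equation 2: (-1)^sign * 2^(exponent - bias) * mantissa."""
    return (-1) ** sign * 2.0 ** (exponent - bias) * mantissa

# Hypothetical fields: sign = 1, stored exponent = 129, bias = 127, mantissa = 1.5.
value = floating_point_value(1, 129, 127, 1.5)
print(value)  # -6.0, i.e. -(2**2) * 1.5
```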

Referring to fig. 3B, the fixed-point format parameter 35 may include a sign bit 315, an integer portion 325, a fractional portion 335, and a decimal point 345. A fixed-point representation is a notation that represents a number with a fixed number of fractional digits by using a decimal point.

The sign bit 315 may determine the sign of the fixed-point format parameter 35. The integer portion 325 may correspond to the portion representing the integer part of the fixed-point format parameter 35. The fractional portion 335 may correspond to the portion representing the fractional part of the fixed-point format parameter 35. The decimal point 345 may indicate the reference point that separates the integer portion 325 from the fractional portion 335 of the fixed-point format parameter 35.

The value represented by the fixed-point format parameter 35 may be described with reference to fig. 3C. Referring to fig. 3C, the fixed-point format parameter 35 may be an 8-bit fixed-point value. The fixed-point format parameter 35 may also include an integer portion 325, a fractional portion 335, and a decimal point 345.

Each bit representing the integer portion 325 and the fractional portion 335 may have a value of 1 or 0. As shown in fig. 3C, the bits representing the integer portion 325 and the fractional portion 335 may sequentially have the place values -8, +4, +2, +1, +0.5, +0.25, +0.125, and +0.0625. When the most significant bit of the integer portion 325 is 1, the value represented by that bit is -8, so the value represented by the fixed-point format parameter 35 is negative regardless of the values of the other bits included in the integer portion 325 and the fractional portion 335. The most significant bit of the integer portion 325 may correspond to the sign bit 315 that determines the sign of the fixed-point format parameter 35.

However, fig. 3C is only an example, and the fixed-point format parameter 35 may be a 16-bit fixed-point value or a fixed-point value having any suitable number of bits. In addition, the fixed-point format parameter 35 may represent a negative number by any one of a sign-magnitude representation, a one's complement representation, and a two's complement representation.
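
The place values of fig. 3C can be checked with a short Python sketch. It assumes a two's complement encoding (one of the options named above) and a fractional length of 4; the function name and the sample bit patterns are hypothetical.

```python
def fixed_point_value(bits, frac_length):
    """Decode a two's complement bit string that has frac_length fractional bits."""
    width = len(bits)
    raw = int(bits, 2)
    if bits[0] == "1":
        # The most significant bit carries the negative place value (-8 in fig. 3C).
        raw -= 1 << width
    return raw / (1 << frac_length)

print(fixed_point_value("01010100", 4))  # 4 + 1 + 0.25 = 5.25
print(fixed_point_value("10000001", 4))  # -8 + 0.0625 = -7.9375
```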

Fig. 4 is a block diagram of a hardware configuration of the neural network device 100 according to an example.

The neural network device 100 may operate by being included in at least one of the neural network learning device 10 and the neural network inference device 20, or may operate as a separate third hardware accelerator.

Referring to fig. 4, the neural network device 100 may include a processor 110 and a memory 120. In fig. 4, only the constituent elements of the neural network device 100 of the present example are shown. It will thus be apparent to those of ordinary skill in the art that the neural network device 100 may also include general constituent elements in addition to those illustrated in fig. 4.

The processor 110 performs all functions to control the neural network device 100. For example, the processor 110 controls all functions of the neural network device 100 by executing programs stored in the memory 120 in the neural network device 100. The processor 110 may be implemented by a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application Processor (AP), or the like provided in the neural network device 100. However, the present disclosure is not limited thereto.

The memory 120 is hardware for storing various pieces of data processed in the neural network device 100. For example, the memory 120 may store data that has been processed or is to be processed in the neural network device 100. Further, the memory 120 may store applications, drivers, and the like to be run by the neural network device 100. The memory 120 may be a dynamic random access memory (DRAM), but the present disclosure is not limited thereto. The memory 120 may include at least one of volatile memory or nonvolatile memory.

The processor 110 may generate a trained neural network by repeatedly training (learning) the initial neural network, and may repeatedly update the parameters of the neural network in doing so. For example, the weights in the neural network may be updated repeatedly during the learning process, and the activations may also be updated repeatedly because they are computed using the weights. Each time the parameters of the neural network are updated, the processor 110 may quantize the updated parameters according to a fixed-point format. Further, the processor 110 may update the fixed-point format each time the parameters of the neural network are updated. Because accuracy loss may occur during the learning process when changing parameters are quantized according to an unchanging fixed-point format, the processor 110 may update the fixed-point format used to quantize the parameters, thereby reducing the accuracy loss while reducing the number of operations. In particular, the processor 110 may update the fixed-point format to correspond to the distribution of the fixed-point values of the updated parameters. For example, the processor 110 may update the decimal point position in the fixed-point format to correspond to the parameter having the greatest value among the updated parameters.

The processor 110 may repeatedly update the parameters of the neural network even during the inference process, in which the trained neural network is driven to obtain a result value. For example, during the inference process, data may be repeatedly input to the trained neural network, and thus the activations in the trained neural network may be repeatedly updated. Thus, as in the learning process, the processor 110 may quantize the updated parameters according to a fixed-point format each time the parameters of the neural network are updated during the inference process. Further, as in the learning process, the processor 110 may update the fixed-point format each time the parameters of the neural network are updated during the inference process.

The processor 110 may update the parameters during the t-th learning or inference process and may then quantize the updated parameters. At this point, the processor 110 may quantize the parameters updated during the t-th learning or inference process according to a specific first fixed-point format. For example, the first fixed-point format may be a fixed-point format updated based on the parameters updated during the (t-1)-th learning or inference process. The processor 110 may update the existing first fixed-point format to a second fixed-point format based on the parameters updated during the t-th learning or inference process. The processor 110 may then quantize the parameters updated during the (t+1)-th learning or inference process according to the second fixed-point format.

Thus, when quantizing parameters updated during the t-th learning or inference process, the processor 110 may quantize the parameters according to a particular fixed point format, thereby reducing the time to scan all parameters updated during the t-th learning or inference process and the hardware overhead for determining the fixed point format.

Fig. 5 illustrates an example of the operation of the processor 110 in an iterative learning or inference process.

The processor 110 may quantize the parameter parameter_t updated during the t-th learning or inference process according to a fixed-point format having a fractional length frac_length_t-1 to generate a quantized parameter q_parameter_t. In other words, the processor 110 may determine in advance the fixed-point format having the fractional length frac_length_t-1 based on the parameter parameter_t-1 updated during the (t-1)-th learning or inference process, and quantize the parameter parameter_t according to that fixed-point format during the t-th learning or inference process.

The processor 110 may update the existing fractional length frac_length_t-1 to a fractional length frac_length_t based on the parameter parameter_t. In other words, the processor 110 may determine in advance the fractional length frac_length_t for quantizing the parameter parameter_t+1 updated during the (t+1)-th learning or inference process.

The processor 110 may update the parameter parameter_t to the parameter parameter_t+1 during the (t+1)-th learning or inference process. In addition, the processor 110 may quantize the parameter parameter_t+1 according to a fixed-point format having the fractional length frac_length_t to generate a quantized parameter q_parameter_t+1.

In fig. 5, according to an example, the parameters parameter_t-1, parameter_t, and parameter_t+1 may be data tensors, such as weights or input/output activations on the same layer in a neural network, and according to another example, weights or input/output activations on the same channel in the neural network. Further, t may be a natural number greater than or equal to 1, and when t is 1, the fixed-point format for quantizing the parameters updated in the first learning or inference process may be preset by the user.
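
The schedule of fig. 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the rounding and saturation behaviour of quantize, the 8-bit width, the function names, and the placeholder update rule passed as update_frac_length are all hypothetical.

```python
def quantize(value, frac_length, bit_width=8):
    """Round value onto a signed fixed-point grid and saturate (assumed behaviour)."""
    scale = 2.0 ** frac_length
    lo, hi = -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    code = max(lo, min(hi, round(value * scale)))
    return code / scale

def run(parameter_steps, initial_frac_length, update_frac_length):
    """Quantize parameter_t with frac_length_t-1, then derive frac_length_t from parameter_t."""
    frac_length = initial_frac_length  # preset by the user for t = 1
    history = []
    for parameter_t in parameter_steps:
        q_parameter_t = [quantize(p, frac_length) for p in parameter_t]
        history.append((frac_length, q_parameter_t))
        # The format used for step t+1 is fixed here, before step t+1 begins.
        frac_length = update_frac_length(parameter_t, frac_length)
    return history

# Placeholder rule for demonstration only: shorten the fractional length by 1 each step.
history = run([[0.5, 1.25], [3.0, 9.0]], 4, lambda params, fl: fl - 1)
```

The point of the structure is that each step quantizes with a format that was already determined in the previous step, so the parameters of the current step do not need to be scanned before they are quantized.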

Referring again to fig. 4, the processor 110 may calculate a bit shift value for each parameter, the bit shift value indicating the degree to which the parameter falls outside the bit range of the fixed-point format used to quantize the parameters. In other words, the bit shift value may indicate how far the bit range needed to represent the parameter value extends beyond the bit range that the fixed-point format can cover. According to an example, in a case where the fixed-point format can cover an integer part of up to 3 bits, the bit shift value may be 3 when an integer part of up to 6 bits is needed to represent the parameter value. As another example, in a case where the fixed-point format can cover a fractional part of up to 4 bits, the bit shift value may be -2 when a fractional part of up to 6 bits is required to represent the parameter value. The bit shift value also indicates the degree of overflow or underflow that occurs when the parameter is quantized according to the fixed-point format. For example, a bit shift value of 3 indicates that a 3-bit overflow occurs, and a bit shift value of -2 indicates that a 2-bit underflow occurs. Thus, the processor 110 may calculate the bit shift values of the parameters to determine the total number of occurrences of overflow or underflow associated with the parameters. Further, the processor 110 may store the bit shift values, the parameters, the number of occurrences of overflow, and the number of occurrences of underflow in the memory 120.
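
The overflow example above (a format covering 3 integer bits, a value needing 6) can be reproduced with a small Python sketch. It assumes the bit shift value is the distance between the leading "1" of the parameter's magnitude and the most significant integer bit of the format; the function name and the int_bits parameter are hypothetical, and the sketch does not attempt to model the fractional-part (underflow) example.

```python
import math

def bit_shift_value(value, int_bits):
    """Distance in bits from the format's integer MSB to the leading '1' of |value|.

    int_bits is the number of integer magnitude bits the fixed-point format covers
    (an assumed parameterization). Returns None when value is zero, because no
    bit with a value of "1" exists.
    """
    if value == 0:
        return None
    leading = math.frexp(abs(value))[1] - 1  # floor(log2(|value|)), computed exactly
    return leading - (int_bits - 1)

print(bit_shift_value(40.0, 3))  # 40 = 0b101000 needs 6 integer bits, so 3
print(bit_shift_value(5.0, 3))   # 5 = 0b101 fits in 3 integer bits, so 0
```

A positive result corresponds to overflow by that many bits; a result of zero or less means the integer part of the format is wide enough for the value.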

The processor 110 may update the fixed-point format using the bit shift values of the parameters. According to an example, when the number of occurrences of overflow is greater than a predetermined value, the processor 110 may update the fixed-point format by reducing the length of the fractional part of the fixed-point format by the maximum bit shift value among the bit shift values. According to another example, when the number of occurrences of overflow is greater than a predetermined value, the processor 110 may update the fixed-point format using the maximum bit shift value and the minimum bit shift value among the bit shift values. Further, the processor 110 may store information regarding the updated fixed-point format in the memory 120.
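
The first of these two examples can be sketched in Python as shown below. The function name, the return tuple, and the sample threshold are assumptions for illustration; the second example, which also uses the minimum bit shift value, is not sketched because its rule is not spelled out in this passage.

```python
def update_frac_length(bit_shifts, frac_length, threshold):
    """Shorten the fractional length when overflow occurred more than threshold times.

    bit_shifts holds one bit shift value per parameter; a value greater than 0
    counts as one occurrence of overflow.
    """
    overflow_count = sum(1 for s in bit_shifts if s > 0)
    max_shift = max(bit_shifts)
    if overflow_count > threshold:
        frac_length -= max_shift
    return frac_length, overflow_count, max_shift

# Hypothetical bit shift values for five parameters, with a threshold of 2.
print(update_frac_length([3, 0, -1, 2, 1], 4, 2))  # (1, 3, 3)
print(update_frac_length([3, 0, -1, 0, 0], 4, 2))  # (4, 1, 3)
```

Only a counter and a running maximum are needed, so the update can be folded into the same pass that computes the bit shift values.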

The processor 110 may then quantify the parameters updated during the learning or inference process according to the updated fixed-point format. In particular, the processor 110 may quantify the parameter updated during the t+1st learning or inference process according to a fixed point format updated based on the bit shift value of the parameter updated during the t-th learning or inference process.

Accordingly, the processor 110 can update the fixed-point format using only the maximum bit shift value and the number of occurrences of overflow of the parameters. The update therefore requires relatively few, simple operations, and hardware overhead can be reduced.

Fig. 6 shows an example in which the processor 110 uses the bit shift values of the parameters to update the fixed-point format.

The processor 110 may calculate a bit shift value for each of the parameters parameter_t(1) through parameter_t(i) (where i is a natural number greater than or equal to 2) updated during the t-th learning or inference process. Specifically, the processor 110 may calculate the bit shift value of each of the parameters parameter_t(1) to parameter_t(i) with respect to the bit range of a fixed-point format having a predetermined fractional length frac_length_t-1.

Fig. 7 shows an example in which the processor 110 calculates the bit shift value of the i-th parameter parameter_t(i).

The processor 110 may calculate the bit shift value of the i-th parameter parameter_t(i) among the parameters updated in the t-th learning or inference process.

The processor 110 may detect the most significant bit having a value of "1" in the i-th parameter parameter_t(i). Referring to fig. 7, the processor 110 may detect the sixth bit of the integer part of the i-th parameter parameter_t(i) as the most significant bit.

Subsequently, the processor 110 may determine, as the bit shift value of the i-th parameter parameter_t(i), the difference in bit position between the most significant bit of the integer part of the fixed-point format having the fractional length frac_length_t-1 and the previously detected most significant bit of the i-th parameter parameter_t(i). Referring to fig. 7, since the third bit of the integer part is the most significant bit of the fixed-point format having the fractional length frac_length_t-1, the difference between the sixth bit and the third bit is 3 bits, and thus the processor 110 may calculate the bit shift value of the i-th parameter parameter_t(i) as 3. A bit shift detector for detecting this difference may be included in the processor 110. In addition, the processor 110 may determine that a 3-bit overflow has occurred with respect to the i-th parameter parameter_t(i).
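The calculation described with reference to fig. 7 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the function name `bit_shift_value`, the parameter `src_frac_length` (the fractional length of the high-precision source value), the single sign bit, the use of the magnitude of the parameter, and the result returned for a zero parameter are all assumptions made for illustration.

```python
def bit_shift_value(raw: int, src_frac_length: int,
                    bit_width: int, frac_length: int) -> int:
    """Difference in bit position between the highest '1' bit of a parameter
    and the top integer bit of the target fixed-point format.

    raw             -- parameter as a high-precision fixed-point integer
    src_frac_length -- number of fractional bits in `raw` (assumed name)
    bit_width       -- total width of the target format, incl. 1 sign bit
    frac_length     -- fractional length of the target format
    """
    magnitude = abs(raw)  # assumption: sign handled separately
    if magnitude == 0:
        return 0  # assumption: no '1' bit exists, so report no overflow
    # Exponent of the highest '1' bit, i.e. its weight is 2**msb_exponent.
    msb_exponent = magnitude.bit_length() - 1 - src_frac_length
    # Exponent of the most significant integer bit of the target format.
    int_bits = bit_width - 1 - frac_length
    format_msb_exponent = int_bits - 1
    return msb_exponent - format_msb_exponent


# Fig. 7 style example: the value 40 (binary 101000) needs a sixth integer
# bit, while an 8-bit format with fractional length 4 has 3 integer bits.
shift = bit_shift_value(40 << 16, src_frac_length=16, bit_width=8, frac_length=4)
```

Under these assumptions a positive result corresponds to the overflow case of fig. 7; how underflow is derived from the bit positions is not specified in this passage, so the sketch makes no claim about negative results beyond the arithmetic shown.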

The processor 110 may quantize the i-th parameter parameter_t(i) according to the fixed-point format having the fractional length frac_length_t-1 to generate a quantized parameter q_parameter_t(i), and may calculate the bit shift value of the i-th parameter parameter_t(i) during this quantization. In other words, the processor 110 may perform the process of calculating the bit shift value of the i-th parameter parameter_t(i) together with the process of quantizing the i-th parameter parameter_t(i).

In fig. 7, the i-th parameter parameter_t(i) updated in the t-th learning or inference process is shown as a 48-bit fixed-point value, and the quantized parameter q_parameter_t(i) is shown as a fixed-point value having a bit width of 8 and a fractional length of 4, but the parameters are not limited thereto.

Fig. 8 shows an example in which the processor 110 detects the most significant bit having a value of "1" in the parameter parameter_t(i).

The processor 110 may sequentially read the bit values of the parameter parameter_t(i), starting from its most significant bit and proceeding toward the lower bits, to detect the most significant bit having a value of "1".

According to another example, the processor 110 may read only the bits within a predetermined range relative to a predetermined bit of the parameter parameter_t(i) to detect the most significant bit having a value of "1". For example, as shown in fig. 8, the processor 110 may read the bits in an 8-bit or 16-bit range relative to the most significant bit of the integer part of the fixed-point format having the fractional length frac_length_t-1 to detect the most significant bit having a value of "1" in the parameter parameter_t(i). Because the processor 110 reads only the bits within the predetermined range instead of all bits of the parameter parameter_t(i), the hardware overhead for scanning is reduced.
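The range-limited scan can be sketched as follows. This is only an illustration under stated assumptions: the function name `windowed_bit_shift`, the argument `format_msb_pos` (the bit position, within the raw parameter, of the most significant integer bit of the fixed-point format), and the choice to scan the same number of bits above and below that position are hypothetical, since the exact placement of the range in fig. 8 is not spelled out in the text.

```python
from typing import Optional


def windowed_bit_shift(raw: int, format_msb_pos: int,
                       window: int = 8) -> Optional[int]:
    """Scan only `window` bits on either side of the format's top integer bit.

    Returns the bit shift value if a '1' is found inside the window,
    otherwise None (assumption: the caller decides how to treat that case).
    """
    magnitude = abs(raw)  # assumption: sign handled separately
    high = format_msb_pos + window
    low = max(format_msb_pos - window, 0)
    # Read from the top of the window toward the lower bits.
    for pos in range(high, low - 1, -1):
        if (magnitude >> pos) & 1:
            return pos - format_msb_pos
    return None


# A source value with 16 fractional bits and a target format with 3 integer
# bits puts the format's top integer bit at position 16 + 3 - 1 = 18.
shift = windowed_bit_shift(40 << 16, format_msb_pos=18, window=8)
```

One consequence of this design is that a "1" located above the window is not seen, so the window size trades scanning cost against the largest overflow that can be detected.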

Referring again to fig. 6, the processor 110 may determine the maximum bit shift value and the number of occurrences of overflow from the bit shift values of the parameters parameter_t(1) to parameter_t(i). Specifically, the processor 110 may determine the largest of the bit shift values as the maximum bit shift value, and the number of bit shift values that are positive as the number of occurrences of overflow.

The processor 110 may then update the fixed-point format based on the maximum bit shift value and the number of occurrences of overflow. In other words, the processor 110 may update the fixed-point format having the fractional length frac_length_t-1 to a fixed-point format having the fractional length frac_length_t based on the maximum bit shift value and the number of occurrences of overflow. Specifically, when the number of occurrences of overflow is greater than a specific value, the processor 110 may decrease the fractional length frac_length_t-1 by the maximum bit shift value, thereby updating the fractional length from the existing fractional length frac_length_t-1 to the new fractional length frac_length_t.

Thus, the processor 110 may quantize the parameters updated during the (t+1)-th learning or inference process according to the fixed-point format having the fractional length frac_length_t.

Fig. 9 shows an example in which the processor 110 determines the maximum bit shift value and the number of occurrences of overflow.

Similar to the logical operation shown in fig. 9, the processor 110 may determine the maximum bit shift value and the number of occurrences of overflow by performing a logical operation on a total of N parameters, which are parameters updated in the t-th learning or inference process. The processor 110 may include a logic operator for performing the logic operations shown in fig. 9.

In operation S910, the processor 110 may determine whether the bit shift value bit_shift(i) of the i-th parameter among the parameters is greater than 0. In other words, the processor 110 may determine whether the i-th parameter is a parameter corresponding to overflow. When the bit shift value bit_shift(i) of the i-th parameter is greater than 0, the processor 110 may increase the number of occurrences of overflow by 1 (S920). Thereafter, the processor 110 may determine whether the bit shift value bit_shift(i+1) of the (i+1)-th parameter is greater than 0 to continue counting the occurrences of overflow. As a result, the processor 110 may sequentially determine whether the bit shift value of each of the N parameters updated during the t-th learning or inference process is greater than 0, and thereby determine the total number of occurrences of overflow in the t-th learning or inference process.

In operation S930, the processor 110 may compare the bit shift value bit_shift(i) of the i-th parameter with the existing maximum bit shift value max_bit_shift_t. When the bit shift value bit_shift(i) is greater than the existing maximum bit shift value max_bit_shift_t, the processor 110 may update the maximum bit shift value max_bit_shift_t to the bit shift value bit_shift(i) (S940). The processor 110 may then determine whether to update the maximum bit shift value max_bit_shift_t again by comparing the bit shift value bit_shift(i+1) of the (i+1)-th parameter with the updated maximum bit shift value max_bit_shift_t. As a result, the processor 110 may compare the bit shift values of the N parameters updated in the t-th learning or inference process with each other, thereby determining the largest of the bit shift values as the maximum bit shift value max_bit_shift_t.

The processor 110 may determine a minimum bit shift value corresponding to an underflow among the bit shift values of the N parameters. In particular, the processor 110 may compare the bit shift values of the N parameters with each other to determine a minimum bit shift value, which is the minimum value of the bit shift values having a value less than or equal to 0.
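The logical operations of fig. 9, together with the minimum bit shift value described above, amount to a single pass over the N bit shift values. The following Python sketch illustrates this; the function name `summarize_bit_shifts` and the use of `None` as the initial "no value yet" state are assumptions, since the text does not state how the running maximum and minimum are initialized.

```python
from typing import Iterable, Optional, Tuple


def summarize_bit_shifts(
    bit_shifts: Iterable[int],
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (overflow count, maximum bit shift, minimum bit shift <= 0)."""
    overflow_count = 0
    max_bit_shift: Optional[int] = None
    min_bit_shift: Optional[int] = None
    for shift in bit_shifts:
        # S910/S920: a positive bit shift value counts as one overflow.
        if shift > 0:
            overflow_count += 1
        # S930/S940: keep the largest bit shift value seen so far.
        if max_bit_shift is None or shift > max_bit_shift:
            max_bit_shift = shift
        # Minimum among the values less than or equal to 0.
        if shift <= 0 and (min_bit_shift is None or shift < min_bit_shift):
            min_bit_shift = shift
    return overflow_count, max_bit_shift, min_bit_shift


overflow_count, max_shift, min_shift = summarize_bit_shifts([3, 0, -2, 1, -1])
```

For the five example values, two are positive, the largest is 3, and the smallest non-positive value is -2.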

Fig. 10 is a diagram illustrating an algorithm by which the processor 110 updates the fractional length frac_length_t-1 of the fixed-point format according to an example.

In operation S1010, the processor 110 may obtain the maximum bit shift value max_bit_shift_t, the total number of occurrences of overflow # of overflow_t, the fractional length frac_length_t-1 of the fixed-point format for quantizing the parameters updated in the t-th learning or inference process, the total number N of parameters updated in the t-th learning or inference process, and an outlier data rate TH. The processor 110 may determine the maximum bit shift value max_bit_shift_t and the total number of occurrences of overflow # of overflow_t through the logical operations shown in fig. 9. Further, the user may set the outlier data rate TH to any number between 0 and 1.

In operation S1020, when the total number of occurrences of overflow # of overflow_t is greater than N × TH, the processor 110 may update the fractional length frac_length_t-1 through equation 1030. In equation 1030, a represents a specific weight. For example, when a is 1, the processor 110 may subtract the maximum bit shift value max_bit_shift_t from the existing fractional length frac_length_t-1 to obtain the new fractional length frac_length_t. Thus, the processor 110 may determine the fractional length frac_length_t of the fixed-point format used to quantize the parameters updated during the (t+1)-th learning or inference process.

When the total number of occurrences of overflow # of overflow_t is not greater than N × TH, the processor 110 may maintain the fractional length frac_length_t-1 without updating it. In other words, the processor 110 may determine the fractional length frac_length_t-1 for quantizing the parameters updated in the t-th learning or inference process as the fractional length frac_length_t for quantizing the parameters updated in the (t+1)-th learning or inference process. As a result, because the outlier data rate TH is set to a value between 0 and 1, the processor 110 does not update the fractional length frac_length_t-1 when the total number of occurrences of overflow # of overflow_t is relatively small compared to the total number N of parameters.
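The decision of fig. 10 can be sketched in a few lines of Python. Equation 1030 itself is not reproduced in this text, so the form used here, frac_length_t = frac_length_t-1 - a × max_bit_shift_t, is an assumption that is only known to agree with the description for a = 1. The function name `update_frac_length` and the restriction of `a` to an integer are likewise illustrative.

```python
def update_frac_length(frac_length_prev: int, max_bit_shift: int,
                       overflow_count: int, n_params: int,
                       th: float, a: int = 1) -> int:
    """Return the fractional length for the (t+1)-th step.

    th -- outlier data rate TH, a user-chosen number between 0 and 1
    a  -- weight in the assumed form of equation 1030 (a = 1 matches the text)
    """
    # S1020: update only when overflow is frequent enough to matter.
    if overflow_count > n_params * th:
        return frac_length_prev - a * max_bit_shift
    # Otherwise keep the existing fractional length unchanged.
    return frac_length_prev


# 2 overflows among 5 parameters exceeds 5 * 0.1, so 4 - 3 = 1.
new_frac_length = update_frac_length(4, max_bit_shift=3, overflow_count=2,
                                     n_params=5, th=0.1)
```

Reducing the fractional length by the maximum bit shift value moves that many bits from the fractional part to the integer part, which is what allows the largest overflowing parameter to fit in the next step.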

Fig. 11 is a diagram showing an algorithm by which the processor 110 updates the fractional length frac_length_t-1 of the fixed-point format according to another example.

Operations S1110 and S1120 of fig. 11 may correspond to operations S1010 and S1020 of fig. 10, and thus redundant description thereof will be omitted.

In operation S1110, the processor 110 may obtain a minimum bit shift value min_bit_shift_t in addition to the information obtained in operation S1010 of fig. 10. In other words, the processor 110 may obtain the minimum bit shift value min_bit_shift_t, corresponding to underflow, among the bit shift values of the parameters updated in the t-th learning or inference process.

In operation S1120, when the total number of occurrences of overflow # of overflow_t is greater than N × TH, the processor 110 may update the fractional length frac_length_t-1 through equation 1130. In equation 1130, a and b are values representing specific weights, and the symbol [x] is the Gauss bracket (floor function), representing the greatest integer not exceeding x. For example, the user may set a greater weight to a than to b to determine the fractional length frac_length_t.

Fig. 12 is a block diagram of an electronic system 1200 according to an example.

Referring to fig. 12, the electronic system 1200 may extract effective information by analyzing input data in real time based on a neural network, determine a condition based on the extracted information, or control elements of an electronic device on which the electronic system 1200 is mounted. For example, the electronic system 1200 may be applied to robotic devices (e.g., unmanned aerial vehicles), Advanced Driver Assistance Systems (ADAS), smart TVs, smart phones, medical devices, mobile devices, image display devices, measurement devices, IoT devices, etc., and may also be mounted on at least one of various types of electronic devices.

Electronic system 1200 can include processor 1210, RAM 1220, neural network device 1230, memory 1240, sensor module 1250, and communication (TX/RX) module 1260. The electronic system 1200 may also include input/output modules, security modules, power control devices, and the like. Some hardware components of electronic system 1200 may be mounted on at least one semiconductor chip. The neural network device 1230 may include the neural network apparatus 100 described above or a neural network-specific hardware accelerator or apparatus including the same.

Processor 1210 controls all operations of electronic system 1200. Processor 1210 may include one processor core (single core) or multiple processor cores (multi-core). Processor 1210 may process or execute programs and/or data stored in memory 1240. The processor 1210 may control the functions of the neural network device 1230 by executing programs stored in the memory 1240. Processor 1210 may be implemented by CPU, GPU, AP or the like.

RAM 1220 may temporarily store programs, data, or instructions. For example, programs and/or data stored in the memory 1240 may be temporarily stored in the RAM 1220 according to a control of the processor 1210 or the boot code. The RAM 1220 may be implemented by a memory such as a Dynamic RAM (DRAM) or a Static RAM (SRAM), etc.

The neural network device 1230 may perform an operation of the neural network based on the received input data, and generate an information signal based on a result of the operation. The neural network may include, but is not limited to, a CNN, an RNN, a deep belief network, a restricted Boltzmann machine, and the like. The neural network device 1230 is hardware that drives the above neural network for classification, and may correspond to a neural network-specific hardware accelerator.

The information signal may comprise one of various types of recognition signals, such as a speech recognition signal, an object recognition signal, an image recognition signal, a biometric information recognition signal, etc. For example, the neural network device 1230 may receive frame data included in a video stream as input data, and generate an identification signal for an object included in an image indicated by the frame data according to the frame data. However, the present disclosure is not limited thereto, and the neural network device 1230 may receive various types of input data according to the type or function of the electronic device on which the electronic system 1200 is mounted, and generate an identification signal according to the input data.

The memory 1240 is a storage device for storing data, such as an Operating System (OS), various programs, and various data. In an embodiment, the memory 1240 may store intermediate results generated during the execution of the operations of the neural network device 1230, such as an output feature map, in the form of an output feature list or an output feature matrix. In an embodiment, the memory 1240 may store a compressed output feature map. Further, the memory 1240 may store quantized neural network data, such as parameters, weight maps, or weight lists, for use by the neural network device 1230.

The memory 1240 may be a DRAM, but the present disclosure is not limited thereto. The memory 1240 may include at least one of a volatile memory and a nonvolatile memory. The nonvolatile memory may include ROM, PROM, EPROM, EEPROM, flash memory, PRAM, MRAM, RRAM, FRAM, and the like. The volatile memory may include DRAM, SRAM, SDRAM, PRAM, MRAM, RRAM, FeRAM, or the like. In an embodiment, the memory 1240 may include at least one of an HDD, an SSD, a CF card, an SD card, a micro SD card, a mini SD card, an xD card, and a memory stick.

The sensor module 1250 may collect information regarding the periphery of the electronic device on which the electronic system 1200 is installed. The sensor module 1250 may sense or receive signals from the outside of the electronic device, such as image signals, voice signals, magnetic signals, biometric signals, touch signals, etc., and convert the sensed or received signals into data. To this end, the sensor module 1250 may include at least one of various types of sensing devices, such as a microphone, an imaging device, an image sensor, a light detection and ranging (LIDAR) sensor, an ultrasonic sensor, an infrared sensor, a biological sensor, a touch sensor, and the like.

The sensor module 1250 may provide the converted data to the neural network device 1230 as input data. For example, the sensor module 1250 may include an image sensor, and may generate a video stream by photographing an external environment of the electronic device and provide consecutive data frames of the video stream to the neural network device 1230 in the order of input data. However, the present disclosure is not limited thereto, and the sensor module 1250 may provide various types of data to the neural network device 1230.

The communication module 1260 may include various wired or wireless interfaces capable of communicating with external devices. For example, the communication module 1260 may include a Local Area Network (LAN), a Wireless Local Area Network (WLAN) such as Wireless Fidelity (Wi-Fi), a Wireless Personal Area Network (WPAN) such as Bluetooth, a wireless Universal Serial Bus (USB), ZigBee, Near Field Communication (NFC), Radio Frequency Identification (RFID), Power Line Communication (PLC), or a communication interface capable of connecting to a mobile cellular network (e.g., 3rd generation (3G), 4th generation (4G), Long Term Evolution (LTE), etc.).

Fig. 13 is a diagram for explaining a method of operating the neural network device 100 according to the embodiment.

The method shown in fig. 13 may be performed by the elements of the neural network device 100 of fig. 4 or the electronic system 1200 of fig. 12, and redundant explanation is omitted.

In operation S1310, the neural network device 100 may calculate, for each parameter, a bit shift value indicating the degree to which the parameter falls outside the bit range of the fixed-point format used to quantize the parameters. Specifically, the neural network device 100 may calculate the bit shift value of each parameter updated in the t-th learning or inference process with respect to the bit range of a fixed-point format determined in advance based on the parameters updated in the (t-1)-th learning or inference process. The neural network device 100 may calculate the bit shift values of the parameters to determine the total number of occurrences of overflow or underflow of the parameters.

The neural network device 100 may detect the most significant bit having a value of "1" for each parameter, and calculate a difference in the number of bits between the detected most significant bit and the most significant bit of the integer part of the fixed point format as a bit shift value for each parameter. Further, for each parameter, the neural network device 100 may search for bits in a predetermined range based on the most significant bit of the integer part of the fixed-point format, and detect the most significant bit having a value of "1".

In operation S1320, the neural network device 100 may update the fixed-point format using the bit shift values of the parameters. The neural network device 100 may update the existing fixed-point format, used to quantize the parameters in the t-th learning or inference process, to a new fixed-point format. For example, when the number of occurrences of overflow is greater than a predetermined value, the neural network device 100 may update the fixed-point format by reducing the fractional length of the fixed-point format by the maximum bit shift value among the bit shift values.

In operation S1330, the neural network device 100 may quantize the parameters updated in the learning or inference process according to the updated fixed-point format. Specifically, the neural network device 100 may quantize the parameters updated in the (t+1)-th learning or inference process according to the fixed-point format updated using the bit shift values of the parameters updated in the t-th learning or inference process.
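Operations S1310 to S1330 can be tied together in one short Python sketch that works on floating-point parameter values. Several details are assumptions rather than part of the description: the function names, one sign bit in the bit width, round-half-up rounding, saturation of out-of-range values, a bit shift of 0 for a zero parameter, and a weight of 1 when reducing the fractional length.

```python
import math
from typing import List, Sequence, Tuple


def bit_shift(value: float, bit_width: int, frac_length: int) -> int:
    """S1310: highest '1' bit of |value| relative to the format's top integer bit."""
    if value == 0:
        return 0  # assumption: a zero parameter reports no overflow
    int_bits = bit_width - 1 - frac_length  # assumption: one sign bit
    # frexp gives |value| = m * 2**e with 0.5 <= m < 1, so the highest
    # '1' bit is the e-th integer bit (counting from 1).
    _, exponent = math.frexp(abs(value))
    return exponent - int_bits


def quantize(value: float, bit_width: int, frac_length: int) -> int:
    """Quantize with round-half-up and saturation (both assumptions)."""
    scaled = math.floor(value * 2.0 ** frac_length + 0.5)
    q_min = -(1 << (bit_width - 1))
    q_max = (1 << (bit_width - 1)) - 1
    return max(q_min, min(q_max, scaled))


def quantize_and_update(params: Sequence[float], bit_width: int,
                        frac_length: int, th: float) -> Tuple[List[int], int]:
    """Quantize the step-t parameters and return the format for step t+1."""
    quantized = [quantize(p, bit_width, frac_length) for p in params]
    shifts = [bit_shift(p, bit_width, frac_length) for p in params]
    overflow_count = sum(1 for s in shifts if s > 0)
    # S1320: reduce the fractional length only if overflow is frequent.
    if overflow_count > len(params) * th:
        frac_length -= max(shifts)
    return quantized, frac_length


# Step t uses fractional length 4; two of the four values overflow.
q_t, frac_next = quantize_and_update([40.0, 1.5, -0.25, 9.0], 8, 4, th=0.25)
# S1330: step t+1 quantizes with the updated fractional length.
q_next, _ = quantize_and_update([40.0, 9.0], 8, frac_next, th=0.25)
```

In this example the overflowing values saturate at step t, the fractional length drops from 4 to 1, and the same magnitudes then fit without saturation at step t+1. The format used at step t+1 depends only on statistics gathered while quantizing step t, so no separate scan of the parameters is needed.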

According to an embodiment, the neural network device may update the fixed-point format used to quantize the parameters updated in the learning or inference process in accordance with changes in the updated parameters, thereby reducing the amount of computation and reducing the accuracy loss. In addition, when quantizing the parameters updated in the t-th learning or inference process, the neural network device may quantize the parameters according to a predetermined fixed-point format, thereby reducing the time needed to scan all the parameters updated in the t-th learning or inference process and the hardware overhead for determining the fixed-point format. In addition, since the neural network device updates the fixed-point format using the number of occurrences of overflow and the maximum bit shift value of the parameters, the neural network device can update the fixed-point format through relatively simple and small-scale calculation, thereby reducing hardware overhead.

The apparatus described herein may include a processor, memory (persistent storage such as a disk drive) for storing program data to be executed by the processor, a communication port for handling communications with external devices, and a user interface device (including a display, keys, etc.). When software modules are involved, they may be stored as program instructions or computer readable code executable by a processor on non-transitory computer readable media such as read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. The medium may be readable by a computer, may be stored in a memory, and may be executed by a processor.

Embodiments may be described in the context of functional block components and various processing steps. Such functional blocks may be implemented by any number of hardware and/or software components configured to perform the specified functions. In embodiments, the present disclosure may employ various integrated circuit components, e.g., memory elements, processing elements, logic elements, look-up tables, etc., which may perform various functions under the control of one or more microprocessors or other control devices. Similarly, where elements of the disclosure are implemented using software programming or software elements, the disclosure may be implemented using any programming or scripting language (e.g., C, C++, Java, assembly language, etc.) as well as various algorithms that are implemented using any combination of data structures, objects, processes, routines, or other programming elements. The functional aspects may be implemented as algorithms executing on one or more processors. Further, the present disclosure may employ any number of techniques depending on the relevant fields for electronic configuration, signal processing and/or control, data processing, and the like. The terms "mechanism" and "element" are used broadly and are not limited to mechanical or physical embodiments, but may include software routines in combination with a processor or the like.

The particular implementations shown and described herein are illustrative examples of the present disclosure and are not intended to otherwise limit the scope of the present disclosure in any way. For brevity, electronic, control systems, software development and other functional aspects of the systems (and components of the individual operating components of the systems) according to the related art may not be described in detail. Furthermore, the connecting lines or connectors shown in the various figures presented are intended to represent exemplary functional relationships and/or physical or logical couplings between the various elements. It should be noted that there may be many alternative or additional functional relationships, physical connections or logical connections in a practical device.

The use of the terms "a" and "an" and "the" and similar referents in the context of describing the disclosure (especially in the context of the following claims) is to be construed to cover both the singular and the plural. Furthermore, recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. Finally, the steps of all methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. Various modifications and adaptations may become apparent to those skilled in the relevant art without departing from the spirit and scope of the disclosure.

It is to be understood that the embodiments described herein should be considered in descriptive sense only and not for purposes of limitation. The description of features or aspects in each embodiment should generally be taken to be applicable to other similar features or aspects in other embodiments.

Although one or more embodiments have been described with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined by the following claims.

While this disclosure includes particular examples, it will be obvious to those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the claims and their equivalents. The examples described herein should be considered as illustrative only and not for the purpose of limitation. The descriptions of features or aspects in each example should be considered as applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques were performed in a different order and/or if components in the described systems, architectures, devices or circuits were combined in a different manner and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the present disclosure is defined not by the detailed description but by the claims and their equivalents, and all changes within the scope of the claims and their equivalents should be construed as being included in the present disclosure.

Claims

1. A processor-implemented method of quantizing parameters of a neural network for performing speech recognition and/or image recognition, the method comprising:

for each parameter, calculating a bit shift value indicating a degree outside a bit range of a fixed-point format used to quantize the parameter;

updating the fixed-point format based on the calculated bit shift values of the parameters; and

quantizing the parameters updated during a learning or inference process according to the updated fixed-point format,

wherein calculating the bit shift value comprises:

for each parameter, detecting the most significant bit having a value of "1"; and

for each parameter, determining, as the bit shift value, a difference in number of bits between the detected most significant bit and the most significant bit of the integer part of the fixed-point format.

2. The method of claim 1, wherein detecting the most significant bit comprises:

for each parameter, bits within a predetermined range are searched based on the most significant bits of the integer part of the fixed point format, and the most significant bits having a value of "1" are detected.

3. The method of claim 1, wherein updating the fixed point format comprises:

Determining the number of occurrences of overflow and a maximum bit shift value based on the calculated bit shift value, and

The fixed point format is updated based on the number of occurrences of the overflow and the maximum bit shift value.

4. The method of claim 3, wherein updating the fixed point format comprises:

In case that the number of occurrences of overflow is greater than a predetermined value, the fixed point format is updated by decreasing the fractional length of the fixed point format by the maximum bit shift value.

5. The method of claim 4, wherein the predetermined value is based on a number of parameters.

6. The method according to claim 1,

Wherein the updated parameter is the parameter updated during the t+1st learning or inference,

Wherein the parameter is a parameter updated during a t-th learning or inference process,

Wherein the fixed-point format is a fixed-point format updated based on parameters updated in the t-1 th learning or inference process, and

t is a natural number greater than or equal to 2.

7. The method of claim 1, wherein calculating the bit shift value comprises:

the bit shift value of each parameter is calculated during quantization of the parameter according to the fixed point format.

8. The method of claim 1, wherein the parameter is a weight or activation on the same layer in the neural network.

9. A computer-readable recording medium storing a program for causing a computer to execute the method according to claim 1.

10. An apparatus for quantizing parameters of a neural network for performing speech recognition and/or image recognition, the apparatus comprising:

a memory storing at least one program, and

A processor configured to execute the at least one program by:

Updating the fixed point format using the calculated bit shift value of the parameter, and

Wherein the processor is configured to:

11. The apparatus of claim 10, wherein the processor is configured to search for bits within a predetermined range based on the most significant bits of the integer portion of the fixed point format and to detect the most significant bits having a value of "1".

12. The apparatus of claim 10, wherein the processor is configured to determine a number of occurrences of overflow and a maximum bit shift value from the calculated bit shift value, and to update the fixed point format using the number of occurrences of overflow and the maximum bit shift value.

13. The apparatus of claim 12, wherein the processor is configured to update the fixed point format by reducing a fractional length of the fixed point format by the maximum bit shift value if the number of occurrences of overflow is greater than a predetermined value.

14. The apparatus of claim 13, wherein the predetermined value is based on a number of parameters.

15. The device according to claim 10,

t is a natural number greater than or equal to 2.

16. The apparatus of claim 10, wherein the processor is configured to calculate the bit shift value for each parameter during quantization of the parameter according to the fixed point format.

17. The apparatus of claim 10, wherein the parameter is a weight or activation on a same layer in the neural network.