Automatic Differentiation and Its Use: Explained in Plain English

BY MICHAEL CORLEY

Tags: #datascience #autodiff #machinelearning #artificialintelligence #autograd #statistics

I have recently become extremely interested in automatic differentiation. I have long sought to eliminate calculus from my daily life, which I view as just an extension of algebra built on memorization, and simply an inconvenience while programming. The way I think about partial differential equations is as a series of discrete transformations drawn from memorized proofs. In calculus, there is very little point in re-deriving the proofs algebraically once they are proven, so most of us resort to memorization. With automatic differentiation, you can take a huge body of coded functions and math and automatically compute its derivative, or sensitivity, with respect to a single variable. It is also extremely fast, as I will describe later.

Symbolic Calculus, and Tools to Solve Symbolic Calculus Equations

In calculus, derivatives describe the rate of change of a function, whereas integrals are generally used to compute the area under a curve (in statistics, we typically use integrals to compute probabilities). In this instance, we are talking about automatically computing derivatives of your entire code project.

As a reminder, some equations are discontinuous, meaning they do not form a single continuous line. You are going to have difficulty solving these types of equations with any tool, as they are also difficult to solve by hand.

Tools like Wolfram Alpha’s search engine, Matlab, Mathematica, and the open source Scilab drastically reduce the probability of making an error when solving symbolic differential equations. They operate very similarly, but the constraint remains that they are still performing forms of symbolic differentiation, which is redundant and slow. You would also be unwise to deploy a large-scale implementation with Matlab, despite the option of running your code through it.

“Symbolic differentiation finds the derivative of a given formula with respect to a specified variable, producing a new formula as its output. In general, symbolic mathematics programs manipulate formulas to produce new formulas, rather than performing numeric calculations based on formulas” (Novak).
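To make the quote concrete, here is a toy symbolic differentiator, included only as a sketch of the idea (not from any of the cited tools). Expressions are represented as nested tuples, and `diff` returns a *new* expression, mirroring how symbolic programs manipulate formulas rather than numbers. The representation and function names are my own invention for illustration.

```python
def diff(expr, var):
    """Differentiate expr with respect to var, returning a NEW expression.

    An expression is a number, a variable name string, or a tuple
    ("add"|"mul", left, right).
    """
    if isinstance(expr, (int, float)):
        return 0                         # d/dx of a constant is 0
    if isinstance(expr, str):
        return 1 if expr == var else 0   # d/dx of x is 1, of y is 0
    op, a, b = expr
    if op == "add":                      # sum rule: (a + b)' = a' + b'
        return ("add", diff(a, var), diff(b, var))
    if op == "mul":                      # product rule: (ab)' = a'b + ab'
        return ("add", ("mul", diff(a, var), b), ("mul", a, diff(b, var)))
    raise ValueError(f"unknown operator: {op}")

# d/dx (x*x) -> 1*x + x*1, i.e. 2x before any simplification
print(diff(("mul", "x", "x"), "x"))
```

Note that the output formula is unsimplified; real symbolic engines spend much of their effort on simplification, which is one reason they are slow compared to automatic differentiation.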

The topic of "automatic differentiation," which programmatically solves differential equations, languished in the bowels of computer-science obscurity, but it remains one of the most brilliant ideas, with enormous potential for innovation in numerous computational areas. Its usage has reemerged in many computational spaces, though typically buried behind a mountain of other code as a core component. What if you could compute the first, second, or third derivative of any function, or set of nested functions, that you wrote in a programming language at lightning speed? There are a number of applications.

Practical Applications of Automatic Differentiation

One of the better-known applications of automatic differentiation took hold in data science, most notably in the concept of back propagation: the automatic computation of gradients in neural networks while moving backwards through activation functions (gradient descent). If someone, somewhere, had not understood the concept of automatic differentiation, we would not have properly functioning automatic gradient or gradient descent programs for neural networks. In other words, neural network computations would be tedious and slow.
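The "moving backwards" idea can be sketched in a few lines of reverse-mode (backprop-style) code. This is my own minimal illustration, not the article's or any library's implementation: each `Var` records how it was produced, and `backward()` walks the graph in reverse, accumulating gradients.

```python
class Var:
    """A value that remembers its parents and the local derivatives to them."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # accumulate the incoming gradient, then push it to the parents
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# f(x, y) = x*y + x  =>  df/dx = y + 1, df/dy = x
x, y = Var(3.0), Var(4.0)
f = x * y + x
f.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

A single backward pass yields the sensitivity of `f` to *every* input at once, which is exactly what gradient descent over millions of weights needs.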

Automatic gradient computation is a subset of automatic differentiation, although this is unclear both to programmers and to many of the people teaching and describing the idea; often they confuse the two. For example, PyTorch originally named its method “autodiff” instead of “autograd,” when in reality “autograd,” short for automatic gradient, is actually an “autodiff” package (Grosse).

Adjoint differentiation, the more common terminology in financial risk management, refers to automatic differentiation as well. According to Antoine Savine, large-scale value-at-risk calculations (most likely he is assuming the Monte Carlo method is utilized) that take into account derivative instruments (financial derivatives, not mathematical ones) require, without adjoint differentiation, “bumping” single variables in your model code by small amounts, one at a time, to compute your model's sensitivity to individual variables. For larger firms this process requires an entire data center running all night, because of the amount of data. Utilizing automatic differentiation of the model (partial derivatives give you the rate of change, or sensitivity), the same job can be done in less than a minute on your laptop. Anyone at home can perform this task themselves.
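A sketch of the “bumping” approach Savine describes makes the cost obvious: the model is re-run once per input. The model below is a made-up two-input stand-in, not a real risk model; the function names are mine.

```python
def model(params):
    # hypothetical portfolio value driven by two market inputs
    spot, vol = params
    return spot * spot + 10.0 * vol

def bump_sensitivities(f, params, h=1e-6):
    """Finite-difference sensitivities: one FULL re-evaluation per input."""
    base = f(params)
    sens = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h               # bump a single variable...
        sens.append((f(bumped) - base) / h)  # ...and re-run the whole model
    return sens

print(bump_sensitivities(model, [100.0, 0.2]))  # approximately [200.0, 10.0]
```

With thousands of inputs and an expensive Monte Carlo model, those repeated full evaluations are what eats the data center overnight; adjoint (reverse-mode) differentiation replaces them with a single backward pass.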

Implementation

There are two alternative methodologies for building an automatic differentiation package, but both involve decomposing the functions in your code utilizing the chain rule, i.e. (f(g(x)))' = f'(g(x))⋅g'(x). This is useful because your code very likely involves a lot of nested functions, and the chain rule breaks them down into a form a computer can interpret.
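As a quick worked example of the chain rule (my own, chosen for illustration): h(x) = sin(x²) decomposes into g(x) = x² and f(u) = sin(u), so h'(x) = cos(x²)·2x. We can sanity-check that against a crude finite difference:

```python
import math

x = 1.5
# chain rule: h'(x) = f'(g(x)) * g'(x) = cos(x^2) * 2x
analytic = math.cos(x * x) * 2 * x

# crude forward finite difference for comparison
h = 1e-7
numeric = (math.sin((x + h) ** 2) - math.sin(x * x)) / h

print(analytic, numeric)  # the two agree to several decimal places
```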

They also both involve decomposing the calculus into a series of discrete operations: addition, subtraction, multiplication, and so on. In a programming language, these are called operators.

The two methods are source code transformation and operator overloading. Source code transformation involves parsing the text of your code and generating the differentiated code from that text, while operator overloading involves overriding the basic methods of Python, C++, or whatever language you are writing in. For example, “+” is really something like add(a, b) under the hood (it's just a method of a class, and you can override methods). You override the addition operator so that it also computes the derivative, and you do the same with all other operators. This is a much simpler way of doing it and doesn’t involve touching or editing your original code (Berland).
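The operator overloading approach can be sketched with forward-mode "dual numbers" (my own minimal example, not a full AD package): each `Dual` carries a value and a derivative, and the overridden `+` and `*` apply the sum and product rules automatically, so the derivative falls out of running the original code unchanged.

```python
class Dual:
    """A (value, derivative) pair with overloaded arithmetic operators."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        # sum rule: (a + b)' = a' + b'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # product rule: (a * b)' = a'b + ab'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def f(x):
    return x * x * x + x   # f(x) = x^3 + x, so f'(x) = 3x^2 + 1

x = Dual(2.0, 1.0)         # seed derivative 1.0: differentiate w.r.t. x
y = f(x)
print(y.value, y.deriv)    # 10.0 13.0
```

Note that `f` is ordinary code; nothing in it mentions derivatives. That is the "no changes in your original code" advantage listed below.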

The pros and cons of each method:

Source code transformation

  • Pro: It is possible in all computer languages
  • Pro: It can be applied to your old legacy Fortran/C code. Allows easier compile time optimizations.
  • Con: Source code swells in size
  • Con: It is more difficult to code the AD tool

Operator overloading

  • Pro: No changes in your original code
  • Pro: Flexible when you change your code or tool
  • Pro: Easy to code the AD tool
  • Con: Only possible in selected languages
  • Con: Current compilers lag behind, code runs slower

Additional resources:

  1. The definitive guide to all things "autodiff," including listings of hundreds of papers and books written on the subject, many with code examples and clear mathematical examples of it deployed: http://www.autodiff.org/. This will be useful when you attempt to build in things like loops.
  2. A simple summary of the calculus behind "autograd" and back propagation, condensed to 4 pages: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf

References

Novak, Gordon. CS 381K: Symbolic Differentiation, University of Texas, 2007, www.cs.utexas.edu/users/novak/asg-symdif.html.

Grosse, Roger. “CSC321 Lecture 10: Automatic Differentiation.” University of Toronto, 2018, https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf.

Savine, Antoine. A Brief Introduction to Automatic Adjoint Differentiation (AAD), QuantMinds 365, 2019, https://knect365.com/quantminds/article/2d9e3329-cae1-4320-86a2-a4e2303b1bfe/a-brief-introduction-to-automatic-adjoint-differentiation-aad.

Berland, Havard. “Automatic Differentiation (Lecture).” Department of Mathematical Sciences, NTNU, 2006, www.pvv.ntnu.no/~berland/resources/autodiff-triallecture.pdf.
