Automatic Differentiation and Its Use: Explained in Plain English

BY MICHAEL CORLEY

Tags: #datascience #autodiff #machinelearning #artificialintelligence #autograd #statistics

I have recently become extremely interested in automatic differentiation. I have long sought to eliminate calculus from my daily life, which I view as just an extension of algebra built on memorization, and simply an inconvenience while programming. The way I think about partial differential equations is as a series of discrete transformations drawn from memorized proofs. In calculus, there is very little point in re-deriving the proofs algebraically once they are proven, so most of us resort to memorization. With automatic differentiation, you can take a huge body of coded functions and math and automatically compute its derivative, or sensitivity, with respect to a single variable. It is also extremely fast, as I will describe later.

Symbolic Calculus, and Tools to Solve Symbolic Calculus Equations

In calculus, derivatives describe the rate of change of a function, whereas integrals are generally used to compute the area under a curve (in statistics, we typically use integrals to compute probabilities). In this instance, we are talking about automatically computing derivatives of your entire code project.

As a reminder, some equations are discontinuous, meaning they do not form a single continuous line. You are going to have difficulty solving these types of equations with any tool, as they are also difficult to solve by hand.

Tools like Wolfram Alpha’s search engine, Matlab, Mathematica, and the open source Scilab drastically reduce the probability of making an error when solving symbolic differential equations. They operate very similarly, but the constraint remains that they are still performing forms of symbolic differentiation, which is redundant and slow. You would also be unwise to deploy a large-scale implementation with Matlab, despite the option of running your code through it.

“Symbolic differentiation finds the derivative of a given formula with respect to a specified variable, producing a new formula as its output. In general, symbolic mathematics programs manipulate formulas to produce new formulas, rather than performing numeric calculations based on formulas” (Novak).
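To make the quote concrete, here is a toy symbolic differentiator, included only as a sketch of the idea (not from any of the cited tools). Expressions are represented as nested tuples, and `diff` returns a *new* expression, mirroring how symbolic programs manipulate formulas rather than numbers. The representation and function names are my own invention for illustration.

```python
def diff(expr, var):
    """Differentiate expr with respect to var, returning a NEW expression.

    An expression is a number, a variable name string, or a tuple
    ("add"|"mul", left, right).
    """
    if isinstance(expr, (int, float)):
        return 0                         # d/dx of a constant is 0
    if isinstance(expr, str):
        return 1 if expr == var else 0   # d/dx of x is 1, of y is 0
    op, a, b = expr
    if op == "add":                      # sum rule: (a + b)' = a' + b'
        return ("add", diff(a, var), diff(b, var))
    if op == "mul":                      # product rule: (ab)' = a'b + ab'
        return ("add", ("mul", diff(a, var), b), ("mul", a, diff(b, var)))
    raise ValueError(f"unknown operator: {op}")

# d/dx (x*x) -> 1*x + x*1, i.e. 2x before any simplification
print(diff(("mul", "x", "x"), "x"))
```

Note that the output formula is unsimplified; real symbolic engines spend much of their effort on simplification, which is one reason they are slow compared to automatic differentiation.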

The topic of "automatic differentiation," which programmatically solves differential equations, languished in the bowels of computer-science obscurity, but it remains one of the most brilliant ideas, with enormous potential for innovation in numerous computational areas. Its usage has reemerged in many computational spaces, though typically buried behind a mountain of other code as a core component. What if you could compute the first, second, or third derivative of any function, or set of nested functions, that you wrote in a programming language at lightning speed? There are a number of applications.

Practical Applications of Automatic Differentiation

One of the better-known applications of automatic differentiation took hold in data science, most notably in the concept of back propagation: the automatic computation of gradients in neural networks while moving backwards through activation functions (gradient descent). If someone, somewhere, had not understood the concept of automatic differentiation, we would not have properly functioning automatic gradient or gradient descent programs for neural networks. In other words, neural network computations would be tedious and slow.
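The "moving backwards" idea can be sketched in a few lines of reverse-mode (backprop-style) code. This is my own minimal illustration, not the article's or any library's implementation: each `Var` records how it was produced, and `backward()` walks the graph in reverse, accumulating gradients.

```python
class Var:
    """A value that remembers its parents and the local derivatives to them."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # accumulate the incoming gradient, then push it to the parents
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# f(x, y) = x*y + x  =>  df/dx = y + 1, df/dy = x
x, y = Var(3.0), Var(4.0)
f = x * y + x
f.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

A single backward pass yields the sensitivity of `f` to *every* input at once, which is exactly what gradient descent over millions of weights needs.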

Automatic gradient computation is a subset of automatic differentiation, although this is unclear both to programmers and to many of the people teaching and describing the idea; often they confuse the two. For example, PyTorch originally named its method “autodiff” instead of “autograd,” when in reality “autograd,” short for automatic gradient, is actually an “autodiff” package (Grosse).

Adjoint differentiation, the more common terminology in financial risk management, refers to automatic differentiation as well. According to Antoine Savine, large-scale value-at-risk calculations (most likely he is assuming the Monte Carlo method is utilized) that take into account derivative instruments (financial derivatives, not mathematical ones) require, without adjoint differentiation, “bumping” single variables in your model code by small amounts, one at a time, to compute your model's sensitivity to individual variables. For larger firms this process requires an entire data center running all night, because of the amount of data. Utilizing automatic differentiation of the model (partial derivatives give you the rate of change, or sensitivity), the same job can be done in less than a minute on your laptop. Anyone at home can perform this task themselves.
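A sketch of the “bumping” approach Savine describes makes the cost obvious: the model is re-run once per input. The model below is a made-up two-input stand-in, not a real risk model; the function names are mine.

```python
def model(params):
    # hypothetical portfolio value driven by two market inputs
    spot, vol = params
    return spot * spot + 10.0 * vol

def bump_sensitivities(f, params, h=1e-6):
    """Finite-difference sensitivities: one FULL re-evaluation per input."""
    base = f(params)
    sens = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h               # bump a single variable...
        sens.append((f(bumped) - base) / h)  # ...and re-run the whole model
    return sens

print(bump_sensitivities(model, [100.0, 0.2]))  # approximately [200.0, 10.0]
```

With thousands of inputs and an expensive Monte Carlo model, those repeated full evaluations are what eats the data center overnight; adjoint (reverse-mode) differentiation replaces them with a single backward pass.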

Implementation

There are two alternative methodologies for building an automatic differentiation package, but both involve decomposing the functions in your code utilizing the chain rule, i.e. (f(g(x)))' = f'(g(x))⋅g'(x). This is useful because your code very likely involves a lot of nested functions, and the chain rule breaks them down into a form a computer can interpret.
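As a quick worked example of the chain rule (my own, chosen for illustration): h(x) = sin(x²) decomposes into g(x) = x² and f(u) = sin(u), so h'(x) = cos(x²)·2x. We can sanity-check that against a crude finite difference:

```python
import math

x = 1.5
# chain rule: h'(x) = f'(g(x)) * g'(x) = cos(x^2) * 2x
analytic = math.cos(x * x) * 2 * x

# crude forward finite difference for comparison
h = 1e-7
numeric = (math.sin((x + h) ** 2) - math.sin(x * x)) / h

print(analytic, numeric)  # the two agree to several decimal places
```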

They also both involve decomposing the calculus into a series of discrete operations: addition, subtraction, multiplication, and so on. In a programming language, these are called operators.

The two methods are source code transformation and operator overloading. Source code transformation involves parsing the text of your code and generating the differentiated code from that text, while operator overloading involves overriding the basic methods of Python, C++, or whatever language you are writing in. For example, “+” is really something like add(a, b) under the hood (it's just a method of a class, and you can override methods). You override the addition operator so that it also computes the derivative, and you do the same with all other operators. This is a much simpler way of doing it and doesn’t involve touching or editing your original code (Berland).
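The operator overloading approach can be sketched with forward-mode "dual numbers" (my own minimal example, not a full AD package): each `Dual` carries a value and a derivative, and the overridden `+` and `*` apply the sum and product rules automatically, so the derivative falls out of running the original code unchanged.

```python
class Dual:
    """A (value, derivative) pair with overloaded arithmetic operators."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        # sum rule: (a + b)' = a' + b'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # product rule: (a * b)' = a'b + ab'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def f(x):
    return x * x * x + x   # f(x) = x^3 + x, so f'(x) = 3x^2 + 1

x = Dual(2.0, 1.0)         # seed derivative 1.0: differentiate w.r.t. x
y = f(x)
print(y.value, y.deriv)    # 10.0 13.0
```

Note that `f` is ordinary code; nothing in it mentions derivatives. That is the "no changes in your original code" advantage listed below.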

The pros and cons of each method:

Source code transformation

  • Pro: It is possible in all computer languages
  • Pro: It can be applied to your old legacy Fortran/C code. Allows easier compile time optimizations.
  • Con: Source code swells in size
  • Con: It is more difficult to code the AD tool

Operator overloading

  • Pro: No changes in your original code
  • Pro: Flexible when you change your code or tool
  • Pro: Easy to code the AD tool
  • Con: Only possible in selected languages
  • Con: Current compilers lag behind, code runs slower

Additional resources:

  1. The definitive guide to all things "autodiff," including listings of hundreds of papers and books written on the subject, many with code examples and clear mathematical examples of it deployed: http://www.autodiff.org/. This will be useful when you attempt to build in things like loops.
  2. A simple summary of the calculus behind "autograd" and back propagation, condensed to 4 pages: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf

References

Novak, Gordon. CS 381K: Symbolic Differentiation, University of Texas, 2007, www.cs.utexas.edu/users/novak/asg-symdif.html.

Grosse, Roger. “CSC321 Lecture 10: Automatic Differentiation.” University of Toronto, 2018, https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf.

Savine, Antoine. A Brief Introduction to Automatic Adjoint Differentiation (AAD), QuantMinds 365, 2019, https://knect365.com/quantminds/article/2d9e3329-cae1-4320-86a2-a4e2303b1bfe/a-brief-introduction-to-automatic-adjoint-differentiation-aad.

Berland, Havard. “Automatic Differentiation (Lecture).” Department of Mathematical Sciences, NTNU, 2006, www.pvv.ntnu.no/~berland/resources/autodiff-triallecture.pdf.
