Let $F: \mathbb{R}^n\to \mathbb{R}^m$ be some smooth map with components $F(x) = (f_1(x),\dots,f_m(x))^T$. The Jacobian matrix $JF(x)$ is defined to be the unique linear map satisfying
$$
\lim_{h\to 0} \frac{F(x+h) - F(x) - JF(x)h}{\|h\|}=0.
$$
This implies, among other things, that $JF(x)$ is the best linear approximation to $F$ at $x$. That is, $\Delta F = JF(x)\Delta x$ to first order in $\Delta x$. Notice that this definition immediately gives us the shape of $JF$ as a matrix with $n$ columns and $m$ rows, as it must act on $n$-vectors and output $m$-vectors. If you write out the first component of this linear approximation, we have
$$
\Delta f_1 = J_{11}\Delta x_1 + \dots + J_{1n}\Delta x_n.
$$
From multivariable calculus, we know that the coefficients of the best linear approximation to a scalar function are its partial derivatives, so we have $[JF(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x)$. That is, the $i$th row of $JF(x)$ consists of the partial derivatives of the $i$th component of $F$.
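To make the shapes concrete, here is a minimal sketch using JAX's automatic differentiation; the particular map $F:\mathbb{R}^3\to\mathbb{R}^2$ is just an illustrative choice. The computed Jacobian has $m$ rows and $n$ columns, and each row holds the partial derivatives of one component of $F$.

```python
import jax
import jax.numpy as jnp

def F(x):
    # Illustrative map F: R^3 -> R^2 with components
    # f_1(x) = x_1 * x_2 and f_2(x) = sin(x_3)
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
J = jax.jacfwd(F)(x)   # forward-mode Jacobian at x

print(J.shape)  # (2, 3): m rows, n columns
print(J)
# Row 1: partials of f_1 = (x_2, x_1, 0)    -> [2., 1., 0.]
# Row 2: partials of f_2 = (0, 0, cos(x_3)) -> [0., 0., approx -0.99]
```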
Now consider the case where $F$ has only a single component $f:\mathbb{R}^n\to \mathbb{R}$. The analysis above shows that $JF(x)$ is a $1\times n$ matrix given by $JF(x) = (f_{x_1}, \dots, f_{x_n})$. Indeed, we have
$$
\Delta f = (f_{x_1}, \dots, f_{x_n}) \begin{pmatrix} \Delta x_1 \\ \vdots \\ \Delta x_n\end{pmatrix} = f_{x_1}\Delta x_1 + \dots + f_{x_n}\Delta x_n,
$$
as expected. However, we can identify this $1\times n$ matrix with the $n$-vector (column vector!) $g = (f_{x_1}, \dots, f_{x_n})^T$. This vector satisfies $JF(x)\Delta x = g\cdot \Delta x$. That is, the action of the derivative is replaced by a dot product with a fixed vector. This is exactly the defining property of the gradient, so we define $\nabla f = g = (f_{x_1}, \dots, f_{x_n})^T = JF(x)^T$.
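The same identification can be seen numerically. In the sketch below (again in JAX, with an arbitrary illustrative $f:\mathbb{R}^3\to\mathbb{R}$), `jax.grad` returns the gradient as an $n$-vector living in the input space, while treating $f$ as a map into $\mathbb{R}^1$ gives a $1\times n$ Jacobian whose single row matches the gradient and whose action on $\Delta x$ agrees with the dot product $\nabla f \cdot \Delta x$.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Illustrative scalar function f: R^3 -> R
    return x[0] ** 2 + x[1] * x[2]

x = jnp.array([1.0, 2.0, 3.0])

g = jax.grad(f)(x)                              # gradient: an n-vector
J = jax.jacfwd(lambda x: jnp.array([f(x)]))(x)  # same f viewed as a map R^3 -> R^1

print(g.shape)                 # (3,)   -- an element of the input space R^n
print(J.shape)                 # (1, 3) -- a 1 x n matrix, i.e. a linear map R^n -> R^1
print(jnp.allclose(J[0], g))   # True: the single row of J is the gradient

# The action of the derivative is a dot product with g:
dx = jnp.array([0.1, -0.2, 0.05])
print(jnp.allclose(J @ dx, jnp.dot(g, dx)))     # True
```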
From this it is clear how the Jacobian and gradient differ. The Jacobian is a linear map between the same input and output spaces as $F$; it is the object singled out by the definition of the derivative for multivariable functions. The gradient is not this type of object: it is an element of the input space that can be identified with a linear map via the dot product. This distinction comes up a lot when doing multivariable calculus on large systems, as in machine learning: if you write some expressions in terms of gradients and others in terms of Jacobians, the matrices end up the wrong size and cannot be multiplied in the way the chain and product rules for multivariable functions require.
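A small example of that bookkeeping, again in JAX with illustrative functions: for a composition $g\circ F$ with $F:\mathbb{R}^n\to\mathbb{R}^m$ and $g:\mathbb{R}^m\to\mathbb{R}$, the chain rule in Jacobian form is the matrix product $J(g\circ F)(x) = Jg(F(x))\,JF(x)$, a $(1\times m)(m\times n)$ product, while in gradient form the Jacobian must be transposed: $\nabla(g\circ F)(x) = JF(x)^T\,\nabla g(F(x))$.

```python
import jax
import jax.numpy as jnp

def F(x):
    # Illustrative map F: R^3 -> R^2
    return jnp.array([x[0] * x[1], x[1] + x[2]])

def g(y):
    # Illustrative scalar function g: R^2 -> R
    return jnp.sum(y ** 2)

x = jnp.array([1.0, 2.0, 3.0])

JF = jax.jacfwd(F)(x)                                # (2, 3)
Jg = jax.jacfwd(lambda y: jnp.array([g(y)]))(F(x))   # (1, 2): Jacobian of g as a 1 x m matrix
grad_g = jax.grad(g)(F(x))                           # (2,):   gradient of g, an element of R^m

# Chain rule with Jacobians: (1 x 2) @ (2 x 3) gives the 1 x 3 Jacobian of g composed with F.
J_composite = Jg @ JF

# Chain rule with gradients: transpose the Jacobian to land back in the input space R^n.
grad_composite = JF.T @ grad_g                       # (3,)

print(jnp.allclose(J_composite[0], grad_composite))                   # True
print(jnp.allclose(jax.grad(lambda x: g(F(x)))(x), grad_composite))   # True
```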