Multivariable Calculus – Understanding the Derivative as a Linear Transformation

multivariable-calculus

I have been studying multivariable calculus for a while now, in particular the concept of differentiation in space (or higher dimensions). I have seen related posts, but one question remains: I can't understand the concept of the linear transformation that we use to define the Fréchet derivative. In single-variable calculus the derivative gives the best linear approximation of the function, so I guess this extends to the multivariable case, but we can't use a single number for this (why?) and instead we use a matrix. Can someone clear this up for me in plain English?

Best Answer

The point is that for a function $f : \mathbb{R} \to \mathbb{R}$, $f'(a)$ defines a linear transformation, just like $Df({\bf a})$ does for a function $f : \mathbb{R}^n \to \mathbb{R}^m$.

In single variable calculus, we are taught that the derivative of $f(x)$ at a point $x = a$ is a real number $f'(a)$ which represents the slope of the tangent line to the graph of $f(x)$ at the point $x = a$. The equation of this tangent line is $y = f'(a)(x-a) + f(a)$; this is the best linear approximation of $f(x)$ near $x = a$, not the derivative itself.

If we do the change of variables $x^* = x - a$, $y^* = y - f(a)$, the tangent line becomes $y^* = f'(a)x^*$; this is a linear function, which is just a linear transformation $\mathbb{R} \to \mathbb{R}$, and the standard matrix of this linear transformation is the $1\times 1$ matrix $[f'(a)]$.
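For concreteness, here is a quick worked example (my own, not from the question): take $f(x) = x^2$ and $a = 1$, so that $f'(1) = 2$. The tangent line at $x = 1$ is
$$y = 2(x - 1) + 1,$$
and after the change of variables $x^* = x - 1$, $y^* = y - 1$ it becomes
$$y^* = 2x^*,$$
which is exactly the linear transformation $\mathbb{R} \to \mathbb{R}$ with standard matrix $[2]$.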

In higher dimensions, we start with $f : \mathbb{R}^n \to \mathbb{R}^m$, and at a point ${\bf a} \in \mathbb{R}^n$ the derivative $Df({\bf a})$ is the $m\times n$ matrix $Df({\bf a}) = \left[\frac{\partial f_i}{\partial x_j}({\bf a})\right]$, sometimes called the Jacobian of $f$ at ${\bf a}$. The best linear approximation of $f({\bf x})$ near ${\bf x} = {\bf a}$ is then ${\bf y} = Df({\bf a})({\bf x}-{\bf a}) + f({\bf a})$.

If we do the change of variables ${\bf x}^* = {\bf x} - {\bf a}$, ${\bf y}^* = {\bf y} - f({\bf a})$, the approximation (no longer a line, but its higher-dimensional analogue) becomes ${\bf y}^* = Df({\bf a}){\bf x}^*$; this is a linear transformation $\mathbb{R}^n \to \mathbb{R}^m$, and its standard matrix is the $m\times n$ matrix $Df({\bf a})$.
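To see this machinery in action, here is a small example of my own choosing: take $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by $f(x, y) = (x^2 y,\; x + y^3)$ and ${\bf a} = (1, 2)$. Computing the four partial derivatives and evaluating at ${\bf a}$ gives
$$Df({\bf a}) = \begin{bmatrix} 2xy & x^2 \\ 1 & 3y^2 \end{bmatrix}_{(1,2)} = \begin{bmatrix} 4 & 1 \\ 1 & 12 \end{bmatrix},$$
and since $f(1,2) = (2, 9)$, the best linear approximation near ${\bf a}$ is
$${\bf y} = \begin{bmatrix} 4 & 1 \\ 1 & 12 \end{bmatrix}({\bf x} - {\bf a}) + \begin{bmatrix} 2 \\ 9 \end{bmatrix}.$$
After the shift ${\bf x}^* = {\bf x} - {\bf a}$, ${\bf y}^* = {\bf y} - f({\bf a})$, this is just the linear transformation ${\bf y}^* = Df({\bf a}){\bf x}^*$.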

So the derivative in single variable calculus is just a special case of the derivative in multivariable calculus; just set $m = n = 1$.

As for your question, 'why can't we use a number for the best linear approximation of a function $\mathbb{R}^n \to \mathbb{R}^m$?', note that the approximating function must itself map $\mathbb{R}^n \to \mathbb{R}^m$, and because it is linear (affine, strictly speaking), it must be of the form ${\bf y} = A{\bf x} + {\bf b}$, where $A$ is an $m \times n$ matrix and ${\bf b} \in \mathbb{R}^m$. By enforcing the condition that the linear approximation must agree with the function at ${\bf x} = {\bf a}$, we find that it must be of the form ${\bf y} = A({\bf x} - {\bf a}) + f({\bf a})$. So the only thing left to determine is the $m\times n$ matrix $A$, not a single number as in single-variable calculus.
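A minimal example may help here (again my own): take $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x + 2y$. This function changes at rate $1$ as you move in the $x$-direction and at rate $2$ as you move in the $y$-direction, so no single number can record its local behavior; you need the $1 \times 2$ matrix $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$, which stores one rate of change per input direction. In general, the $m \times n$ matrix $A$ stores one rate of change for each of the $m$ outputs with respect to each of the $n$ inputs.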
