I have been studying multivariable calculus and the concept of differentiation in space (or in higher dimensions) for a while now. I have seen related posts, but one question remains: I can't understand the concept of the linear transformation that we use to define the Fréchet derivative. In single-variable calculus the derivative gives the best linear approximation of the function, so I guess this extends to several variables, but there we can't use a single number (why?) and instead use a matrix. Can someone clear this up for me in plain English?
Multivariable Calculus – Understanding the Derivative as a Linear Transformation
multivariable-calculus
Related Solutions
I will try to answer this question myself. (I don't know if this is the "right" answer, but I will throw it out here as it's better than nothing.)
What I want is to derive the concepts of derivative and differential using only the concepts of limit and linear approximation. As I mentioned in my [Step 3], if I just want to approximate the value of $f(x)$ by the linear equation $$f(x)=f(a)+A(x-a)+E=f(a)+A\Delta x+E\ \ \ \ \ \ (1)$$ then there are infinitely many choices of $A$ for me to pick from.
So the question becomes: what kind of $A$ do I actually want? Or: what kind of $A$ is "nice" enough?
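To make "infinitely many choices of $A$" concrete, here is a small numerical sketch. The function $f(x)=x^2$, the point $a=1$, and the two candidate values of $A$ are my own illustrative choices, not from the post; the point is that equation (1) holds for *any* $A$, because $E$ simply absorbs whatever is left over:

```python
# Equation (1): f(x) = f(a) + A*dx + E, where dx = x - a.
# Any A "works" if we let E soak up the leftover; the question is
# which A makes E negligible as dx shrinks.

def f(x):
    return x ** 2

a = 1.0
for A in (2.0, 3.0):          # A = 2 happens to be f'(a); A = 3 is arbitrary
    for dx in (0.1, 0.01, 0.001):
        E = f(a + dx) - f(a) - A * dx   # error term from equation (1)
        print(f"A={A}  dx={dx:>6}  E={E:+.6f}  E/dx={E/dx:+.4f}")
```

For $A=2$ the error is $E=\Delta x^2$, so $E/\Delta x \to 0$; for $A=3$ we get $E/\Delta x \to -1$, which never becomes negligible.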
Now there are two goals of linear approximation (which I want $A$ to satisfy):
- I want the error $E$ to be "small" enough so my calculation is accurate even if I choose to ignore $E$ from the equation.
- For the sake of predictability and convenience, I want my approximation to become more accurate when I perform a single operation, so I can tell someone (or a computer) how to improve the accuracy during calculation, or recognize when the approximation is not accurate enough.
[Consider the first goal above.] "Small" with respect to what? There are three terms on the right side of equation (1); because $f(a)$ is a constant, the only two terms affecting the accuracy of my approximation are $A\Delta x$ and $E$. That is to say, I want $E$ to be "small" with respect to $A\Delta x$, meaning the value of the fraction $$\frac{E}{A\Delta x}\ \text{is very small}.$$
[Consider the second goal above.] I realize that $\frac{E}{A\Delta x}$ cannot always be very small. What I am looking for is an operation that will make it smaller (or larger), so I can tell someone or a computer what to do (or not to do) to improve the accuracy.
Now, since the value of $E$ depends on $\Delta x$ for different choices of $x$, the only two options I have are to let $\Delta x \to 0$ or $\Delta x \to \infty$. Any other option would likely involve letting $\Delta x$ be some complicated function of $x$, which not only defeats the purpose of linear approximation (for example, if $\Delta x$ were a parabolic function of $x$, why bother with a linear approximation in the first place? I should just do a parabolic approximation!) but would also likely require more than one operation during approximation, which is bad if a person or a computer has too many operations to carry out.
So now I need to evaluate the two operations above:
- It is obvious that I cannot guarantee $\frac{E}{A\Delta x}$ to keep becoming smaller when $\Delta x \to \infty$.
- It seems possible to find an $A$ such that the value of $\frac{E}{A\Delta x}$ keeps decreasing as $\Delta x \to 0$, which means I should probably look at the following limit: $$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$$
With the above two considerations, I can now develop the requirements on $A$:
Assume $\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$ exists. What I want is for $\frac{E}{A\Delta x}$ to become smaller as $\Delta x \to 0$, which means at best I should expect this limit to be zero (and the "nicest" $A$ should at least achieve this value of the limit), that is: $$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}=\frac{1}{A}\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$ $$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$ Now I can say $E$ is a higher-order infinitesimal of $\Delta x$, and I can express $E$ as $$E=\epsilon\Delta x\text{, where } \lim\limits_{\Delta x \to 0}\epsilon=0$$ Substituting $E=\epsilon\Delta x$ back into equation (1) above, I have $$f(x)=f(a)+A\Delta x+\epsilon\Delta x$$ $$A+\epsilon=\frac{f(x)-f(a)}{\Delta x}$$ And it is not hard to see that $$\lim\limits_{\Delta x \to 0}(A+\epsilon)=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ $$\lim\limits_{\Delta x \to 0}A + \lim\limits_{\Delta x \to 0}\epsilon=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ $$A=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ Now I can define $A$ to be the derivative, and $A\Delta x$, $\Delta x$ to be the differentials. I can also claim that for each $(a,f(a))$ in the interval such an $A$ is unique, due to the uniqueness of the limit $\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$.
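The conclusion above can be checked numerically. The function $f(x)=\sin(x)$ and the point $a=0.5$ are an assumed example (not from the post); with $A$ equal to the limit of the difference quotient, the relative error $E/\Delta x$ does shrink with $\Delta x$:

```python
import math

# f and the point a are illustrative choices; A = cos(a) is the
# closed-form value of the limit of the difference quotient for sin.
def f(x):
    return math.sin(x)

a = 0.5
A = math.cos(a)

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    quotient = (f(a + dx) - f(a)) / dx   # (f(x) - f(a)) / Δx
    E = f(a + dx) - f(a) - A * dx        # error of the linear approximation
    print(f"dx={dx:.0e}  quotient={quotient:.6f}  E/dx={E/dx:+.2e}")
```

The printed quotient approaches $A=\cos(0.5)$ and $E/\Delta x$ approaches $0$, exactly as the derivation requires.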
(I can also claim that this unique $A$ is the "nicest" $A$, because it is the only one that satisfies the minimal requirement set out above.)
Thus I successfully bring in the concepts of derivative and differential using only the concepts of limit and linear approximation. And the function is said to be differentiable when the limit $$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}$$ exists and equals $0$.
Best Answer
The point is that for a function $f : \mathbb{R} \to \mathbb{R}$, $f'(a)$ defines a linear transformation, just like $Df({\bf a})$ does for a function $f : \mathbb{R}^n \to \mathbb{R}^m$.
In single variable calculus, we are taught that the derivative of $f(x)$ at a point $x = a$ is a real number $f'(a)$ which represents the slope of the tangent line to the graph of $f(x)$ at the point $x = a$. The equation of this tangent line is $y = f'(a)(x-a) + f(a)$; this is the best linear approximation of $f(x)$ near $x = a$, not the derivative itself.
If we do the change of variables $x^* = x - a$, $y^* = y - f(a)$, the tangent line becomes $y^* = f'(a)x^*$; this is a linear function, which is just a linear transformation $\mathbb{R} \to \mathbb{R}$, and the standard matrix of this linear transformation is the $1\times 1$ matrix $[f'(a)]$.
In higher dimensions, we start with $f : \mathbb{R}^n \to \mathbb{R}^m$, and at a point ${\bf a} \in \mathbb{R}^n$ we have the derivative $Df({\bf a})$, which is the $m\times n$ matrix $Df({\bf a}) = \left[\frac{\partial f_i}{\partial x_j}({\bf a})\right]$, sometimes called the Jacobian of $f$ at ${\bf a}$. Then the best linear approximation of $f({\bf x})$ near ${\bf x} = {\bf a}$ is ${\bf y} = Df({\bf a})({\bf x}-{\bf a}) + f({\bf a})$.
If we do the change of variables ${\bf x}^* = {\bf x} - {\bf a}$, ${\bf y}^* = {\bf y} - f({\bf a})$, the approximation becomes ${\bf y}^* = Df({\bf a}){\bf x}^*$; this is a linear transformation $\mathbb{R}^n \to \mathbb{R}^m$, and the standard matrix of this linear transformation is the $m\times n$ matrix $Df({\bf a})$.
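Here is a numerical sketch of the multivariable statement, with an example map assumed for illustration: $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f(x,y) = (x^2 y,\ x + \sin y)$, whose Jacobian is the $2\times 2$ matrix of partial derivatives. The code checks that $\|f({\bf a}+{\bf h}) - f({\bf a}) - Df({\bf a}){\bf h}\|/\|{\bf h}\| \to 0$:

```python
import math

def f(x, y):
    return (x**2 * y, x + math.sin(y))

def Df(x, y):
    # Jacobian: [[∂f1/∂x, ∂f1/∂y], [∂f2/∂x, ∂f2/∂y]]
    return [[2 * x * y, x**2],
            [1.0,       math.cos(y)]]

ax, ay = 1.0, 0.5            # the base point a (illustrative choice)
J = Df(ax, ay)

for t in (1e-1, 1e-2, 1e-3):
    hx, hy = 0.3 * t, -0.4 * t                    # h shrinking toward 0
    fx, fy = f(ax + hx, ay + hy)
    gx, gy = f(ax, ay)
    # error of the linear approximation: f(a+h) - f(a) - Df(a) h
    ex = fx - gx - (J[0][0] * hx + J[0][1] * hy)
    ey = fy - gy - (J[1][0] * hx + J[1][1] * hy)
    hnorm = math.hypot(hx, hy)
    print(f"|h|={hnorm:.1e}  |E|/|h|={math.hypot(ex, ey) / hnorm:.2e}")
```

The ratio $|E|/|{\bf h}|$ shrinks with $|{\bf h}|$, which is exactly the multivariable analogue of $E/\Delta x \to 0$ in one variable.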
So the derivative in single variable calculus is just a special case of the derivative in multivariable calculus; just set $m = n = 1$.
As for your question, 'why can't we use a number for the best linear approximation of a function $\mathbb{R}^n \to \mathbb{R}^m$?', note that the approximating function must itself be $\mathbb{R}^n \to \mathbb{R}^m$, and because it is linear (affine, strictly speaking), it must be of the form ${\bf y} = A{\bf x} + {\bf b}$ where $A$ is an $m \times n$ matrix and ${\bf b} \in \mathbb{R}^m$. By enforcing the condition that the linear approximation must agree with the function at ${\bf x} = {\bf a}$, we find that it must be of the form ${\bf y} = A({\bf x} - {\bf a}) + f({\bf a})$. So the only thing left to determine is the $m\times n$ matrix $A$, not a single number as in single-variable calculus.
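To see concretely why one number is not enough, take an assumed example with $m = 1$, $n = 2$: $f(x,y) = xy$ at ${\bf a} = (1, 2)$. Even with a single output, the derivative needs one coefficient per input variable, i.e. the $1\times 2$ matrix $[\partial f/\partial x,\ \partial f/\partial y] = [y,\ x]$ evaluated at ${\bf a}$:

```python
def f(x, y):
    return x * y

ax, ay = 1.0, 2.0
A = [ay, ax]            # 1x2 Jacobian of f at a: [∂f/∂x, ∂f/∂y] = [2, 1]

hx, hy = 0.01, -0.02    # a small displacement h = x - a
approx = f(ax, ay) + A[0] * hx + A[1] * hy   # f(a) + A(x - a)
exact = f(ax + hx, ay + hy)
print(exact, approx, exact - approx)   # error is hx*hy = -0.0002, second order
```

The leftover error $h_x h_y$ is second order in the displacement; no single scalar coefficient could capture the separate sensitivities to $x$ and $y$ at once.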