I am familiar with the definition of the Fréchet derivative and its uniqueness when it exists. I would, however, like to know in what sense the derivative is the "best" linear approximation. What does this mean formally? "Best" on the entire domain is surely wrong, so it must mean "best" on a small neighborhood of the point we are differentiating at, where this neighborhood becomes arbitrarily small? Why does the definition of the derivative formalize precisely this? Thank you in advance.
[Math] In what sense is the derivative the “best” linear approximation
derivatives, numerical-methods, real-analysis
Related Solutions
If $f: A\to B$ is Fréchet differentiable, then for all $X,V\in A$ the directional derivative
$$d_Vf(X)=\lim_{t\to 0}\frac{f(X+tV)-f(X)}{t}$$
exists and $d_Vf(X)=df(X)(V)$.
Now if $A=\mathbb R^{n\times n}$, $B=\mathbb R$ and $G = \frac{df(X)}{dX}$, then $G_{i,j}$ is just the directional derivative $d_{H_{i,j}}f(X)$, where $H_{i,j}$ is the matrix with a $1$ in the $(i,j)$-th position and zeros elsewhere. So the relation is
$$G_{i,j}=df(X)(H_{i,j})$$
$\bullet$ Now consider the special map $f:\mathbb R^{n\times n}\to\mathbb R$, $X\mapsto Tr(XA)$. As this map is already linear, we have $df(X)=f$ for all $X\in\mathbb R^{n\times n}$, so applying the above relation yields
$$G_{i,j}=df(X)(H_{i,j})=f(H_{i,j})=Tr(H_{i,j}A)=A_{j,i}$$
so $G=A^T$, which can also be computed directly.
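As a quick numerical sanity check (a sketch, not part of the argument), each entry $G_{i,j}=d_{H_{i,j}}f(X)$ can be approximated by a finite difference and compared against $A^T$, here with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

def f(X):
    # f(X) = Tr(XA), the map from the answer above
    return np.trace(X @ A)

h = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        H = np.zeros((n, n))
        H[i, j] = 1.0  # the basis matrix H_{ij}
        # finite-difference approximation of the directional derivative d_{H_ij} f(X)
        G[i, j] = (f(X + h * H) - f(X)) / h

print(np.allclose(G, A.T, atol=1e-4))  # G agrees with A^T up to finite-difference error
```

Since $f$ is linear, the finite difference is exact up to floating-point roundoff, so the agreement with $A^T$ is tight.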
Let $f$ be your function, $f_1 : x \mapsto a(x-x_0) + b$ the approximation given by the derivative, and $f_2 : x \mapsto a'(x-x_0) + b'$ some other affine approximation. The approximations are taken in a neighborhood of $x_0$.
Of course, if $b' \neq b$, $f_2$ is a bad approximation since $f(x_0) \neq f_2(x_0)$.
So let us suppose $b' = b$ and $a \neq a'$.
You have
$$\lim_{x \rightarrow x_0} \frac{f(x) - f_1(x)}{x - x_0} = 0,$$
and
$$\frac{f(x) - f_2(x)}{x-x_0} = \frac{f(x) - f_1(x)}{x-x_0} + \frac{f_1(x) - f_2(x)}{x-x_0} = \frac{f(x) - f_1(x)}{x-x_0} + a - a',$$
so that
$$\lim_{x \rightarrow x_0} \frac{f(x) - f_2(x)}{x - x_0} = a - a'.$$
Taking the quotient of the two preceding limits (the denominator tends to $a - a' \neq 0$) shows that:
$$ \lim_{x \rightarrow x_0} \frac{f(x) - f_1(x)}{f(x) - f_2(x)} = 0$$
This last equality encapsulates the intuition that the approximation $f_1$ is far better than $f_2$ locally at $x_0$. Written out with epsilons: for every $\epsilon > 0$ there exists $\delta > 0$ such that for all $x \in \,]x_0 - \delta, x_0 + \delta[$, $|f(x) - f_1(x)| \leq \epsilon \, | f(x) - f_2(x) |$.
EDIT: The last equality also holds if $b' \neq b$ (to be more precise, it holds even more strongly, since in this case $\lim_{x \rightarrow x_0} \left|\frac{f(x) - f_2(x)}{x- x_0}\right| = +\infty$). This justifies calling the approximation given by the derivative the "best approximation": it is far better than any other.
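The limit above can be watched numerically. A small sketch with an arbitrary example (my choice, not from the answer): $f(x) = e^x$ at $x_0 = 0$, where the tangent line is $f_1(x) = 1 + x$, against a rival line $f_2(x) = 1 + 0.9x$; the ratio $(f - f_1)/(f - f_2)$ should shrink toward $0$ as $x \to 0$:

```python
import math

f  = math.exp
f1 = lambda x: 1.0 + x          # derivative-based approximation: a = f'(0) = 1, b = f(0) = 1
f2 = lambda x: 1.0 + 0.9 * x    # rival line with a' = 0.9, same b

for x in [0.1, 0.01, 0.001, 0.0001]:
    ratio = (f(x) - f1(x)) / (f(x) - f2(x))
    print(f"x = {x:>7}: (f - f1)/(f - f2) = {ratio:.6f}")
```

Each tenfold step toward $0$ shrinks the ratio by roughly a factor of ten, matching $\lim_{x \to x_0} \frac{f(x)-f_1(x)}{f(x)-f_2(x)} = 0$.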
Best Answer
Say the graph of $L$ is a straight line, $L(a)=f(a)$ at one point $a$, and $L$ is the tangent line to the graph of $f$ at $a$. Let $L_1$ be another function whose graph is a straight line through $(a,f(a))$. Then there is some open interval $(a-\varepsilon,a+\varepsilon)$ such that for every $x$ in that interval, the value of $L(x)$ is closer to the value of $f(x)$ than the value of $L_1(x)$ is.

Now one might take another line $L_2$ through that point whose slope is closer to that of the tangent line than the slope of $L_1$ is, such that $L_2(x)$ actually comes closer to $f(x)$ than $L(x)$ does, for some $x$ in that interval. But then there is a still smaller interval $(a-\varepsilon_2,a+\varepsilon_2)$ within which $L$ beats $L_2$.

For every line except the tangent line, one can make the interval small enough that the tangent line beats the other line within that interval. In general there is no single interval that works no matter how close the rival line gets; rather, one must make the interval small enough in each case separately.
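This shrinking-interval behavior can be seen concretely. A sketch with an arbitrary example (mine, not the answerer's): $f(x) = x^2$ at $a = 0$, where the tangent line is $L(x) = 0$. Against a rival $L_m(x) = mx$ through the origin, the tangent wins when $x^2 < |x|\,|x - m|$, i.e. $|x| < |x - m|$, so a half-width of $\varepsilon = |m|/2$ works, and it must shrink as $m$ approaches the tangent slope $0$:

```python
def f(x):
    return x * x

def beats(m, eps, samples=1000):
    """Check that the tangent L(x) = 0 beats the rival L_m(x) = m*x
    at every sampled nonzero x in (-eps, eps)."""
    for k in range(1, samples):
        x = -eps + 2 * eps * k / samples
        if x == 0.0:
            continue
        if abs(f(x) - 0.0) >= abs(f(x) - m * x):
            return False  # rival is at least as close somewhere in the interval
    return True

for m in [1.0, 0.1, 0.01]:
    eps = abs(m) / 2  # interval half-width tailored to this particular rival
    print(f"slope {m}: tangent beats rival on (-{eps}, {eps}): {beats(m, eps)}")
```

Note that a fixed interval fails: `beats(0.01, 1.0)` is `False`, because at, say, $x = 0.5$ the rival $0.01x$ is closer to $x^2$ than $0$ is. This is exactly the answer's point that no single interval works for every rival.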