Why does the term $A\mathbf{h}/\|\mathbf{h}\|$ appear in the definition of the derivative

analysis, derivatives, functional-analysis, multivariable-calculus, real-analysis

My analysis course provides the following definition of the derivative of $f$, $Df(\mathbf{a})$:

Definition: A function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ is differentiable at $\mathbf{a} \in \mathbb{R}^{n}$ if there exists a linear function $\mathbf{x} \mapsto A \mathbf{x}$ (where $A$ is an $m \times n$ matrix) from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$ such that
$$
\frac{f(\mathbf{a}+\mathbf{h})-f(\mathbf{a})-A \mathbf{h}}{\|\mathbf{h}\|} \rightarrow \mathbf{0} \quad \text{as } \mathbf{h} \rightarrow \mathbf{0}.
$$

Now, if we unpack this and go back to the limit definition, we get that $Df(\mathbf{a})$ is
$$
\lim_{\|\mathbf{h}\| \rightarrow 0 } \left( \frac{f(\mathbf{a}+\mathbf{h})-f(\mathbf{a})}{\|\mathbf{h}\|} - \frac{A \mathbf{h}}{\|\mathbf{h}\|} \right).
$$

What I don't understand is the term $\frac{A \mathbf{h}}{\|\mathbf{h}\|}$. Why is the product $A \mathbf{h}$ also divided by the norm $\|\mathbf{h}\|$?

Would appreciate further insight into this definition.

Additionally, based on this definition of the derivative, it is possible for directional derivatives to exist even when the function is not differentiable at a particular vector $\mathbf{a}$. Why is this the case?

Defining the directional derivative as:

$$D_{\mathbf{v}} f(\mathbf{a})=\lim _{h \rightarrow 0} \frac{f(\mathbf{a}+h \mathbf{v})-f(\mathbf{a})}{h}.$$

Best Answer

First off, minor nitpick: $A \mathbf{h}$ is not a dot product. The dot product takes two $n$-dimensional real (or complex) vectors as input, and gives a scalar as output. $A$ is a matrix (not a vector) which represents a linear transformation $\mathbb{R}^n \to \mathbb{R}^m$; it acts on $\mathbf{h}$ (a vector in $\mathbb{R}^n$) in order to produce a vector in $\mathbb{R}^m$.
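For instance, with $m = 2$ and $n = 3$ (dimensions chosen purely for illustration), the matrix acts on a vector in $\mathbb{R}^3$ and returns a vector in $\mathbb{R}^2$:
$$
A\mathbf{h} =
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & 1 \end{pmatrix}
\begin{pmatrix} h_1 \\ h_2 \\ h_3 \end{pmatrix}
=
\begin{pmatrix} h_1 + 2h_3 \\ 3h_2 + h_3 \end{pmatrix},
$$
which is again a vector, not a scalar, so it cannot be the value of a dot product.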

The Derivative

Addressing the question: in one dimension, the derivative of $f$ at a point $a$ is generally defined as the limit of some kind of difference quotient, e.g. $$ f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h},$$ assuming that this limit exists. This definition is usually justified geometrically: we think about approximating a tangent line to a curve via secant lines defined by points which are closer and closer to each other on the curve. In that sense, the definition of the derivative in one dimension is natural.

Manipulating this identity, we get \begin{align} 0 &= \left( \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}\right) - f'(a) \\ &= \lim_{h\to 0} \left( \frac{f(a+h) - f(a)}{h} - f'(a) \right) && (\text{$f'(a)$ is a constant}) \\ &= \lim_{h\to 0} \frac{ f(a+h) - f(a) - hf'(a)}{h}. \end{align} With a little more manipulation (replacing the denominator $h$ by $|h|$ changes at most the sign of the quotient, so it does not affect whether the limit is zero), the definition of the derivative can be rewritten as

Definition: A function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at a point $a \in \mathbb{R}$ if there exists a linear function $x \mapsto Ax$ from $\mathbb{R}$ to $\mathbb{R}$ (where $A$ is a $1\times 1$ matrix) such that $$ \frac{f(a+h) - f(a) - Ah}{|h|} \to 0 \quad \text{as } h \to 0. $$
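For instance, take $f(x) = x^2$ at a point $a$; the natural candidate is $A = (2a)$, and indeed
$$
\frac{f(a+h) - f(a) - 2ah}{|h|}
= \frac{(a+h)^2 - a^2 - 2ah}{|h|}
= \frac{h^2}{|h|}
= |h| \to 0 \quad \text{as } h \to 0,
$$
so $f$ is differentiable at $a$ with $A = 2a$, matching the usual derivative $f'(a) = 2a$.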

This version of the definition follows from the "usual" geometric definition, but encapsulates a slightly different idea. In this case, we think of the derivative as the linear operator which best approximates how the function is changed by small perturbations around $a$. That is, if we perturb $a$ by $h$, what will be the corresponding change in $f$, assuming that $f$ is (at least locally) well approximated by a linear map?

The actual perturbation is the difference between $f(a+h)$ and $f(a)$, while the linear approximation of that perturbation is $Ah$. The derivative $A$ is the best linear approximation in the sense that the relative error between the actual perturbation and the linear approximation, i.e. the error measured against the size $|h|$ of the perturbation, tends to zero as $h \to 0$.

In higher dimensions, the same idea applies: if we perturb $\mathbf{a}$ by just a little, this is going to perturb $f$ by just a little. We then want to find the linear function which best approximates this perturbation. We assess this by looking at the relative error, i.e. the difference between the actual perturbation and the linear approximation, scaled by the size $\|\mathbf{h}\|$ of the perturbation in $\mathbf{a}$; this is exactly why the $A\mathbf{h}$ term is divided by $\|\mathbf{h}\|$ along with everything else. The result is the definition of the derivative given in the question.
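For a concrete two-variable example, take $f(x,y) = (xy,\ x+y)$ at $\mathbf{a} = (a_1, a_2)$. The natural candidate matrix (the matrix of partial derivatives) gives
$$
A = \begin{pmatrix} a_2 & a_1 \\ 1 & 1 \end{pmatrix},
\qquad
f(\mathbf{a}+\mathbf{h}) - f(\mathbf{a}) - A\mathbf{h}
= \begin{pmatrix} h_1 h_2 \\ 0 \end{pmatrix},
$$
so the relative error is $|h_1 h_2| / \|\mathbf{h}\| \le \|\mathbf{h}\| \to 0$, and $Df(\mathbf{a}) = A$.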

Directional Derivatives

Regarding the existence of directional derivatives even if the derivative does not exist: a directional derivative is the best linear approximation of a function in a particular direction, whereas the derivative is the best linear approximation in every direction at once. It is possible for a function to be approximable in one direction, but not in every direction. As a simple example, consider $$ f(x,y) = |x|. $$ If you are moving parallel to the $y$-axis, the function is constant, so $$ D_{\langle 0, 1\rangle} f(x,y) = \lim_{h \to 0} \frac{f(x,y+h) - f(x,y)}{h} = \lim_{h\to 0} \frac{ |x| - |x| }{h} = 0. $$ Here, the directional derivative in the direction of $\langle 0, 1 \rangle$ exists (and is zero). However, in any other direction, the directional derivative does not exist wherever $x=0$. For example, $$ D_{\langle 1,0 \rangle} f(0,0) = \lim_{h \to 0} \frac{ f(0+h, 0) - f(0,0) }{ h } = \lim_{h \to 0} \frac{ |0+h| - |0| } {h} = \lim_{h\to 0} \frac{|h|}{h}, $$ which does not exist.
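To tie this back to the definition in the question: at $\mathbf{a} = (0,0)$ no $1 \times 2$ matrix $A = (A_1 \ \ A_2)$ can satisfy it, because along $\mathbf{h} = (h, 0)$ we have
$$
\frac{f(h,0) - f(0,0) - A_1 h}{\|(h,0)\|}
= \frac{|h| - A_1 h}{|h|}
= 1 - A_1 \operatorname{sgn}(h),
$$
which tends to $1 - A_1$ as $h \to 0^+$ and to $1 + A_1$ as $h \to 0^-$; no single choice of $A_1$ makes both limits zero. So the directional derivative along $\langle 0, 1 \rangle$ exists at the origin even though $f$ is not differentiable there.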
