[Math] Directional derivative: what is the relation between definition by limit and definition as dot product

derivatives, linear algebra, vector analysis

I'm trying to get intuition about why the gradient points in the direction of steepest ascent. I got confused because I found that the directional derivative is explained with the help of the gradient, and the gradient is explained with the help of the directional derivative.

Please explain the exact steps that lead from the directional derivative defined by the limit $\nabla_{v} f(x_0) = \lim_{h\to 0} \frac{f(x_0+hv)-f(x_0)}h$ to the directional derivative defined as the dot product of the gradient and the vector, $\nabla_{v} f(x_0) = \nabla f(x_0)\cdot{v}$.

In other words, how does one prove the following? $$\lim_{h\to 0} \frac{f(x_0+hv)-f(x_0)}h = \nabla f(x_0)\cdot{v}$$

Best Answer

The limit $\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t$ gives the definition of the derivative in the direction of the unit vector $v$ at $x=x_0\in \mathbb R^n$, that is $\frac{\partial}{\partial v} f (x_0)$.

The formula $$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v$$ is a property that holds under the hypothesis that $f$ is differentiable at $x=x_0$, and it is quite useful for calculations. (If $f$ is not differentiable at $x=x_0$, the relation need not hold, even if all directional derivatives exist.)

The idea of the proof is that, since $f$ is differentiable at $x_0$, the gradient $\nabla f(x_0)$ exists and $$\lim_{x\to x_0}\frac{|f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)|}{||x-x_0||}=0.$$

Consider the point $x=x_0+tv$, with $x_0$ and $v$ fixed. Starting from the definition of the directional derivative, and subtracting and adding $\nabla f(x_0)\cdot (x_0+tv-x_0)$ in the numerator, we get

$$\frac{\partial}{\partial v} f (x_0)=\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t$$ $$=\lim_{t\to 0} \left[\frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}+\frac{\nabla f(x_0)\cdot(x_0+tv-x_0)}{t}\right].$$

Because the limit of the first summand is $0$ (why? see (*) below) and the second summand equals $\nabla f(x_0)\cdot v$ for every $t\neq 0$, the result is $$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v,$$ which is the usual formula.
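As a numerical sanity check of this formula, here is a minimal Python sketch. The test function $f(x,y)=\sin(x)\,e^{y}$, the point `x0`, and the direction `v` are illustrative choices of mine, not part of the argument above; any function differentiable at the chosen point should behave the same way.

```python
import math

def f(x, y):
    # Hypothetical smooth test function (an assumption, chosen for illustration).
    return math.sin(x) * math.exp(y)

def grad_f(x, y):
    # Gradient of the test function above, computed by hand.
    return (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))

def difference_quotient(func, x0, v, t):
    # (f(x0 + t v) - f(x0)) / t, the expression inside the limit.
    return (func(x0[0] + t * v[0], x0[1] + t * v[1]) - func(*x0)) / t

x0 = (0.3, -0.2)   # arbitrary base point
v = (0.6, 0.8)     # unit vector, since 0.36 + 0.64 = 1

g = grad_f(*x0)
dot = g[0] * v[0] + g[1] * v[1]   # gradient dotted with v

# The quotient should approach the dot product as t shrinks.
for t in (1e-1, 1e-3, 1e-5):
    q = difference_quotient(f, x0, v, t)
    print(f"t={t:g}  quotient={q:.8f}  grad.v={dot:.8f}")
```

The gap between the two printed columns should shrink roughly in proportion to $t$, which is what the vanishing first summand in the proof predicts.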

The relation may be easier to appreciate by looking at a case where it fails. Let $f \colon \mathbb R^2 \to \mathbb R$, and $$f(x,y)= \begin{cases} \tfrac{x^2y}{x^2+y^2} & (x,y)\neq (0,0) \\ 0 & (x,y)=(0,0). \end{cases}$$

A short calculation using the definition shows that, if $v=(v_x,v_y)$ with $||v||=1$, the directional derivative at the origin in the direction $v$ is $$\frac{\partial}{\partial v} f (0,0)=\frac{v_x^2 v_y}{v_x^2+v_y^2}=v_x^2 v_y.$$ In particular, both $\frac{\partial}{\partial x} f (0,0)$ and $\frac{\partial}{\partial y} f (0,0)$ are zero, that is, $\nabla f(0,0)=(0,0)$.
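Written out, the calculation goes as follows. Since $f(0,0)=0$, for every $t\neq 0$ we have $$\frac{f(tv_x,tv_y)-f(0,0)}{t}=\frac1t\cdot\frac{t^3\,v_x^2 v_y}{t^2\,(v_x^2+v_y^2)}=\frac{v_x^2 v_y}{v_x^2+v_y^2},$$ which does not depend on $t$, so the limit as $t\to 0$ is this same value. Taking $v=(1,0)$ and $v=(0,1)$ gives the two partial derivatives, and both are $0$.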

So, if the 'dot-product formula' were valid, we would have $$\frac{\partial}{\partial v} f (0,0)=(0,0)\cdot (v_x,v_y)=0$$ in every direction. But $v_x^2 v_y=0$ only in the directions of the $x$ and $y$ axes. (Incidentally, this also proves that $f$ is not differentiable at $(0,0)$.)
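The mismatch can also be seen numerically. The following is a minimal Python sketch for the function $f$ of this counterexample; the helper name `quotient` and the three sample directions are my own choices for illustration.

```python
import math

def f(x, y):
    # The counterexample function: x^2 y / (x^2 + y^2), and 0 at the origin.
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y)

def quotient(v, t):
    # Difference quotient at the origin in the direction v.
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

s = 1 / math.sqrt(2)
directions = {
    "x-axis": (1.0, 0.0),
    "y-axis": (0.0, 1.0),
    "diagonal": (s, s),
}

for name, v in directions.items():
    numeric = quotient(v, 1e-6)        # approximates the directional derivative
    formula = v[0] ** 2 * v[1]         # v_x^2 v_y, valid for unit v
    dot = 0.0 * v[0] + 0.0 * v[1]      # gradient (0, 0) dotted with v
    print(f"{name}: quotient={numeric:.6f}  v_x^2 v_y={formula:.6f}  grad.v={dot:.6f}")
```

Along the two axes all three numbers agree at $0$, while along the diagonal the quotient equals $1/(2\sqrt 2)$ and the dot product is still $0$.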

I suggest you try to picture why the way the directional derivatives vary as we change direction in this case (think of the $xy$ plane as the floor) is not compatible with the existence of a tangent plane (differentiability).


(*) In order to verify that $$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}=0,$$ first note that $\frac{|t|\,||v||}{t}$ equals plus or minus $||v||$, depending on the sign of $t$, so it is a bounded function of $t$ ($t\neq 0$). Hence, to prove our claim it is enough to show that $$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}=0.$$

But this is a consequence of $f$ being differentiable. Indeed, we say that $f\colon \mathbb R^n \rightarrow \mathbb R$ is differentiable at $x_0$ if and only if $$\lim_{x\to x_0} \frac{f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)}{||x-x_0||}=0.$$

Our expression just has $x_0+tv$ in place of $x$, and since the limit is taken as $t\to 0$, it is also true that $x_0+tv\to x_0$. The only difference is that the definition of a differentiable function uses a multivariable (double, triple, etc.) limit (think of sequences of points of $\mathbb R^n$ converging to $x_0$ from every direction and along all sorts of simple or complicated paths), while in our limit $x$ tends to $x_0$ only along the straight line in the direction of $v$. But since $f$ is differentiable at $x_0$, the multivariable limit is $0$, and so the limit is also $0$ when we restrict to the subset of $\mathbb R^n$ that is that line.
