Derive the directional derivative using Taylor expansion

Tags: derivatives, multivariable-calculus, taylor-expansion

I am told that the directional derivative is defined as
$$
D_vf(x) = \lim_{h \rightarrow 0} \frac{f(x+hv)-f(x)}{h}
$$

So my way of deriving this kind of stuff has always been the Taylor expansion ($v^j$ and $x^j$ are the components of $v$ and $x$):
$$
f(x+hv) \cong f(x) + \sum_j \frac{\partial f}{\partial x^j} hv^j
$$

which would imply that
$$
1 : D_vf(x) = \sum_j \frac{\partial f}{\partial x^j} v^j
$$

which does not seem too far from the truth. However, I am also told that the gradient is defined by
$$
2: D_vf(x) = \langle\nabla f, v\rangle
$$

Clearly 1 and 2 are equivalent in Cartesian coordinates. But the gradient in polar coordinates (and similar systems) is not just the stacked partial derivatives: the inverse metric tensor comes into play. That would mean my derivation with the Taylor series is not correct as it stands. Can somebody tell me where I went wrong?

To make my confusion clear:

Polar coordinates, $x = \rho e_{\rho} + \phi e_{\phi}$, $f(x) = \phi$.

$\nabla f = \left[\frac{\partial f}{\partial \rho}, \frac{1}{\rho} \frac{\partial f}{\partial \phi}\right]$ (according to Wikipedia)

Using Formula 1: $D_v f(x) = v^{\phi}$

Using Formula 2: $D_v f(x) = \frac{v^{\phi}}{\rho}$

Best Answer

(1) and (2) are both right; the problem is that the $v^{\phi}$ in your two formulas means two different things, and you've unknowingly abused notation by calling them both $v^{\phi}$. (I write $r$ for your $\rho$ throughout.) This issue boils down to the distinction between the tangent vectors $\frac{\partial}{\partial \phi}$ and $e_{\phi}$. The first vector has norm $r$, while the second vector has norm $1$; and it is precisely this factor of $r$ which is the "discrepancy" you observed among the components.

Note that in the formula \begin{align} D_vf(p) &= \sum_{i=1}^n \frac{\partial f}{\partial x^i}\bigg|_p \cdot v^i \end{align} we often say "$v^i$ is the component of the vector $v$", but strictly speaking, this is an incomplete sentence. Components with respect to which basis? For this formula to work, we have to interpret the $v^i$ as the coefficients in the expansion \begin{align} v &= \sum_{i=1}^n v^i \frac{\partial}{\partial x^i}\bigg|_p \end{align} In other words, they are the components of $v$ with respect to the basis $\left\{\frac{\partial}{\partial x^i}(p)\right\}_{i=1}^n$ of the tangent space $T_pM$. Said differently once more, we have $v^i:= dx^i(p)[v]$ (the evaluation of a covector on a vector). In differential geometry, we often deal with such "coordinate-induced bases".

However, in vector calculus, people often work with the normalized version of these vectors: \begin{align} e_i := \dfrac{\frac{\partial}{\partial x^i}(p)}{\lVert \frac{\partial}{\partial x^i}(p)\rVert} \end{align}

In the case of polar coordinates in the plane, we have the following vectors: $\frac{\partial}{\partial r}, \frac{\partial}{\partial \phi}$ and their normalized counterparts $e_r, e_{\phi}$. The relation between them is: \begin{align} \frac{\partial}{\partial r} &= e_r \quad \text{and} \quad \frac{\partial}{\partial \phi} = re_{\phi} \tag{$*$} \end{align}
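The relation $(*)$ can be checked numerically. The following is only a minimal sketch: it assumes the standard embedding $(r,\phi)\mapsto(r\cos\phi, r\sin\phi)$, approximates $\frac{\partial}{\partial r}$ and $\frac{\partial}{\partial \phi}$ by central differences, and the helper names (`polar_to_cartesian`, `coordinate_basis`, `norm`) and the sample point are made up for illustration.

```python
import math

def polar_to_cartesian(r, phi):
    """Map polar coordinates (r, phi) to Cartesian (x, y)."""
    return (r * math.cos(phi), r * math.sin(phi))

def coordinate_basis(r, phi, h=1e-6):
    """Approximate d/dr and d/dphi at (r, phi) by central differences.

    Each basis vector is returned as a pair of Cartesian components.
    """
    xp, yp = polar_to_cartesian(r + h, phi)
    xm, ym = polar_to_cartesian(r - h, phi)
    d_dr = ((xp - xm) / (2 * h), (yp - ym) / (2 * h))

    xp, yp = polar_to_cartesian(r, phi + h)
    xm, ym = polar_to_cartesian(r, phi - h)
    d_dphi = ((xp - xm) / (2 * h), (yp - ym) / (2 * h))
    return d_dr, d_dphi

def norm(v):
    """Euclidean norm of a vector given by Cartesian components."""
    return math.hypot(v[0], v[1])

# Arbitrary sample point (an assumption for the demo, away from r = 0).
r, phi = 2.5, 0.7
d_dr, d_dphi = coordinate_basis(r, phi)

print(norm(d_dr))    # should be close to 1
print(norm(d_dphi))  # should be close to r
```

Up to finite-difference error, the first norm is $1$ and the second is $r$, which is exactly the factor separating $\frac{\partial}{\partial \phi}$ from $e_{\phi}$.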

So, now given a vector $v$, we can write it as \begin{align} v &= v^r \frac{\partial}{\partial r} + v^{\phi} \frac{\partial}{\partial \phi} \end{align} for some numbers $v^r, v^{\phi}\in \Bbb{R}$, OR, we can also write it as \begin{align} v &= \xi^r e_r + \xi^{\phi} e_{\phi} \end{align} for some OTHER numbers $\xi^r, \xi^{\phi}\in \Bbb{R}$. Now, based on $(*)$, we can deduce that \begin{align} \begin{cases} \xi^r &= v^r \\ \xi^{\phi} &= r v^{\phi} \tag{$**$} \end{cases} \end{align}

One last thing: when Wikipedia says $\nabla f = \left( \frac{\partial f}{\partial r}, \frac{1}{r}\frac{\partial f}{\partial \phi}\right)$, it should really specify the basis being used. The explicit expression is: \begin{align} \nabla f &= \frac{\partial f}{\partial r} e_r + \frac{1}{r}\frac{\partial f}{\partial \phi} e_{\phi} \\ &= \frac{\partial f}{\partial r}\frac{\partial }{\partial r} + \frac{1}{r^2} \frac{\partial f}{\partial \phi}\frac{\partial }{\partial \phi} \tag{$\ddot{\frown}$} \end{align}
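As a quick worked instance, take the example from the question, $f = \phi$, so that $\frac{\partial f}{\partial r} = 0$ and $\frac{\partial f}{\partial \phi} = 1$. Then the two expressions above read \begin{align} \nabla f &= \frac{1}{r}\, e_{\phi} = \frac{1}{r^2}\frac{\partial}{\partial \phi}, \end{align} and both give the same length, using $\lVert e_{\phi}\rVert = 1$ and $\left\lVert \frac{\partial}{\partial \phi}\right\rVert = r$: \begin{align} \lVert \nabla f\rVert &= \frac{1}{r}\lVert e_{\phi}\rVert = \frac{1}{r^2}\left\lVert \frac{\partial}{\partial \phi}\right\rVert = \frac{1}{r}. \end{align} The components differ ($\frac{1}{r}$ versus $\frac{1}{r^2}$), but the vector itself is the same.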


Now, we are finally ready to resolve the issue. Starting from your equation $(1)$, we have \begin{align} D_vf &= \frac{\partial f}{\partial r}v^r + \frac{\partial f}{\partial \phi}v^{\phi} \end{align} Next, if we do this from $(2)$, then we have \begin{align} \langle \nabla f, v\rangle &= \left\langle\frac{\partial f}{\partial r} e_r + \frac{1}{r}\frac{\partial f}{\partial \phi} e_{\phi},\,\,\, \xi^r e_r + \xi^{\phi} e_{\phi} \right\rangle \\\\ &= \frac{\partial f}{\partial r} \xi^r + \frac{1}{r}\frac{\partial f}{\partial \phi} \xi^{\phi} \end{align} where I used the fact that $\{e_r,e_{\phi}\}$ is an orthonormal basis, so the inner product is just the sum of the products of the coefficients. Finally, if we plug in $(**)$ above, we find that \begin{align} \langle \nabla f, v\rangle &= \frac{\partial f}{\partial r} \xi^r + \frac{1}{r}\frac{\partial f}{\partial \phi} \xi^{\phi} =\frac{\partial f}{\partial r}v^r + \frac{\partial f}{\partial \phi}v^{\phi} = D_vf \end{align} which is of course what we expect, since $\nabla f$ is DEFINED so as to make the equation $\langle \nabla f(p), v\rangle = D_vf(p) = df_p(v)$ work out.
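For a concrete sanity check, here is a small numerical sketch using the example from the question, $f = \phi$. It assumes $f$ can be written in Cartesian coordinates as `atan2(y, x)` (valid away from the branch cut), and the point, the components `v_r`, `v_phi`, and all function names are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    """The question's example f = phi, written in Cartesian coordinates."""
    return math.atan2(y, x)

def directional_derivative(func, p, v, h=1e-6):
    """Central-difference approximation of the limit defining D_v f(p)."""
    fp = func(p[0] + h * v[0], p[1] + h * v[1])
    fm = func(p[0] - h * v[0], p[1] - h * v[1])
    return (fp - fm) / (2 * h)

# Base point in polar coordinates, and the orthonormal frame there.
r, phi = 2.0, 0.5
p = (r * math.cos(phi), r * math.sin(phi))
e_r = (math.cos(phi), math.sin(phi))
e_phi = (-math.sin(phi), math.cos(phi))

# Components of v relative to the coordinate basis {d/dr, d/dphi} ...
v_r, v_phi = 0.3, 0.4
# ... and relative to the orthonormal basis {e_r, e_phi}, via (**).
xi_r, xi_phi = v_r, r * v_phi

# Cartesian components of the same vector v = xi_r e_r + xi_phi e_phi.
v = (xi_r * e_r[0] + xi_phi * e_phi[0],
     xi_r * e_r[1] + xi_phi * e_phi[1])

# Partial derivatives of f = phi in polar coordinates.
df_dr, df_dphi = 0.0, 1.0

formula_1 = df_dr * v_r + df_dphi * v_phi                # coordinate-basis components
formula_2 = df_dr * xi_r + (1.0 / r) * df_dphi * xi_phi  # <grad f, v>, orthonormal components
numeric = directional_derivative(f, p, v)                # limit definition

print(formula_1, formula_2, numeric)  # all three should be close to 0.4
```

Plugging the orthonormal component `xi_phi` into formula (1) instead of `v_phi` would give `0.8` here, which is precisely the mismatch described in the question.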


Summary:

Whenever you speak of "components of a vector", you MUST ALWAYS keep track of which basis you're referring to. Very often in differential geometry/Riemannian geometry, people work with the coordinate-induced basis vectors $\frac{\partial}{\partial x^i}$ (so when people write $v^i$ in this context, they mean the components relative to this basis), whereas in elementary vector calculus, people work with the normalized vectors $e_i$ (and because this is the only basis they use, when they write $v^i$, they mean the components relative to this basis).

Wikipedia, in my experience, isn't too consistent regarding the usage, and I recall seeing a single article with both conventions used simultaneously, which is of course very confusing. My suggestion for the future is to always be cautious of this distinction (there are also several other questions on this site where the entire confusion boils down to the difference between a normalized and an unnormalized basis).
