Difficulty understanding Chain rule in multivariable calculus

chain-rule, differential-geometry, manifolds, smooth-manifolds

I am having difficulty understanding the following form of the chain rule. Let $f$ be a function whose arguments are vectors in $\mathbb{R}^n$. I have read (in the book "Introduction to Manifolds" by Loring W. Tu, p. 6) that the chain rule takes the following form:
$$\frac{d}{dt}f\Big(p^i+t(x^i-p^i)\Big)=\sum_{i=1}^n
(x^i-p^i)\frac{\partial}{\partial x^i}f\Big(p^i+t(x^i-p^i)\Big)$$

where $t$ is a scalar taking values from 0 to 1.

If I were to apply the chain rule, I would set, for example, $u^i=p^i+t(x^i-p^i)$ and then write down that
$$\frac{d}{dt}=\sum_{i=1}^n\frac{\partial u^i}{\partial t}\frac{\partial}{\partial u^i}$$
and hence
$$\frac{d}{dt}f\Big(p^i+t(x^i-p^i)\Big)=\sum_{i=1}^n\frac{\partial u^i}{\partial t}\frac{\partial}{\partial u^i}f\Big(p^i+t(x^i-p^i)\Big)\tag{*}\label{*}$$
which, as far as I understand, is not the same as the formula in the book: in my case $f$ is differentiated with respect to $u^i=p^i+t(x^i-p^i)$, whereas in the book $f$ is differentiated simply with respect to $x^i$. So my guess is that the two differentiations must be equivalent, or at least related somehow. Can someone explain why?

P.S.: I have seen the question "Explanation of use of chain rule", whose answer claims that, with my definition of $u^i$, the result should be given by $\eqref{*}$, but I am not convinced of why something like that should hold.

Best Answer

Let us first observe that Tu writes $$\frac{d}{dt}f\Big(p+t(x-p)\Big)=\sum_{i=1}^n (x^i-p^i)\frac{\partial}{\partial x^i}f\Big(p+t(x-p)\Big)$$ which is not the same formula as in your question.
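A small worked instance may help; the data here are chosen only for illustration and are not taken from Tu. Take $n=2$, $f(x^1,x^2)=(x^1)^2x^2$ on $U=\mathbb R^2$, $p=(0,0)$ and $x=(1,2)$, so that $p+t(x-p)=(t,2t)$ and $x-p=(1,2)$. The left-hand side is
$$\frac{d}{dt}f\Big(p+t(x-p)\Big)=\frac{d}{dt}\big(t^2\cdot 2t\big)=\frac{d}{dt}\,2t^3=6t^2.$$
For the right-hand side, first compute the partial derivatives as functions, $\frac{\partial f}{\partial x^1}=2x^1x^2$ and $\frac{\partial f}{\partial x^2}=(x^1)^2$, and only then evaluate them at $p+t(x-p)=(t,2t)$:
$$\sum_{i=1}^2(x^i-p^i)\frac{\partial f}{\partial x^i}\Big(p+t(x-p)\Big)=1\cdot(2\cdot t\cdot 2t)+2\cdot t^2=4t^2+2t^2=6t^2.$$
The letter $x$ plays two roles here, just as in Tu's formula: in $2x^1x^2$ the $x^i$ name the coordinate slots of $f$, while in $x^i-p^i$ they are the coordinates of the fixed point $x$.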

I think your difficulty in understanding the chain rule in multivariable calculus is just a notational issue.

Tu considers $f : U \to \mathbb R$, where $U \subset \mathbb R^n$ is open and star-shaped with respect to a point $p \in U$. This function has partial derivatives $D^if = \frac{\partial f}{\partial x^i}$, which are again functions $U \to \mathbb R$. We shall write $\frac{\partial f}{\partial x^i} \mid_\xi$ for the value $\frac{\partial f}{\partial x^i} (\xi)$ of this function at $\xi \in U$. This is the standard notation; the symbol $\partial x^i$ is used because we usually write $f$ as a function $f(x^1,\ldots,x^n)$ of $n$ variables $x^1, \ldots, x^n$. But the naming of the variables is arbitrary: you can equally well view $f$ as a function of $n$ variables $u^1,\ldots,u^n$, which gives $D^if = \frac{\partial f}{\partial u^i}$ instead of $\frac{\partial f}{\partial x^i}$. If there are three variables, it is also common to write $f(x,y,z)$ and $D^1f = \frac{\partial f}{\partial x}$, $D^2f = \frac{\partial f}{\partial y}$ and $D^3f = \frac{\partial f}{\partial z}$.

In the one-dimensional case we usually write $\frac{\partial f}{\partial x} = \frac{d f}{dx}$, provided we regard $f$ as a function $f(x)$ of the variable $x$. But of course we can also write $f$ in the form $f(t)$, which forces us to write $\frac{d f}{dt}$ instead of $\frac{d f}{dx}$.

In other words, the names of the variables in the domain of $f$ correspond one-to-one to the "denominators" of the partial derivatives $D^if$: if the $i$-th variable is named $v$, then the associated partial derivative is written $\frac{\partial f}{\partial v}$.
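For example, take the illustrative function $f(x,y,z)=xy^2+z$ on $\mathbb R^3$ (it does not come from the question). Then
$$D^2f=\frac{\partial f}{\partial y}=2xy.$$
Writing the very same function as $f(u^1,u^2,u^3)=u^1(u^2)^2+u^3$ gives
$$D^2f=\frac{\partial f}{\partial u^2}=2u^1u^2.$$
Both expressions describe the same function $\mathbb R^3\to\mathbb R$: at a point $\xi=(\xi^1,\xi^2,\xi^3)$ each yields $D^2f\mid_\xi=2\xi^1\xi^2$.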

Whatever your personal notational favourite may be, the point is that $D^if$ is the partial derivative of $f$ with respect to the $i$-th coordinate.
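A minimal numerical sketch of this point, assuming central finite differences as a stand-in for exact derivatives (the helper name `partial`, the step size `h`, the test point `xi` and the function $f(x,y,z)=xy^2+z$ are illustrative choices, not from the question): the derivative is selected only by the slot index `i`, and the parameter names used inside the Python function play no role.

```python
def partial(f, i, xi, h=1e-6):
    """Approximate D^(i+1) f at the point xi by a central difference in slot i (0-based)."""
    plus, minus = list(xi), list(xi)
    plus[i] += h
    minus[i] -= h
    return (f(plus) - f(minus)) / (2 * h)


# The same function f(x, y, z) = x*y^2 + z, written with two different variable names.
def f_xyz(v):
    x, y, z = v
    return x * y**2 + z


def f_u(v):
    u1, u2, u3 = v
    return u1 * u2**2 + u3


xi = (1.5, -2.0, 0.3)
# D^2 f in the text's 1-based numbering is slot index 1 here; exact value 2*1.5*(-2.0) = -6.0.
print(partial(f_xyz, 1, xi))
print(partial(f_u, 1, xi))  # same value: renaming the variables changes nothing
```

Since both Python functions perform identical arithmetic, the two printed numbers coincide, and both approximate $2\xi^1\xi^2=-6$.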

Let us now turn to Tu.

Define $u : (-\epsilon, 1 + \epsilon) \to \mathbb R^n$, $u(t) = p + t(x-p)$. For sufficiently small $\epsilon > 0$ the image of $u$ is contained in $U$, so we may regard $u$ as a map $u : (-\epsilon, 1 + \epsilon) \to U$ and apply the chain rule to $f \circ u : (-\epsilon, 1 + \epsilon) \to \mathbb R$. We get $$\frac{d(f \circ u)}{dt}\mid_{\tau} = Jf \mid_{u(\tau)} \circ Ju \mid_{\tau},$$ where $Jf \mid_{u(\tau)}$ is the Jacobian matrix of $f$ at $u(\tau)$ and $Ju \mid_{\tau}$ is the Jacobian matrix of $u$ at $\tau$. We have $$Ju \mid_\tau = \begin{pmatrix}\frac{d u^1}{dt}\mid_\tau \\ \vdots \\ \frac{d u^n}{dt}\mid_\tau \end{pmatrix} = \begin{pmatrix}x^1 - p^1 \\ \vdots \\ x^n - p^n \end{pmatrix},$$ $$Jf \mid_{u(\tau)} = \begin{pmatrix}D^1f\mid_{u(\tau)} & \ldots & D^nf\mid_{u(\tau)} \end{pmatrix} = \begin{pmatrix}\frac{\partial f}{\partial x^1}\mid_{u(\tau)} & \ldots & \frac{\partial f}{\partial x^n}\mid_{u(\tau)} \end{pmatrix}.$$ Multiplying the matrices yields $$\frac{d(f \circ u)}{dt}\mid_{\tau} = \sum_{i=1}^n (x^i-p^i)D^if\mid_{u(\tau)} = \sum_{i=1}^n (x^i-p^i)\frac{\partial f}{\partial x^i}\mid_{u(\tau)} .$$ This is Tu's formula: there, $\frac{\partial}{\partial x^i}f\big(p+t(x-p)\big)$ means the partial derivative $\frac{\partial f}{\partial x^i}$ evaluated at the point $u(t)=p+t(x-p)$, not a derivative with respect to the fixed point $x$.

This is also exactly the same as your formula (*), since $\frac{d u^i}{dt} = x^i - p^i$; the only formal difference is that you denote the coordinates of points of $U$ by $u^i$ instead of $x^i$. But I would not do that in the present case: as you define them, the $u^i$ are functions of $t$ (involving the parameters $x$ and $p$, which do not depend on $t$), so the notation $\frac{\partial f}{\partial u^i}$, though formally correct, may be confusing.
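A minimal numerical check of this computation, assuming a central finite difference in $t$ as a stand-in for the exact derivative of $f\circ u$. The function $f$, the points `p` and `x`, and the value `tau` are arbitrary illustrative choices (here $U=\mathbb R^3$, which is star-shaped with respect to every point). The partial derivatives are written out by hand as functions on $\mathbb R^3$ and only then evaluated at $u(\tau)$, which is exactly the reading of Tu's formula described above.

```python
import math


def f(v):
    # Illustrative smooth function on R^3: f(a, b, c) = sin(a)*b + a*c^2.
    a, b, c = v
    return math.sin(a) * b + a * c**2


def grad_f(v):
    # D^1 f, D^2 f, D^3 f as functions on R^3, computed by hand from f above.
    a, b, c = v
    return [math.cos(a) * b + c**2, math.sin(a), 2 * a * c]


p = [0.2, -1.0, 0.5]
x = [1.0, 2.0, -0.5]
tau = 0.4


def u(t):
    # u(t) = p + t(x - p): the straight segment from p (t = 0) to x (t = 1).
    return [pk + t * (xk - pk) for pk, xk in zip(p, x)]


# Left-hand side: derivative of the one-variable function f∘u at tau (central difference).
h = 1e-6
lhs = (f(u(tau + h)) - f(u(tau - h))) / (2 * h)

# Right-hand side: Jf|_{u(tau)} times Ju|_tau = sum_i (x^i - p^i) * D^i f evaluated at u(tau).
rhs = sum((xk - pk) * d for pk, xk, d in zip(p, x, grad_f(u(tau))))

print(lhs, rhs)  # agree up to finite-difference error
```

Changing `p`, `x` or `tau` should leave the two numbers in agreement, since the identity is just the chain rule for $f\circ u$.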
