Explanation of notation used in chain rule

chain rule, multivariable-calculus, notation, partial derivative, real-analysis

Let $U\subseteq\mathbb{R}^{n}$ and $V\subseteq\mathbb{R}^{m}$ be open. Let $f:U\subseteq\mathbb{R}^{n}\to\mathbb{R}$ and $g_{1},g_{2},\dotsc,g_{n}\colon V\subseteq\mathbb{R}^{m}\to\mathbb{R}$ be $n$ functions such that:
\begin{equation*}
(g_{1}(x),g_{2}(x),\dotsc,g_{n}(x))\in U, \quad\forall\, x\in V.
\end{equation*}

Furthermore, let $x_{0}\in V$, and let $j$ be a number in $\{1,\dotsc,m\}$. Assume that $f$ is differentiable at $y_{0}=(g_{1}(x_{0}),g_{2}(x_{0}),\dotsc,g_{n}(x_{0}))$ and that the partial derivative $\frac{\partial g_{i}}{\partial x_{j}}(x_{0})$ exists for all $i=1,\dotsc,n$.

Then the partial derivative of $f\circ g$, where $g=(g_{1},g_{2},\dotsc,g_{n})$, with respect to the $j$th coordinate of $x$ exists at $x_{0}$, and:
\begin{equation*}
\frac{\partial (f\circ g)}{\partial x_{j}}(x_{0})=\sum_{i=1}^{n}\frac{\partial f}{\partial y_{i}}(y_{0})\frac{\partial g_{i}}{\partial x_{j}}(x_{0}).
\end{equation*}

My question: Is it true that $y_{i}=g_{i}(x_{0})$?

I know that it is a dumb question, but I am just curious. Thanks in advance.

Best Answer

@TedShifrin has already provided the answer in the comments under the question, but I felt that a bit of elaboration couldn't hurt.


No, it is not true that $y_i = g_i(x_0)$.

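Just as $x_j$ denotes the $j$th coordinate on $V \subseteq \mathbb{R}^m$, here $y_i$ denotes the $i$th coordinate on $U \subseteq \mathbb{R}^n$. It is a label for the $i$th argument of $f$, not a particular number. So, $f$ is viewed as a function of $y_1,\dotsc,y_n$, that is, $f \equiv f(y_1,\dotsc,y_n)$. The numbers $g_i(x_0)$ are the coordinates of the specific point $y_0$ at which the derivative is evaluated.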

Thus, $\displaystyle \frac{\partial f}{\partial y_i}(y_0)$ denotes the partial derivative of $f$ w.r.t. the $i$th coordinate, evaluated at the point $y_0$.
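
As a concrete illustration (the functions and the point below are made up for this example and do not come from the question), take $n = m = 2$, $f(y_1,y_2) = y_1 y_2^2$, $g_1(x_1,x_2) = x_1 + x_2$, $g_2(x_1,x_2) = x_1 x_2$, $x_0 = (1,2)$ and $j = 1$. Then $y_0 = (g_1(x_0), g_2(x_0)) = (3,2)$. The partial derivatives of $f$ are computed with $y_1, y_2$ as free variables and only afterwards evaluated at $y_0$: $$ \frac{\partial f}{\partial y_1} = y_2^2, \quad \frac{\partial f}{\partial y_2} = 2 y_1 y_2, \quad \text{so} \quad \frac{\partial f}{\partial y_1}(y_0) = 4, \quad \frac{\partial f}{\partial y_2}(y_0) = 12. $$ Since $\frac{\partial g_1}{\partial x_1}(x_0) = 1$ and $\frac{\partial g_2}{\partial x_1}(x_0) = 2$, the chain rule gives $$ \frac{\partial (f\circ g)}{\partial x_1}(x_0) = 4 \cdot 1 + 12 \cdot 2 = 28. $$ This agrees with differentiating $(f\circ g)(x) = (x_1 + x_2)\,x_1^2 x_2^2$ directly: $$ \frac{\partial (f\circ g)}{\partial x_1}(x) = x_1^2 x_2^2 + 2(x_1 + x_2)\,x_1 x_2^2, \quad \text{which equals } 4 + 24 = 28 \text{ at } x_0 = (1,2). $$ If $y_1$ were literally the number $g_1(x_0) = 3$, the symbol $\frac{\partial f}{\partial y_1}$ would ask us to differentiate with respect to a constant, which makes no sense.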


This notation for partial derivatives can indeed get a bit confusing, but it is standard so it's good to get used to it. To elaborate, note that there is nothing special about using $x$'s and $y$'s to denote the coordinates of $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. One can just as well choose to use $a$'s and $b$'s, except that this might be uncommon and therefore confusing to the reader.

In fact, only the indices $i$ and $j$ matter: they specify the positions of the variables with respect to which the derivatives are taken. The choice of $x$'s and $y$'s was arbitrary. So, an alternative notation for $\displaystyle \frac{\partial f}{\partial y_i}$ could simply be $D_i f$.
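
For example (with a function chosen only for illustration), the formulas $f(y_1,y_2) = y_1 y_2^2$ and $f(a_1,a_2) = a_1 a_2^2$ define the same function $f \colon \mathbb{R}^2 \to \mathbb{R}$, and $$ \frac{\partial f}{\partial y_1}(3,2) = \frac{\partial f}{\partial a_1}(3,2) = D_1 f(3,2) = 2^2 = 4. $$ The letter in the denominator changes, but the operation is the same in each case: differentiate with respect to the first argument, then evaluate at the point $(3,2)$.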


This is in fact how Spivak does it in Calculus on Manifolds. There is an interesting section in this book titled Notation on pages 44–45, which I reproduce partially below:

The partial derivative $D_1 f(x,y,z)$ is denoted, among devotees of classical notation, by $$ \frac{\partial f(x,y,z)}{\partial x} \quad \text{or} \quad \frac{\partial f}{\partial x} \quad \text{or} \quad \frac{\partial f}{\partial x} (x,y,z) \quad \text{or} \quad \frac{\partial}{\partial x} f(x,y,z) $$ or any other convenient similar symbol. This notation forces one to write $$ \frac{\partial f}{\partial u} (u,v,w) $$ for $D_1 f(u,v,w)$, although the symbol $$ \frac{\partial f}{\partial x} \bigg|_{(x,y,z)=(u,v,w)} \quad \text{or} \quad \frac{\partial f(x,y,z)}{\partial x} (u,v,w) $$ or something similar may be used (and must be used for an expression like $D_1 f(7,3,2)$). Similar notation is used for $D_2 f$ and $D_3 f$. Higher-order derivatives are denoted by symbols like $$ D_2 D_1 f(x,y,z) = \frac{\partial^2 f(x,y,z)}{\partial y \partial x}. $$ When $f \colon \mathbf{R} \to \mathbf{R}$, the symbol $\partial$ automatically reverts to $d$; thus $$ \frac{d \sin x}{dx}, \quad \text{not} \frac{\partial \sin x}{\partial x}. $$ The mere statement of Theorem 2-2 in classical notation requires the introduction of irrelevant letters. The usual evaluation for $D_1(f \circ (g,h))$ runs as follows:

If $f(u,v)$ is a function and $u = g(x,y)$ and $v = h(x,y)$, then $$ \frac{\partial f(g(x,y),h(x,y))}{\partial x} = \frac{\partial f(u,v)}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f(u,v)}{\partial v} \frac{\partial v}{\partial x}. $$ [The symbol $\partial u/\partial x$ means $\partial/\partial x\, g(x,y)$ and $\partial/\partial u\, f(u,v)$ means $D_1 f(u,v) = D_1 f(g(x,y),h(x,y))$.] This equation is often written simply $$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial f}{\partial v} \frac{\partial v}{\partial x}. $$ Note that $f$ means something different on the two sides of the equation!

Theorem 2-2 mentioned in the extract above is in fact the Chain Rule (stated on page 19). The version of the Chain Rule that you mention in the question is stated as Theorem 2-9 (on page 32):

2-9 Theorem. Let $g_1,\dotsc,g_m \colon \mathbf{R}^n \to \mathbf{R}$ be continuously differentiable at $a$, and let $f \colon \mathbf{R}^m \to \mathbf{R}$ be differentiable at $(g_1(a),\dotsc,g_m(a))$. Define the function $F \colon \mathbf{R}^n \to \mathbf{R}$ by $F(x) = f(g_1(x),\dotsc,g_m(x))$. Then $$ D_i F(a) = \sum_{j = 1}^m D_j f(g_1(a),\dotsc,g_m(a)) \cdot D_i g_j(a). $$

Note how there are indeed no extraneous symbols used in the statement of the theorem!
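
To line this up with the statement in the question (a sketch of the correspondence, writing $g = (g_1,\dotsc,g_n)$ as my own shorthand): Spivak's $n$, $m$, $a$, $i$, $j$ play the roles of the question's $m$, $n$, $x_0$, $j$, $i$, respectively. In the question's indexing, the conclusion reads $$ D_j (f \circ g)(x_0) = \sum_{i = 1}^n D_i f(g_1(x_0),\dotsc,g_n(x_0)) \cdot D_j g_i(x_0), $$ where $D_i f(g_1(x_0),\dotsc,g_n(x_0))$ is exactly what the question writes as $\frac{\partial f}{\partial y_i}(y_0)$, and $D_j g_i(x_0)$ is what it writes as $\frac{\partial g_i}{\partial x_j}(x_0)$. No symbol $y_i$ appears at all, which is why it cannot be equal to $g_i(x_0)$.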


Reference: Michael Spivak, Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Addison-Wesley Publishing Company, Reading, Massachusetts, 1965.
