Why does the summation index change when taking a derivative with the chain rule?


I am trying to understand a derivative example from my class notes. As shown in the picture, it works out the derivative of a $\log(\exp(\cdot))$ expression with respect to $V_c$. When the chain rule is applied, the index of the summation $\sum\limits_{w=1}^V$ changes from $w$ to $x$, and the note says "Important to change index". I'm a bit confused about why we have to change the index. Thanks.
[Picture of the example]

Best Answer

It's not apparent from that image alone why it should be important to change the index. My guess would be that they later go on to write this as

$$\sum_{x=1}^Va_xu_x$$

with weights

$$ a_x=\frac{\exp\left(u_x^\top V_c\right)}{\sum_{w=1}^V\exp\left(u_w^\top V_c\right)}\;, $$

and in that case, once the index is no longer just a bound summation index but appears as a free variable in $a_x$, it's important not to use the same letter for it as for the summation index $w$ in the denominator, because that $w$-sum now sits inside the sum over $x$. As long as two summations in the same expression are not nested, it's not a problem if they use the same summation index.
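A quick numerical check can make the index bookkeeping concrete. The sketch below assumes, from the weights $a_x$ above, that the function being differentiated is $\log\sum_{w=1}^V \exp\left(u_w^\top V_c\right)$; the random vectors, the sizes `V` and `d`, and all function names are hypothetical. It computes the chain-rule gradient $\sum_{x=1}^V a_x u_x$, where the outer loop plays the role of $x$ and the denominator is summed separately over its own index $w$, and compares the result with central finite differences.

```python
import math
import random

def log_sum_exp_score(us, v_c):
    """f(V_c) = log(sum_{w=1}^V exp(u_w . V_c)), computed with a max-shift for stability."""
    scores = [sum(a * b for a, b in zip(u_w, v_c)) for u_w in us]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def analytic_grad(us, v_c):
    """Chain-rule gradient sum_{x=1}^V a_x u_x, with a_x as defined above."""
    scores = [sum(a * b for a, b in zip(u_x, v_c)) for u_x in us]
    m = max(scores)
    # Denominator: a separate sum over its own bound index w, done before the x-sum.
    denom = sum(math.exp(s - m) for s in scores)
    weights = [math.exp(s - m) / denom for s in scores]  # a_x for x = 1..V
    dim = len(v_c)
    # Outer sum over the free index x, one gradient coordinate k at a time.
    return [sum(a_x * u_x[k] for a_x, u_x in zip(weights, us)) for k in range(dim)]

def numeric_grad(f, v_c, h=1e-6):
    """Central finite differences, perturbing one coordinate of V_c at a time."""
    grad = []
    for k in range(len(v_c)):
        plus = list(v_c)
        plus[k] += h
        minus = list(v_c)
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

random.seed(0)
V, d = 5, 3  # hypothetical vocabulary size and vector dimension
us = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(V)]
v_c = [random.uniform(-1, 1) for _ in range(d)]

g_analytic = analytic_grad(us, v_c)
g_numeric = numeric_grad(lambda v: log_sum_exp_score(us, v), v_c)
print(g_analytic)
print(g_numeric)
```

The two printed vectors should agree to many decimal places. Computing the denominator once, before the weighted sum over $x$, mirrors keeping the inner $w$-summation separate from the free index $x$; the max-shift is only a numerical-stability choice and does not change any value.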

Related Solutions

Probability – Understanding the Chain Rule in Probability Theory

$$p(x,y|z) = \frac{p(x,y,z)}{p(z)} = \frac{p(x|y,z)p(y,z)}{p(z)} = p(x|y,z)p(y|z)$$

In the first step we use the definition of conditional probability. In the second step we apply the same definition to the numerator, converting the joint probability $p(x,y,z)$ into a conditional $p(x|y,z)$ times a joint $p(y,z)$. Finally, dividing $p(y,z)$ by $p(z)$ and applying the definition of conditional probability once more gives $p(y|z)$, which yields the result.
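The identity can be checked numerically on any strictly positive joint distribution. The following Python sketch uses a hypothetical random distribution over three binary variables (the names `p` and `marg` are illustrative only), obtains every probability by marginalizing the joint table, and measures the largest gap between $p(x,y|z)$ and $p(x|y,z)\,p(y|z)$.

```python
import itertools
import random

random.seed(1)
# Hypothetical joint distribution p(x, y, z) over three binary variables.
outcomes = list(itertools.product([0, 1], repeat=3))
raw = [random.random() for _ in outcomes]
total = sum(raw)
p = {xyz: r / total for xyz, r in zip(outcomes, raw)}

def marg(**fixed):
    """Sum p(x, y, z) over all outcomes consistent with the fixed values."""
    names = ("x", "y", "z")
    return sum(prob for xyz, prob in p.items()
               if all(xyz[names.index(k)] == v for k, v in fixed.items()))

max_err = 0.0
for x, y, z in outcomes:
    lhs = marg(x=x, y=y, z=z) / marg(z=z)                # p(x, y | z)
    p_x_given_yz = marg(x=x, y=y, z=z) / marg(y=y, z=z)  # p(x | y, z)
    p_y_given_z = marg(y=y, z=z) / marg(z=z)             # p(y | z)
    max_err = max(max_err, abs(lhs - p_x_given_yz * p_y_given_z))

print(max_err)
```

The printed gap should be at the level of floating-point rounding, since each side is built from the same definition of conditional probability used in the derivation.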

Another way of looking at it is that you can simply ignore variables that always stay on the right side of the conditioning bar. Without them, the expression is just the familiar two-variable version:

$$p(x,y) = p(x|y)p(y)$$

You simply condition all of these probabilities on $z$ and you get your original formula.

Math – Intuition of multivariable chain rule

The problem with intuition about cancelling differentials is that it isn't safe. And yet, the method of differentials is stupidly successful.

Let me give a standard example of intuition's downfall. First, since partials cancel, $$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = 1,$$ except it doesn't. Actually, with the right interpretation, $$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = -1.$$

In particular, assume $x,y,z$ are related by some level function $F(x,y,z)=0$ (with $F_x, F_y, F_z \neq 0$). Then $dF = F_x\,dx+F_y\,dy+F_z\,dz = 0$, thus $$ \frac{\partial z}{\partial y} = \frac{dz}{dy}\bigg|_{dx=0} = -\frac{F_y}{F_z}.$$ In more words: if we consider $z$ as a function of $x,y$, then the partial derivative of $z$ with respect to $y$ whilst holding $x$ fixed is $-F_y/F_z$. Notice, I simply take the total differential of $F$, set $dx=0$, and solve for $dz/dy$. This is an example of how the differential notation is naively successful (careful application of the implicit function theorem yields the same outcome). Likewise, intuitive calculation with $dx,dy,dz$ yields $$ \frac{\partial y}{\partial x} = \frac{dy}{dx}\bigg|_{dz=0} = -\frac{F_x}{F_y}, \qquad \frac{\partial x}{\partial z} = \frac{dx}{dz}\bigg|_{dy=0} = -\frac{F_z}{F_x}.$$ Thus, $$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = \left(-\frac{F_y}{F_z}\right)\left(-\frac{F_x}{F_y}\right)\left(-\frac{F_z}{F_x}\right) = -1.$$
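As a concrete sketch, take the hypothetical level set $F(x,y,z) = xyz - 1 = 0$, which can be solved explicitly for each variable. The Python code below estimates each partial derivative by central differences, holding the appropriate third variable fixed, and compares the product with the one built from the $-F_y/F_z$-style formulas; at the chosen point $(2, 1/2, 1)$ both should come out as $-1$, not $1$.

```python
# Hypothetical level set F(x, y, z) = x*y*z - 1 = 0, solved for each variable in turn.
def z_of(x, y):
    return 1.0 / (x * y)

def y_of(x, z):
    return 1.0 / (x * z)

def x_of(y, z):
    return 1.0 / (y * z)

h = 1e-6
x0, y0 = 2.0, 0.5
z0 = z_of(x0, y0)  # 1.0, so (x0, y0, z0) lies on the level set

# Central differences, each holding the "other" variable fixed.
dz_dy = (z_of(x0, y0 + h) - z_of(x0, y0 - h)) / (2 * h)  # x held fixed
dy_dx = (y_of(x0 + h, z0) - y_of(x0 - h, z0)) / (2 * h)  # z held fixed
dx_dz = (x_of(y0, z0 + h) - x_of(y0, z0 - h)) / (2 * h)  # y held fixed
numeric_product = dz_dy * dy_dx * dx_dz

# Same partials from the implicit-differentiation formulas,
# using F_x = yz, F_y = xz, F_z = xy at the point.
Fx, Fy, Fz = y0 * z0, x0 * z0, x0 * y0
formula_product = (-Fy / Fz) * (-Fx / Fy) * (-Fz / Fx)

print(dz_dy, dy_dx, dx_dz)
print(numeric_product, formula_product)
```

Here the individual partials are approximately $-2$, $-1/4$ and $-2$, so the three minus signs are what turn the naive "cancellation" into $-1$.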

Getting back to your posed question: why are there sums of derivatives? In short, because a multivariate function can change in all of its arguments. Since the derivative is a linear approximation to the change in the function, we have little hope except to see formulas formed from sums over all the possible things which can change the outcome. This is the multivariate chain rule. It accounts for each entry in an entirely symmetrical manner. OK, these sorts of explanations don't sit well with me. The real answer, in my estimation, is matrix multiplication. The chain rules really fall out of multiplying Jacobian matrices, which in turn comes from the chain rule in its pure form $D(F \circ G)(p) = DF(G(p)) \circ DG(p)$. But perhaps this isn't intuition. That said, it is my intuition.

I'll add a little example to explain how matrix multiplication of Jacobian matrices captures the chain rule. Suppose $\vec{X}: \mathbb{R}^2_{uv} \rightarrow \mathbb{R}^3_{xyz}$ and $\vec{F} = \langle P, Q, R \rangle : \mathbb{R}^3_{xyz} \rightarrow \mathbb{R}^3$. Here I use the notation $\mathbb{R}^2_{uv}$ to indicate that $u,v$ serve as the coordinates. You can think of $\vec{X}$ as a parametrization of a surface and $\vec{F}$ as a vector field in three-dimensional space. The composition $\vec{F} \circ \vec{X}$ is commonly considered in the calculation of the flux of $\vec{F}$ through the surface parametrized by $\vec{X}$. In this case, the Jacobian of $\vec{X}$ is given by

$$ J_{\vec{X}} = \left[ \frac{\partial \vec{X}}{\partial u} \,\bigg|\, \frac{\partial \vec{X}}{\partial v}\right] = \left[\begin{array}{cc} \partial_u x & \partial_v x \\ \partial_u y & \partial_v y \\ \partial_u z & \partial_v z \end{array} \right]$$

and the Jacobian of $\vec{F}$ is given by

$$ J_{\vec{F}} = \left[ \frac{\partial \vec{F}}{\partial x} \,\bigg|\, \frac{\partial \vec{F}}{\partial y} \,\bigg|\, \frac{\partial \vec{F}}{\partial z} \right] = \left[ \begin{array}{ccc} \partial_x P & \partial_y P & \partial_z P \\ \partial_x Q & \partial_y Q & \partial_z Q \\ \partial_x R & \partial_y R & \partial_z R \end{array} \right]$$

Setting $\vec{G} = \vec{F} \circ \vec{X}$, we find from the matrix form of the chain rule that (suppressing point dependence)

\begin{align}
J_{\vec{G}} &= J_{\vec{F}}J_{\vec{X}} \\
&= \left[ \begin{array}{ccc} \partial_x P & \partial_y P & \partial_z P \\ \partial_x Q & \partial_y Q & \partial_z Q \\ \partial_x R & \partial_y R & \partial_z R \end{array} \right]\left[\begin{array}{cc} \partial_u x & \partial_v x \\ \partial_u y & \partial_v y \\ \partial_u z & \partial_v z \end{array} \right] \\
&= \left[\begin{array}{c|c} \partial_x P\,\partial_u x +\partial_y P\,\partial_u y + \partial_z P\,\partial_u z & \partial_x P\,\partial_v x +\partial_y P\,\partial_v y + \partial_z P\,\partial_v z \\ \partial_x Q\,\partial_u x +\partial_y Q\,\partial_u y + \partial_z Q\,\partial_u z & \partial_x Q\,\partial_v x +\partial_y Q\,\partial_v y + \partial_z Q\,\partial_v z \\ \partial_x R\,\partial_u x +\partial_y R\,\partial_u y + \partial_z R\,\partial_u z & \partial_x R\,\partial_v x +\partial_y R\,\partial_v y + \partial_z R\,\partial_v z \end{array} \right]
\end{align}

For example, in the $(1,1)$ entry we read off:

$$ \frac{\partial G^1}{\partial u} = \frac{\partial}{\partial u} \left[P(x(u,v), y(u,v), z(u,v))\right] = \frac{\partial P}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial P}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial P}{\partial z}\frac{\partial z}{\partial u} $$

Notice that the matrix $J_{\vec{G}}$ contains all $6$ interesting chain rules: each component function $P,Q,R$ of $\vec{F}$, composed with the component functions $x,y,z$ of $u,v$, differentiated with respect to $u$ and to $v$.
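The matrix identity $J_{\vec{G}} = J_{\vec{F}}J_{\vec{X}}$ can also be checked numerically. The Python sketch below uses a hypothetical parametrization $\vec{X}(u,v) = (u\cos v,\, u\sin v,\, uv)$ and a hypothetical field $\vec{F} = \langle yz, xz, xy\rangle$, approximates every Jacobian with central differences, and compares $J_{\vec{G}}$ with the product $J_{\vec{F}}J_{\vec{X}}$, where $J_{\vec{F}}$ is evaluated at the point $\vec{X}(u,v)$. The helper names `jacobian` and `matmul` are illustrative only.

```python
import math

def X(u, v):
    """Hypothetical surface parametrization X(u, v) = (x, y, z)."""
    return (u * math.cos(v), u * math.sin(v), u * v)

def F(x, y, z):
    """Hypothetical vector field F = <P, Q, R>."""
    return (y * z, x * z, x * y)

def jacobian(f, point, h=1e-6):
    """Numerical Jacobian: entry [i][j] approximates d f_i / d point_j."""
    cols = []
    for j in range(len(point)):
        plus = list(point)
        plus[j] += h
        minus = list(point)
        minus[j] -= h
        fp, fm = f(*plus), f(*minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # cols[j][i] holds d f_i / d point_j; transpose into rows[i][j].
    return [list(row) for row in zip(*cols)]

def matmul(A, B):
    """Plain matrix product of lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

p = (1.3, 0.7)                                # a point (u, v)
J_X = jacobian(X, p)                          # 3 x 2
J_F = jacobian(F, X(*p))                      # 3 x 3, evaluated at X(p)
J_G = jacobian(lambda u, v: F(*X(u, v)), p)   # 3 x 2, Jacobian of G = F o X
product = matmul(J_F, J_X)                    # 3 x 2, the chain-rule prediction

for row_g, row_p in zip(J_G, product):
    print(row_g, row_p)
```

Each of the six entries of `J_G` corresponds to one of the chain rules read off above; for instance `J_G[0][0]` approximates $\partial G^1/\partial u$, and it should match `product[0][0]` up to finite-difference error.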
