[Math] Why does the summation disappear when taking the derivative of the sum of squares

derivatives, partial derivative, partial differential equations

Why is it that the derivative of the sum of squares of a vector, w:
\begin{eqnarray} \frac{\lambda}{2n}
\sum_w w^2,
\end{eqnarray}

turns out to be

\begin{eqnarray}
\frac{\lambda}{n} w
\end{eqnarray}

and not

\begin{eqnarray} \frac{\lambda}{n}
\sum_w w \;?
\end{eqnarray}

Basically as I see it, we've got

\begin{eqnarray}
w = [w_1, w_2, w_3 …]
\end{eqnarray}

\begin{eqnarray}
\frac{d}{dw} \frac{\lambda}{2n} \sum_w w^2 = \frac{\lambda}{2n}(\frac{\partial}{\partial w_1} \sum_w w^2 + \frac{\partial}{\partial w_2} \sum_w w^2 + \frac{\partial}{\partial w_3} \sum_w w^2 …)
\end{eqnarray}

\begin{eqnarray}
= \frac{\lambda}{n} (w_1 + w_2 + w_3 …)
\end{eqnarray}

\begin{eqnarray}
= \frac{\lambda}{n} \sum_w w
\end{eqnarray}

I'm following this ebook here (equations 87/88, which are basically the same as what I've written above). The main thing I don't understand is why we can eliminate the summation. Any math books or writeups on the subject would also be helpful.

Best Answer

If there are actually $m$ input variables, you can write the sum in Equation $87$ of the ebook as $$ \sum_{i=1}^m w_i^2, $$ which can be viewed as a function of the $m$ variables $w_1, \ldots, w_m.$ The "derivative" in the ebook is a partial derivative, which describes how the function value changes if you slightly increase or decrease just one of the $m$ input variables while leaving all the others unchanged. The notation $\frac{\partial}{\partial w}$ in the ebook means the same thing as the more familiar $\frac{\partial}{\partial w_i}$: a partial derivative with respect to one variable. The ebook has simply chosen to let the letter $w$ by itself stand for one of the $m$ variables rather than use a subscript.

The partial derivative of the sum of two functions is the sum of the partial derivatives, just as with single-variable functions, but only when both partial derivatives are taken with respect to the same variable. Partial derivatives with respect to different variables do not add up in the manner you imagine; and in any case, the ebook definitely means to take the partial derivative of the entire sum with respect to one variable.

When we write $$ \frac{\partial}{\partial w_j} w_i^2, $$ the result is zero unless $i = j,$ because when taking the partial derivative $\frac{\partial}{\partial w_j}$ of a function of the variables $w_1, \ldots, w_m,$ all the variables except $w_j$ act like constants. On the other hand, $$ \frac{\partial}{\partial w_j} w_j^2 = 2w_j, $$ because that describes how the function $w_j^2$ changes as we vary $w_j.$
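As a small concrete instance of this rule (the choice $m = 3,$ $j = 2$ is only for illustration and does not come from the ebook), the terms $w_1^2$ and $w_3^2$ are constants as far as $w_2$ is concerned, so only the middle term survives: $$ \frac{\partial}{\partial w_2}\left( w_1^2 + w_2^2 + w_3^2 \right) = 0 + 2w_2 + 0 = 2w_2. $$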

To spell it out in gory detail, what you actually have is \begin{align} \frac{\partial}{\partial w_j} \frac{\lambda}{2n} \sum_{i=1}^m w_i^2 &= \frac{\lambda}{2n} \frac{\partial}{\partial w_j}\left( w_1^2 + \cdots + w_{j-1}^2 + w_j^2 + w_{j+1}^2 + \cdots + w_m^2 \right) \\ & = \frac{\lambda}{2n} \left(\frac{\partial}{\partial w_j}w_1^2 + \cdots + \frac{\partial}{\partial w_j}w_{j-1}^2 + \frac{\partial}{\partial w_j}w_j^2 + \frac{\partial}{\partial w_j}w_{j+1}^2 + \cdots + \frac{\partial}{\partial w_j}w_m^2 \right) \\ & = \frac{\lambda}{2n} \left(0 + \cdots + 0 + 2w_j + 0 + \cdots + 0\right) \\ & = \frac{\lambda}{n} w_j. \end{align}
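If you want to convince yourself numerically, the result can be checked with a finite-difference estimate. The following is only a minimal sketch in plain Python: the function names `l2_penalty` and `partial_derivative`, the values of `lam` and `n`, and the weights in `w` are all made up for illustration and do not come from the ebook.

```python
def l2_penalty(w, lam, n):
    """Return (lam / (2n)) * sum_i w_i^2, the term differentiated above."""
    return lam / (2 * n) * sum(w_i ** 2 for w_i in w)


def partial_derivative(f, w, j, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. w[j].

    Only w[j] is nudged; every other entry is held fixed, which is exactly
    what "partial" means.
    """
    w_plus = list(w)
    w_minus = list(w)
    w_plus[j] += h
    w_minus[j] -= h
    return (f(w_plus) - f(w_minus)) / (2 * h)


# Hypothetical values, chosen only for illustration.
lam, n = 0.5, 10
w = [1.0, -2.0, 3.0]

# One numerical partial derivative per weight.
numeric = [
    partial_derivative(lambda v: l2_penalty(v, lam, n), w, j)
    for j in range(len(w))
]

# The formula derived above: (lam / n) * w_j for each j.
analytic = [lam / n * w_j for w_j in w]

# The expression the question expected: (lam / n) * sum of all weights.
summed = lam / n * sum(w)

print("numeric :", numeric)
print("analytic:", analytic)
print("summed  :", summed)
```

Each numerical estimate should agree with $\frac{\lambda}{n} w_j$ up to floating-point error. Note also that `analytic` has one entry per weight while `summed` is a single number; that mismatch in shape is another hint that the partial derivatives with respect to different variables are not meant to be added together.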
