Derivative of summation across all elements of a matrix

derivativesmatricesmatrix-calculusmultivariable-calculusscalar-fields

Maybe a trivial question but my linear algebra / calculus is not very strong at the moment.

How do I take the derivative wrt a matrix of a summation across all indices, i.e., $\sum_i \sum_j A_{i,j}$?

I am trying to find $\nabla_H F$ where

$$F := \|X – WH\|_2^2 + \lambda \sum_i \sum_j H_{i,j}$$

I understand how the first part of the gradient is derived, but I'm not sure about the second part. I was trying to approach it by realizing

$$\sum_i \sum_j H_{i,j} = e^t H e$$

Where e represents the appropriately sized 1s vector. However, I don't really know how to differentiate that wrt H. Intuitively I would think it works out to $\lambda H$ but I'm not really sure how to step through it.

For context, I'm deriving an algorithm for L1-penalized Non-negative matrix factorization. Normally, the L1-norm does not have a defined gradient, but since all elements of H are $\geq 0$ the L1-norm is just a double summation of all indices. At least, that's what I'm thinking.

Best Answer

A convenient product notation for the trace is a colon, i.e. $\,A:B={\rm Tr}(A^TB)$

Write both parts of the objective function in terms of this trace product, then calculate the gradient. $$\eqalign{ F &= \lambda{\tt\large 11^T}:H + (WH-X):(WH-X) \\ dF &= \lambda{\tt\large 11^T}:dH + 2(WH-X):(W\,dH) \\ &= \Big(\lambda{\tt\large 11^T} + 2W^T(WH-X)\Big):dH \\ \frac{\partial F}{\partial H} &= \lambda{\tt\large 11^T} + 2W^T(WH-X) \\ }$$ Your intuition about using the all-ones vectors was indeed correct. The cyclic property of the trace permits the terms in a trace product to be rearranged in lots of ways, e.g. $$\eqalign{&A:BC = B^TA:C \;=\; AC^T:B\\&A:B \;=\; B:A}$$