Summation as a matrix multiplication

linear algebramatricesmatrix-calculus

Let $X$ be $n \times k$ matrix , $D$ is an Euclidean distance matrix $n \times n$ ($d_{ij} = \|x_i-x_j\|$) and $D' $ is just $n\times n$ matrix of realnumbers. Then i want to find a gradient for a function $L(X) = \frac{1}{\sum_{i<j}d'_{ij}}\sum_{i < j}\frac{(d_{ij}-d'_{ij})^2}{d'_{ij}}$ which is $\frac{\partial L}{\partial x_{ik}} = \frac{-2}{\sum_{i<j}d'_{ij}}\sum_{i < j} \frac{d_{ij}-d'_{ij}}{d'_{ij}d_{ij}}(x_{ik}-x_{jk})$. My question: Is there a clever way to write the gradient as a matrix multiplication?

Best Answer

$ \def\L{{L}}\def\o{{\large\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\BR#1{\Bigl(#1\Bigr)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\Lapl#1{\operatorname{Lap}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Use the distance matrix $(D)$, the identity matrix $(I)$, and the all-ones matrix $(J)$ to define $$\eqalign{ F &= J-I \qiq F\odot F = F\\ A &= D' \\ B &= (D-A)\odot(D-A)\oslash A \\ dB &= 2(D-A)\odot dD\oslash A \\ M &= 2(F\oslash D)\odot(D-A)\oslash A \\ S &= M+M^T \\ }$$ where $(\odot,\oslash)$ denote Hadamard multiplication and division, and $dB$ is the differential of $B$.

Let's also define the Frobenius product, which is a very convenient notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$ The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\ }$$ and it also commutes with the Hadamard product $$\eqalign{ A:(B\odot C) = (A\odot B):C \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\\\ }$$


Using the above notation, one can write the distance matrix (and its differential) in terms of the $X$ matrix $$\eqalign{ D\odot D &= \LR{X\odot X}J^T + J\LR{X\odot X}^T - 2XX^T \\ 2D\odot dD &= \LR{2X\odot dX}J^T + J\LR{2X\odot dX}^T - 2\,dX\,X^T - 2X\,dX^T \\ D\odot dD &= \LR{X\odot dX}J^T + J\LR{X\odot dX}^T - \,dX\,X^T - X\,dX^T \\ }$$ One can also write the objective function and calculate its differential and gradient $$\eqalign{ \L &= F:B \\ d\L &= F:dB \\ &= F:\BR{2(D-A)\odot dD\oslash A} \\ &= \BR{2F\odot(D-A)\oslash A}:dD \\ &= \BR{2F\c{\oslash D}\odot(D-A)\oslash A}:(dD\c{\odot D}) \\ &= M:(D\odot dD) \\ &= M:\BR{\LR{X\odot dX}J^T + J\LR{X\odot dX}^T - \,dX\,X^T - X\,dX^T} \\ &= (MJ\odot X):dX + (M^TJ\odot X):dX - MX:dX - M^TX:dX \\ &= (SJ\odot X):dX - SX:dX \\ &= \BR{SJ\odot X - SX}:dX \\ &= \BR{\Diag{S\o}-S}X:dX \\ &= \Lapl{S}\,X:dX \\ \grad{\L}{X} &= \Lapl{S}\,X \\ }$$ where $\o$ is the all-ones vector and $\Lapl{S}$ is the Laplacian of $S$.

NB:$\,$ Since both $F$ and $D$ have zeros along their diagonals, define the quotient $(F\oslash D)$ such that it also has zeros on its diagonal. This means that $M$ and $S$ have zeros on their diagonals as well.

With this modified definition of Hadamard division, one can alternatively define $\,F = D\oslash D$

The zeros on the diagonal of $F$ satisfy the $(j\ne i)$ restriction of the summation within the $(F:B)$ product, since the diagonal terms of $B$ are being summed with a coefficient of zero.

The matrices $F$ and $D$ are symmetric. One would assume that $A=D'$ is also symmetric, but that was not explicitly stated in the problem. If it is indeed symmetric, then that makes $M$ symmetric and one can substitute $S=2M$ to simplify the solution further.

The normalization constant was omitted. It is a simple scalar value $\,\lambda=(F:A)^{-1}$
The above gradient should be multiplied by $\lambda$ if it is important.