Compute derivative of a matrix with Hadamard product

matrix-calculus

I have the following expression involving the matrix $\mathbf{A}$:

$$
(\mathbf{X} \odot \mathbf{A}) \mathbf{1}_p
$$

where $\odot$ is the Hadamard (elementwise) product, $\mathbf{X}\in \mathbb{R}^{n \times p}$, $\mathbf{A}\in \mathbb{R}^{n \times p}$, and $\mathbf{1}_p$ is a $p$-dimensional vector of ones.

I want to take the derivative with respect to the matrix $\mathbf{A}$. I looked into the Matrix Cookbook but couldn't find any useful information. I also checked Derivative of Hadamard product and couldn't really understand the answer.

Your help is appreciated.

Best Answer

$ \def\H{{\cal H}} \def\e{\,e}\def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\D#1{\operatorname{Diag}\LR{#1}} \def\d#1{\operatorname{diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\gradLR#1#2{\LR{\frac{\p #1}{\p #2}}} $Since a vector-by-matrix gradient is a third-order tensor, the result does not fit neatly into standard matrix notation. In this case, a component-wise gradient is the simplest approach.

But first we need to know the gradient of a matrix with respect to its own components, i.e. $$\grad{A}{A_{ij}} = E_{ij} = \e_i\e_j^T$$ where $\{E_{ij}\}$ are single-entry matrices whose elements are all zero except for the $(i,j)$ element, which equals $\o$. These matrices form the standard matrix basis, just as the $\{e_k\}$, whose elements are all zero except for a $\o$ at the $k^{th}$ component, form the standard vector basis.
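For concreteness, here is a tiny numpy sketch of these basis objects (the helper names `basis_vector` and `single_entry` are hypothetical, introduced only for illustration):

```python
import numpy as np

def basis_vector(n, k):
    """Standard basis vector e_k in R^n: all zeros except a 1 at index k."""
    e = np.zeros(n)
    e[k] = 1.0
    return e

def single_entry(n, p, i, j):
    """Single-entry matrix E_ij = e_i e_j^T: all zeros except a 1 at (i, j)."""
    return np.outer(basis_vector(n, i), basis_vector(p, j))

print(single_entry(3, 4, 1, 2))  # only the (1, 2) entry is 1
```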

Write the tensor-valued gradient either as a set of vector-valued gradients (one for each component of $A$)
$$\eqalign{ y &= \LR{X\odot A}\o \\ \grad{y}{A_{ij}} &= \LR{X\odot E_{ij}}\o = x_{ij} {E_{ij}\o} = x_{ij} \e_i\e_j^T\o = x_{ij}\,\e_i \\ }$$
or as a set of matrix-valued gradients (one for each component of $y$)
$$\eqalign{ y &= \LR{X\odot A}\o \\ y_k &= \e_k^T\LR{X\odot A}\o \\ &= \LR{\e_k\o^T}:\LR{X\odot A} \\ &= \LR{X\odot\e_k\o^T}:A \\ &= \LR{E_{kk}X}:A \\ \grad{y_k}{A} &= E_{kk}X \\ }$$
where a colon denotes the Frobenius product, which is a concise notation for the trace
$$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$
and which commutes with the Hadamard product
$$\eqalign{ A:\LR{B\odot C} &= \LR{A\odot B}:C \\ }$$
If you need the full tensor-valued gradient, you can construct it using the dyadic product $(\star)$ and summing over the components
$$\eqalign{ \grad{y}{A} &= \sum_{k}\e_k\star\gradLR{y_k}{A} \\ &= \sum_{i}\sum_{j}\gradLR{y}{A_{ij}}\star E_{ij} \\ }$$
One last way to write this is to define a tensor
$$\H=\sum_k\e_k\star E_{kk}=\sum_k\e_k\star\e_k\star\e_k$$
with components
$$\H_{ijk} = \begin{cases} \o\quad{\rm if}\;\;i=j=k \\ 0\quad{\rm otherwise} \\ \end{cases}$$
which extends the Kronecker delta symbol $(\delta_{ij})$ to a third-order tensor.
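Both component-wise formulas are easy to sanity-check numerically. Below is a minimal numpy sketch (the variable names and the finite-difference check are mine, not part of the answer; since $y$ is linear in $A$, the finite differences agree with the formulas up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, p))

def y(A):
    """y = (X ⊙ A) 1_p, i.e. the row sums of the Hadamard product."""
    return (X * A) @ np.ones(p)

h = 1e-6

# dy/dA_ij should equal x_ij * e_i
i, j = 2, 1
E_ij = np.zeros((n, p)); E_ij[i, j] = 1.0
fd = (y(A + h * E_ij) - y(A)) / h
print(np.allclose(fd, X[i, j] * np.eye(n)[i]))   # True

# dy_k/dA should equal E_kk X (the k-th row of X, zeros elsewhere)
k = 0
grad_yk = np.zeros((n, p)); grad_yk[k] = X[k]
fd_k = np.array([(y(A + h * E)[k] - y(A)[k]) / h
                 for E in np.eye(n * p).reshape(n * p, n, p)]).reshape(n, p)
print(np.allclose(fd_k, grad_yk))                # True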

This gives us several equivalent ways to write the function $$\eqalign{ y \;=\; \LR{X\odot A}\o \;=\; \d{AX^T} \;=\; \H:\LR{AX^T} \;=\; \LR{\H X}:A }$$ Given the last form, the gradient calculation is trivial $$\eqalign{ \grad{y}{A} &= \H X \\ }$$
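If it helps, here is a small numpy sketch verifying the equivalent forms of $y$ and the final gradient (building $\H$ explicitly and using `np.einsum` for the tensor contractions is my choice of implementation, an assumption rather than anything prescribed by the answer):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, p))

# Third-order tensor H with H[i, j, k] = 1 iff i == j == k
H = np.zeros((n, n, n))
idx = np.arange(n)
H[idx, idx, idx] = 1.0

y1 = (X * A) @ np.ones(p)                    # (X ⊙ A) 1
y2 = np.diag(A @ X.T)                        # diag(A X^T)
y3 = np.einsum('ijk,jk->i', H, A @ X.T)      # H : (A X^T)
HX = np.einsum('ijk,kl->ijl', H, X)          # H X, the gradient dy/dA
y4 = np.einsum('ijl,jl->i', HX, A)           # (H X) : A
print(np.allclose(y1, y2), np.allclose(y1, y3), np.allclose(y1, y4))  # True True True

# The slice (H X)[:, i, j] recovers the per-component formula dy/dA_ij = x_ij e_i
print(np.allclose(HX[:, 2, 1], X[2, 1] * np.eye(n)[2]))               # True
```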
