Differentiating a scalar built from matrix multiplications and Hadamard products, using the product and chain rules

chain-rule, hadamard-product, linear-algebra, matrix-calculus

Suppose we need to differentiate a scalar that is built from several matrix multiplications and Hadamard (elementwise) products between matrices:
$$Y = A\big(B(XC\circ D)\circ E\big)F$$

$$\frac{\partial Y}{\partial X}=?$$

Let the dimensions be A: (1×a), B: (a×b), X: (b×1), C: (1×e), D: (b×e), E: (a×e), F: (e×1).

So $Y$ is a scalar and we are differentiating it with respect to the vector $X$. Therefore, we expect the derivative to be a $b\times 1$ vector, like $X$.
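The dimensional bookkeeping above can be checked numerically; here is a minimal NumPy sketch with hypothetical small dimensions (the particular sizes `3, 4, 5` are arbitrary), confirming that $Y$ comes out as a scalar:

```python
import numpy as np

# Hypothetical small dimensions, chosen only to check the shapes.
a_dim, b_dim, e_dim = 3, 4, 5
rng = np.random.default_rng(0)

A = rng.standard_normal((1, a_dim))
B = rng.standard_normal((a_dim, b_dim))
X = rng.standard_normal((b_dim, 1))
C = rng.standard_normal((1, e_dim))
D = rng.standard_normal((b_dim, e_dim))
E = rng.standard_normal((a_dim, e_dim))
F = rng.standard_normal((e_dim, 1))

# Y = A (B (XC ∘ D) ∘ E) F, where '*' is the Hadamard (elementwise) product
Y = A @ (B @ ((X @ C) * D) * E) @ F
print(Y.shape)  # (1, 1), i.e. a scalar
```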

i) First of all, unless we vectorise all the matrices, I don't think we can rearrange the Hadamard product as a matrix multiplication, which is quite inconvenient in this case.

I am trying to apply the product rule and the chain rule in order to figure it out, but I am running into several problems.

ii) I am not sure how the chain rule can work in this case because, when breaking down the function, we run into differentiation of a matrix with respect to a vector (e.g. $B(XC\circ D)$ is an $a\times e$ matrix).

iii) Moreover, I am not sure how the dimensions of the matrices can match after the differentiation (i.e. after $X$ is removed). Some suggest using the Kronecker product, but I do not see how this can result in a $b\times 1$ vector in the end.

So, if someone can calculate the derivative here and show how to arrive at a vector matching the dimensions of $X$, it will be very much appreciated.

Best Answer

First, some notation. Denote the trace/Frobenius product with a colon, i.e. $$A:B = {\rm Tr}(A^TB)$$ and write matrices with uppercase letters, vectors with lowercase letters, and scalars with Greek letters.

For typing convenience, define the column vectors $$\eqalign{ a &= A^T, \quad c = C^T, \quad f = F, \quad x = X \\ }$$ and the matrices $$\eqalign{ H &= B^T\big(E\odot af^T\big), \quad K = H\odot D \\ }$$

Rewrite the function in terms of these new variables:
$$\eqalign{
\gamma &= a^T\big(B(xc^T\odot D)\odot E\big)f \\
&= a:\big(B(xc^T\odot D)\odot E\big)f \\
&= af^T:\big(B(xc^T\odot D)\odot E\big) \\
&= (E\odot af^T):B(xc^T\odot D) \\
&= H:(xc^T\odot D) \\
&= K:xc^T \\
&= Kc:x \\
}$$

Now it is a simple matter to find the differential and the gradient:
$$\eqalign{
d\gamma &= Kc:dx \\
\frac{\partial \gamma}{\partial x} &= Kc \\
}$$

NB: The properties of the trace allow Frobenius products to be rearranged in a variety of ways:
$$\eqalign{
A:B &= A^T:B^T \\
A:BC &= AC^T:B \;=\; B^TA:C \\
}$$
Also, Hadamard and Frobenius products commute with themselves and each other:
$$\eqalign{
A:B &= B:A \\
A\odot B &= B\odot A \\
C:A\odot B &= C\odot A:B \\
}$$

Update

There was a question in the comments about the related vector-valued problem $$y = A\big(B(xc^T\odot D)\odot E\big)f$$ Even for this modified problem, the chain rule remains impractical. The real difficulty with both problems is the presence of the Hadamard products: they make things awkward.

Nonetheless, here is how to calculate the gradient of the modified problem.

First, define some new variables:
$$\eqalign{
C &= {\rm Diag}(c), \quad X = {\rm Diag}(x) \;\implies\; B(xc^T\odot D) = B(XDC) \\
E &= \sum_k \sigma_k u_k v_k^T \quad {\rm \{SVD\}} \\
W_k &= {\rm Diag}(\sigma_k u_k), \quad V_k = {\rm Diag}(v_k) \;\implies\; E\odot Z = \sum_k W_k Z V_k \\
}$$

Then rewrite the function:
$$\eqalign{
y &= A(E\odot BXDC)\,f \\
&= \sum_k A(W_kBXDCV_k)f \\
&= \sum_k {\rm vec}\Big(AW_kB\;{\rm Diag}(x)\;DCV_kf\Big) \\
&= \sum_k {\rm vec}\Big(\alpha_k\,{\rm Diag}(x)\,\beta_k\Big) \\
&= Jx \\
}$$
where this result provides a closed-form expression for the $J$-matrix:
$$J = \sum_k (\beta_k^T\otimes {\tt 1})\odot({\tt 1}\otimes \alpha_k)$$
Having rewritten the problem in this form, the gradient (i.e. the Jacobian) is trivial:
$$\frac{\partial y}{\partial x} = J$$
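The SVD-based construction of $J$ can also be checked numerically. A sketch, assuming $A$ is now an $m\times a$ matrix so that $y$ is a vector (the dimensions and variable names are illustrative); it uses the fact that $\alpha_k\,{\rm Diag}(x)\,\beta_k = (\alpha_k\odot{\tt 1}\beta_k^T)\,x$, which is the elementwise form of the Kronecker expression for $J$:

```python
import numpy as np

rng = np.random.default_rng(2)
m_dim, a_dim, b_dim, e_dim = 2, 3, 4, 5

A = rng.standard_normal((m_dim, a_dim))   # A is now a matrix, so y is a vector
B = rng.standard_normal((a_dim, b_dim))
c = rng.standard_normal((e_dim, 1))
D = rng.standard_normal((b_dim, e_dim))
E = rng.standard_normal((a_dim, e_dim))
f = rng.standard_normal((e_dim, 1))
x = rng.standard_normal((b_dim, 1))

def y_of(x):
    # y = A (B (x c^T ⊙ D) ⊙ E) f
    return A @ (B @ ((x @ c.T) * D) * E) @ f

# SVD of E:  E = sum_k sigma_k u_k v_k^T
U, s, Vt = np.linalg.svd(E)
Cdiag = np.diag(c.ravel())                # C = Diag(c)

J = np.zeros((m_dim, b_dim))
for k in range(len(s)):
    Wk = np.diag(s[k] * U[:, k])          # W_k = Diag(sigma_k u_k)
    Vk = np.diag(Vt[k, :])                # V_k = Diag(v_k)
    alpha = A @ Wk @ B                    # (m, b)
    beta = D @ Cdiag @ Vk @ f             # (b, 1)
    # alpha Diag(x) beta = (alpha ⊙ 1 beta^T) x, summed over k
    J += alpha * beta.T

print(np.allclose(y_of(x), J @ x))  # True
```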
