Differentiating the trace of $X^T A X$

Tags: derivatives, lagrange-multiplier, matrices, optimization, trace

I'm solving an optimization problem whose Lagrangian is:

$$\mathcal L(X) = \operatorname{trace}(X^TAX)-\operatorname{trace}(\Lambda(cX^TX-I))$$

where $\Lambda$ is a diagonal matrix whose diagonal entries are the Lagrange multipliers.

I want to set $\frac{\partial \mathcal L(X)}{\partial X}=0$ and solve, but I don't know how to differentiate the trace. The trace operator used in this way is quite foreign to me; how does one go about (1) finding the derivative and (2) solving for the matrix $X$?

Best Answer

You can use this guide - Practical Guide to Matrix Calculus for Deep Learning - and the fact $$\operatorname{trace}(A^TB)=A\cdot B=\sum_{i,j}A_{i,j}B_{i,j}.$$ Using the rules of matrix differential calculus from the guide [mostly (6), but also (8), (9), (10) and so on], and assuming $c$ is a scalar, the differential is $$d\mathcal L(X)=d\left(AX\cdot X-\Lambda^T\cdot cX^TX+I\cdot\Lambda\right)=A\,dX\cdot X+AX\cdot dX-\Lambda^T\cdot c\,d(X^TX)=\\=A^TX\cdot dX+AX\cdot dX-\Lambda^T\cdot\left(c\,dX^TX+cX^T\,dX\right)=(A+A^T)X\cdot dX-c\,\Lambda^TX^T\cdot dX^T-cX\Lambda^T\cdot dX=\\=(A+A^T)X\cdot dX-cX\Lambda\cdot dX-cX\Lambda^T\cdot dX=\left[(A+A^T)X-cX(\Lambda+\Lambda^T)\right]\cdot dX$$ Using rule (17) from the guide, the gradient is $$\frac{\partial \mathcal L(X)}{\partial X}=(A+A^T)X-cX(\Lambda+\Lambda^T)$$ Equating this to zero gives $$(A+A^T)X-cX(\Lambda+\Lambda^T)=0\quad\Longleftrightarrow\quad(A+A^T)X=cX(\Lambda+\Lambda^T)$$ Since $\Lambda$ is diagonal, $\Lambda+\Lambda^T=2\Lambda$, so the stationarity condition becomes $(A+A^T)X=2cX\Lambda$; for symmetric $A$ it reads $AX=cX\Lambda$. This is an eigenvector-type equation: the columns of $X$ are eigenvectors of $A$, with eigenvalues given by $c$ times the corresponding diagonal entries of $\Lambda$, so $X$ is obtained from an eigendecomposition rather than by inverting $(\Lambda+\Lambda^T)$.
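As a quick numerical sanity check of this gradient (a minimal sketch of my own, not from the guide; the sizes $n,k$, the value of $c$, and the random test matrices are arbitrary assumptions), one can compare the formula against a finite-difference approximation of $\mathcal L$:

```python
import numpy as np

# Finite-difference check of  dL/dX = (A + A^T) X - c X (Lambda + Lambda^T)
# for  L(X) = trace(X^T A X) - trace(Lambda (c X^T X - I)).
rng = np.random.default_rng(0)
n, k, c = 5, 3, 2.0                      # arbitrary sizes and scalar c
A = rng.standard_normal((n, n))          # general (not necessarily symmetric) A
Lam = np.diag(rng.standard_normal(k))    # diagonal matrix of multipliers
X = rng.standard_normal((n, k))

def L(X):
    return np.trace(X.T @ A @ X) - np.trace(Lam @ (c * X.T @ X - np.eye(k)))

analytic = (A + A.T) @ X - c * X @ (Lam + Lam.T)

# Central differences, one entry of X at a time
numeric = np.zeros_like(X)
eps = 1e-6
for i in range(n):
    for j in range(k):
        E = np.zeros_like(X)
        E[i, j] = eps
        numeric[i, j] = (L(X + E) - L(X - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # expect True
```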
