Derivative of a quadratic form with respect to an orthogonal matrix, for optimizing the form

derivatives, matrices, matrix-calculus, quadratic-forms

I have the quadratic form:

$$
f(P;Y,\Lambda) = Y^\top P\Lambda P^\top Y,
$$

where $\Lambda$ is a diagonal $p\times p$ matrix with real, positive entries, $P$ is a real orthogonal $p\times p$ matrix, and $Y$ is a $p$-dimensional real vector. I want to find the derivative of $f(P;Y,\Lambda)$ with respect to the orthogonal matrix $P$. In general, I know how to do this for a symmetric (non-singular) matrix, but how do I do it if $P$ is orthogonal?
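For concreteness, here is a small numeric sketch of this setup (illustrative only, and not essential to the question): it builds a positive diagonal $\Lambda$, a random orthogonal $P$ via a QR factorization, and a vector $Y$, and evaluates $f$. All variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

Lam = np.diag(rng.uniform(0.5, 2.0, size=p))     # diagonal, positive entries
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
P = Q                                            # orthogonal: P.T @ P = I
y = rng.standard_normal(p)                       # the vector Y

f = y @ P @ Lam @ P.T @ y                        # f(P; Y, Lambda) = Y^T P Lambda P^T Y
print("f(P; Y, Lambda) =", f)
print("orthogonality check:", np.allclose(P.T @ P, np.eye(p)))
```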

In the general case, if I am looking for the $P$ that optimizes $f(P;Y,\Lambda)$, then I understand from the solution offered that I can use Lagrange multipliers. However, I am a little confused about how to go about this, since I would then have to evaluate
$$
\frac{\partial}{\partial P} f(P;Y,\Lambda) + \gamma\frac{\partial}{\partial P} (PP^\top-I) +\delta \frac{\partial}{\partial P} (P^\top P-I)
$$

subject to $PP^\top=P^\top P=I$. Is there a more direct way?
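For what it's worth, one common way to make the multiplier term precise is to use a symmetric matrix multiplier $\Gamma$ rather than scalars, since $P^\top P=I$ is a matrix of scalar constraints (and for square $P$ the constraint $PP^\top=I$ is then redundant). A sketch of that stationarity condition, using the fact that $\partial f/\partial P = 2\,YY^\top P\Lambda$ for this $f$ (the same gradient used in the answer below, written there with lowercase $y$), would be:
$$
\mathcal L(P,\Gamma) \;=\; f(P;Y,\Lambda) \;-\; \operatorname{Tr}\!\big(\Gamma\,(P^\top P - I)\big),
\qquad
\frac{\partial \mathcal L}{\partial P} \;=\; 2\,YY^\top P\Lambda \;-\; 2\,P\Gamma \;=\; 0 .
$$
The answer below avoids the multipliers entirely by parametrizing $P$ with an unconstrained variable.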

Best Answer

$ \def\bbR#1{{\mathbb R}^{#1}} \def\l{\Lambda}\def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\bR#1{\big(#1\big)} \def\skew#1{\operatorname{skew}\LR{#1}} \def\cayley#1{\operatorname{cayley}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Use an unconstrained matrix variable $U$ to construct the orthogonal matrix $P$ as follows:
$$\eqalign{
S &= \skew{U} \;\doteq\; \tfrac 12\LR{U-U^T} &\qiq S^T=-S \\
P &= \cayley{S} \;\doteq\; \LR{I+S}^{-1}\LR{I-S} &\qiq P^TP=I \\
dP &= -\LR{I+S}^{-1}dS\LR{I+P} &\qiq dS=\skew{dU} \\
}$$
Then use Golden_Ratio's gradient (with respect to $P$) to write the differential of the function, perform a change of variables from $dP\to dU$, and recover the unconstrained gradient:
$$\eqalign{
df &= 2\LR{yy^TP\l}:dP \\
   &= -2\LR{yy^TP\l}:\LR{\LR{I+S}^{-1}dS\LR{I+P}} \\
   &= -2\LR{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}}:dS \\
   &= -2\LR{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}}:\skew{dU} \\
   &= -2\skew{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}}:dU \\
\grad{f}{U} &= -2\skew{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}} \;\doteq\; G \\
}$$
Now $G$ can be used in your favorite unconstrained gradient-based algorithm to find the optimal $U_*$, after which it is trivial to calculate the corresponding optimal matrices:
$$\eqalign{
S_* &= \skew{U_*} \qiq P_* = \LR{I+S_*}^{-1}\LR{I-S_*} \\
}$$
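To make the recipe concrete, here is a rough NumPy sketch (my own code and naming, not part of the answer) of the parametrization $P=\operatorname{cayley}(\operatorname{skew}(U))$ and the gradient $G$, with a finite-difference check and a plain gradient-ascent loop; I arbitrarily take "optimal" to mean maximizing $f$, and the function names, step size, and iteration count are illustrative choices.

```python
import numpy as np

def skew(A):
    """Skew-symmetric part: (A - A^T) / 2."""
    return 0.5 * (A - A.T)

def cayley(S):
    """Cayley transform (I + S)^{-1}(I - S); orthogonal when S is skew."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + S, I - S)

def f_of_U(U, y, Lam):
    """f(P(U); y, Lambda) with P(U) = cayley(skew(U))."""
    P = cayley(skew(U))
    return y @ P @ Lam @ P.T @ y

def grad_U(U, y, Lam):
    """Unconstrained gradient G = -2 skew((I-S)^{-1} (y y^T P Lam) (I + P^T))."""
    I = np.eye(U.shape[0])
    S = skew(U)
    P = cayley(S)
    M = np.linalg.solve(I - S, np.outer(y, y) @ P @ Lam) @ (I + P.T)
    return -2.0 * skew(M)

rng = np.random.default_rng(1)
p = 5
y = rng.standard_normal(p)
Lam = np.diag(rng.uniform(0.5, 2.0, size=p))   # diagonal, positive
U = rng.standard_normal((p, p))

# Finite-difference check of one entry of the gradient.
G = grad_U(U, y, Lam)
eps = 1e-6
E = np.zeros((p, p)); E[1, 2] = 1.0
fd = (f_of_U(U + eps * E, y, Lam) - f_of_U(U - eps * E, y, Lam)) / (2 * eps)
print("analytic G[1,2]:", G[1, 2], "  finite-diff:", fd)

# Plain gradient ascent in the unconstrained variable U (maximizing f as an
# example); P = cayley(skew(U)) stays orthogonal by construction at every step.
for _ in range(500):
    U = U + 0.05 * grad_U(U, y, Lam)
P_star = cayley(skew(U))
print("P_*^T P_* = I ?", np.allclose(P_star.T @ P_star, np.eye(p)))
print("f(P_*):", f_of_U(U, y, Lam),
      "  analytic upper bound lambda_max * ||y||^2:", np.max(np.diag(Lam)) * (y @ y))
```

If minimization is wanted instead, step along $-G$; in either case orthogonality of $P$ never has to be enforced explicitly, because $P$ is rebuilt from $U$ at every step.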


In the steps above, a colon is used to denote the Frobenius product, which is a concise notation for the trace:
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
The properties of the underlying trace function allow the terms in such a product to be rearranged in many different but equivalent ways, e.g.
$$\eqalign{
A:B &= B:A \\
A:B &= A^T:B^T \\
C:\LR{AB} &= \LR{CB^T}:A = \LR{A^TC}:B \\
A:\skew{B} &= \skew{A}:B \\
}$$
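As a quick sanity check (my own illustration, not part of the answer), these identities can be verified numerically on random matrices:

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = Tr(A^T B) = sum_ij A_ij B_ij."""
    return np.trace(A.T @ B)

def skew(A):
    return 0.5 * (A - A.T)

rng = np.random.default_rng(2)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

print(np.isclose(frob(A, B), np.sum(A * B)))              # A:B = sum_ij A_ij B_ij
print(np.isclose(frob(A, B), frob(B, A)))                  # A:B = B:A
print(np.isclose(frob(C, A @ B), frob(C @ B.T, A)))        # C:(AB) = (C B^T):A
print(np.isclose(frob(C, A @ B), frob(A.T @ C, B)))        # C:(AB) = (A^T C):B
print(np.isclose(frob(A, skew(B)), frob(skew(A), B)))      # A:skew(B) = skew(A):B
```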