[Math] Expressing Quadratic Form for a Matrix Inverse using Summation Symbols

linear algebra

I have an equation and I want to know why it is true. It can be found in Boyd's Convex Optimization book, page 82, line 11-12. It is example 3.9.

\begin{equation}
b^T W A (A^T W A)^{-1} A^T W b = \displaystyle\sum\limits_{i=1}^n w_i^2 b_i^2 a_i^T (\displaystyle\sum\limits_{j=1}^n w_j a_j a_j^T)^{-1} a_i
\end{equation}

W is a diagonal matrix with size nxn. A is any matrix with size nxm, we designate the term $a_i^T$ to be the i-th row of A. b is a vector with size nx1. So the left and right sides of the equation is a number.

I tried simplifying $b^T W A$ and $A^T W b$, so:
\begin{equation}
b^T W A = b_1 w_1 a_1^T + b_2 w_2 a_2^T + \cdots + b_n w_n a_n^T.
\end{equation}

$$A^T w b = (a_1^T)^T w_1 b_1 + (a_2^T)^T w_2 b_2 + \cdots + (a_n^T)^T w_n b_n$$
$$= a_1 w_1 b_1 + a_2 w_2 b_2 + \cdots + a_n w_n b_n$$
Note that a_i means the i-th column of A.

Also, the inverse can be expressed as such:

$$(A^T W A)^{-1} = (w_1 a_1 a_1^T + w_2 a_2 a_2^T + \cdots + w_n a_n a_n^T)^{-1} $$
$$= (\displaystyle\sum\limits_{j=1}^n w_j a_j a_j^T)^{-1}$$

Then, the question is how do I move the coefficients around so that they match the stated expression? That is, why is this true:

$$(b_1 w_1 a_1^T + b_2 w_2 a_2^T + \cdots b_n w_n a_n^T) (\displaystyle\sum\limits_{j=1}^n w_j a_j a_j^T)^{-1} (a_1 w_1 b_1 + a_2 w_2 b_2 + \cdots + a_n w_n b_n) $$
$$= \displaystyle\sum\limits_{i=1}^n w_i^2 b_i^2 a_i^T (\displaystyle\sum\limits_{j=1}^n w_j a_j a_j^T)^{-1} a_i$$

Thanks.

Best Answer

I think this is a mistake in the book. The equation is saying that the symmetric matrix between $b^\mathrm{T}$ and $b$ on the left-hand side is diagonal; else there would have to be cross-terms $b_i b_k$ with $i\neq k$ on the right-hand side. But unless there are further conditions on $A$ and $W$, there's no reason for that matrix to be diagonal; in fact you can easily find examples where it isn't, with $W$ the identity and $A$ some small non-square matrix.

More generally, though, to bring the $b$s and $W$s together in an expression like the one on the left-hand side, you can use the cyclic invariance of the trace:

$$\begin{eqnarray} b^T W A (A^T W A)^{-1} A^T W b&=& \mathrm{Tr}\left[b^T W A (A^T W A)^{-1} A^T W b\right]\\\ &=&\mathrm{Tr}\left[W b b^T W A (A^T W A)^{-1} A^T\right]\;. \end{eqnarray}$$

Related Question