[Math] Inverting the sum of a dense and a diagonal matrix


So, let's assume that we are given two matrices:

$A \in \mathbb{R}^{M \times K}$: just some dense, real, arbitrary-valued matrix.

$L \in \mathbb{R}^{K \times K}$: a positive diagonal matrix.

Now, let's say we have the following expression:

$\left( A^T A + L \right)^{-1}$

Is it possible to rewrite this expression in a way that makes use of knowledge of
$\left( A^T A \right)^{-1}$, so as to avoid the addition inside the inverse?

I'm sorry if this is a little vague 😛 I'm just running into an analytical hiccup and trying to figure out how to break this expression apart.
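(For concreteness, here is a minimal NumPy sketch of the setup I have in mind; the sizes and random entries are purely illustrative.)

```python
import numpy as np

M, K = 50, 10
rng = np.random.default_rng(0)

A = rng.standard_normal((M, K))         # dense, real, arbitrary-valued
L = np.diag(rng.uniform(0.1, 2.0, K))   # positive diagonal

# The quantity in question, computed the direct (naive) way.
# In practice you would solve a linear system rather than form the inverse.
direct = np.linalg.inv(A.T @ A + L)
```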

Edit: Would it be possible to analyze this expression using an eigendecomposition? That is, if we write

$A^T A = Q \Lambda Q^{-1}$

Then, if $L$ is diagonal:

$\left( A^T A + L \right) = Q \left( \Lambda + L \right) Q^{-1}$

Which would make

$\left( A^T A + L \right)^{-1} = Q \left( \Lambda + L \right)^{-1} Q^{-1}$

Which might be usable somehow, because $\left( \Lambda + L \right)$ is a diagonal matrix, allowing you to calculate the inverse directly by taking $1$ over the diagonal entries, right? Or is this completely off?

Edit 2: @Rcollyer So, I changed my formulation entirely by looking only at rank-1 updates of $A^T A$, and was able to use the Sherman–Morrison formula (a special case of the Woodbury identity) to break it apart and get at what I wanted 😛
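For anyone landing here later, here is a sketch of what that rank-1 update looks like numerically; the update vectors `u` and `v` are hypothetical stand-ins, not anything from my actual formulation:

```python
# Sherman-Morrison:
# (B + u v^T)^{-1} = B^{-1} - (B^{-1} u v^T B^{-1}) / (1 + v^T B^{-1} u)
import numpy as np

def sherman_morrison(Binv, u, v):
    """Rank-1 update of a known inverse: returns (B + u v^T)^{-1}."""
    Bu = Binv @ u
    vB = v @ Binv
    return Binv - np.outer(Bu, vB) / (1.0 + v @ Bu)

# Quick sanity check against a direct inverse.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)   # comfortably invertible
u = rng.standard_normal(5)
v = rng.standard_normal(5)
assert np.allclose(sherman_morrison(np.linalg.inv(B), u, v),
                   np.linalg.inv(B + np.outer(u, v)))
```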

Best Answer

The only methods I am aware of are the Sherman–Morrison formula, for performing a rank-1 update of an inverse, and its generalization, the Woodbury matrix identity, for performing rank-$k$ updates. In both cases, the advantage lies in the fact that $L$ does not have full rank. If $L$ has full rank, these methods won't provide enough of a speedup to be worth it over just performing an LU decomposition of $A^{\mathrm{T}}A + L$ directly.
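To make that concrete, here is a sketch of the Woodbury update $(B + UCV)^{-1} = B^{-1} - B^{-1}U\left(C^{-1} + VB^{-1}U\right)^{-1}VB^{-1}$ in NumPy; the shapes and the `woodbury` helper name are illustrative assumptions on my part:

```python
import numpy as np

def woodbury(Binv, U, C, V):
    """Rank-k update of a known inverse: returns (B + U C V)^{-1}.

    Only a k x k system is solved, which is where the savings come from
    when k is much smaller than n.
    """
    inner = np.linalg.inv(C) + V @ Binv @ U          # k x k
    return Binv - Binv @ U @ np.linalg.solve(inner, V @ Binv)

# Quick sanity check against a direct inverse.
rng = np.random.default_rng(2)
n, k = 8, 2
B = rng.standard_normal((n, n)) + n * np.eye(n)
U = rng.standard_normal((n, k))
C = np.diag(rng.uniform(0.5, 1.5, k))
V = rng.standard_normal((k, n))
assert np.allclose(woodbury(np.linalg.inv(B), U, C, V),
                   np.linalg.inv(B + U @ C @ V))
```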

I ran into a similar problem, where I potentially had several hundred to several thousand matrices to invert and the changing portion of the matrix sum had full rank; an LU decomposition was the best I was able to come up with.

Edit: Hunting through my stack of papers, I came across a good review of these techniques.

Edit (2nd question): Your second approach doesn't work, in general. The error is in the second equation; it should be

$$ \begin{align} A^\mathrm{T}A + L &= Q \Lambda Q^{-1} + L \\ &= Q \Lambda Q^{-1} + Q Q^{-1} L Q Q^{-1} \\ &= Q( \Lambda + Q^{-1} L Q )Q^{-1} \end{align} $$

In general, $Q^{-1} L Q$ is not diagonal unless $L$ commutes with $Q$, which usually only occurs when $L$ is proportional to $I$.
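A quick numerical check of this point (the sizes are arbitrary; since $A^\mathrm{T}A$ is symmetric, `eigh` gives an orthogonal $Q$ with $Q^{-1} = Q^\mathrm{T}$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
L = np.diag(rng.uniform(0.1, 2.0, 4))

lam, Q = np.linalg.eigh(A.T @ A)    # symmetric, so Q is orthogonal
conjugated = Q.T @ L @ Q            # Q^{-1} L Q, with Q^{-1} = Q^T

off_diag = conjugated - np.diag(np.diag(conjugated))
print(np.abs(off_diag).max())       # generically far from zero
```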

Edit (idle speculation): Upon reviewing the second question, I thought of a way that may allow you to use that method. As stated previously, the only way for the second method to work is for $L$ to commute with $Q$. Since $L$ is diagonal, we can rewrite it as $\alpha I + N$, where $\alpha$ is the most common diagonal element of $L$. Then, the rank of $N$ will be less than the rank of $A^\mathrm{T}A$. Thus, we can calculate $(A^\mathrm{T}A + L)^{-1}$ in two steps: calculate

$$B^{-1} = Q(\Lambda + \alpha I)^{-1}Q^{-1}$$

and then perform a $\text{rank}(N)$ update of $B^{-1}$ using the Woodbury matrix identity. If all the diagonal elements of $L$ are equally common, $\text{rank}(N) = \text{rank}(A^\mathrm{T}A) - 1$. So, this method is likely only worth it if you have many different $L$ matrices, so that the cost of diagonalizing $A^\mathrm{T}A$ can be amortized. Obviously, the larger $A^\mathrm{T}A$ is, the smaller the difference between $\text{rank}(N)$ and $\text{rank}(A^\mathrm{T}A)$ can be for you to still see a net benefit, but it will require some thought as to whether it is worth it overall.
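Here is a sketch of that two-step procedure, assuming $A^\mathrm{T}A$ is symmetric so `eigh` applies; the variable names and the small example $L$ are my own illustration, and `woodbury` is the helper sketched above:

```python
import numpy as np
from collections import Counter

def woodbury(Binv, U, C, V):
    """(B + U C V)^{-1} from Binv = B^{-1} (as sketched above)."""
    inner = np.linalg.inv(C) + V @ Binv @ U
    return Binv - Binv @ U @ np.linalg.solve(inner, V @ Binv)

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 6))
L_diag = np.array([1.0, 1.0, 1.0, 2.0, 0.5, 1.0])   # alpha = 1.0 is most common

# Pay the diagonalization cost once (amortized over many L matrices).
lam, Q = np.linalg.eigh(A.T @ A)

# Split L = alpha*I + N, with alpha the most common diagonal element.
alpha = Counter(L_diag.tolist()).most_common(1)[0][0]
n_diag = L_diag - alpha                              # diagonal of N

# Step 1: B^{-1} = Q (Lambda + alpha I)^{-1} Q^T -- a diagonal inversion.
Binv = Q @ np.diag(1.0 / (lam + alpha)) @ Q.T

# Step 2: rank(N) Woodbury update, using only the nonzero entries of N.
idx = np.flatnonzero(n_diag)
U = np.eye(len(L_diag))[:, idx]                      # selects rank(N) columns
C = np.diag(n_diag[idx])
result = woodbury(Binv, U, C, U.T)

assert np.allclose(result, np.linalg.inv(A.T @ A + np.diag(L_diag)))
```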