[Math] “Rank-K Correction” of a matrix and significance

linear-algebra, matrix-rank, optimization

Today my studies led me to read about the matrix inversion lemma, which Wikipedia introduces as follows:

In mathematics (specifically linear algebra), the Woodbury matrix identity, named after Max A. Woodbury,[1][2] says that the inverse of a rank-$k$ correction of some matrix can be computed by doing a rank-$k$ correction to the inverse of the original matrix.

After some searching I couldn't find any explanation of the terminology rank-$k$ correction, but I did find a Wikipedia article about the BFGS algorithm that uses the same term:

Instead, the Hessian matrix is approximated using rank-one updates specified by gradient evaluations… the approximate Hessian at stage $k$ is updated by the addition of two matrices $$B_{k+1} = B_k + U_k + V_k$$ Both $U_k$ and $V_k$ are symmetric rank-one matrices… [and together] construct a rank-two update matrix…
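For reference, in the standard BFGS update (the usual textbook form, not spelled out in the quoted passage) these two terms are scaled outer products. Writing $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$,

$$U_k = \frac{y_k y_k^{T}}{y_k^{T} s_k}, \qquad V_k = -\frac{B_k s_k s_k^{T} B_k}{s_k^{T} B_k s_k}.$$

Each is a multiple of an outer product $v v^{T}$ and hence has rank one, so $B_{k+1} = B_k + U_k + V_k$ differs from $B_k$ by a matrix of rank at most two.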

As far as I can tell from these sources, a rank-$k$ correction of matrix $A$ consists of the addition of a matrix $B$ of rank $k$ to $A$.
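To make this concrete, here is a small NumPy sketch (with made-up random matrices): any rank-$k$ matrix $B$ can be written as $UV^{T}$ with $U, V \in \mathbb{R}^{n \times k}$, so a rank-$k$ correction of $A$ is just $A + UV^{T}$.

```python
import numpy as np

# A rank-k correction of A is A + B where rank(B) = k.
# Any rank-k matrix factors as B = U @ V.T with U, V of shape (n, k).
rng = np.random.default_rng(0)
n, k = 5, 2
A = rng.standard_normal((n, n))
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

B = U @ V.T        # a rank-k matrix (rank exactly k for generic U, V)
corrected = A + B  # the "rank-k correction" of A

print(np.linalg.matrix_rank(B))  # 2
```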

However, it is unclear to me why matrix addition is described in this way. Can someone help me understand the implications and applications of this terminology at a basic level, as well as its importance in optimization problems?

Best Answer

Rank updates to a matrix are often performed when new data is acquired and needs to be incorporated into a given model.

For example, in the case of regression, the normal equations say that we can compute the coefficient estimates $\hat{\beta}$ of a linear model as $\hat{\beta} = (X^{T}X)^{g}X^{T}y$, where $(\cdot)^{g}$ denotes a generalized inverse. If we assume that $X \in \mathbb{R}^{n \times p}$ has full column rank, then $(X^{T}X)^{g} = (X^{T}X)^{-1}$.

Now suppose we observe some new samples $x_{n+1}, x_{n+2}, x_{n+3}, \ldots \in \mathbb{R}^{p}$ and define $X^{*} = \begin{bmatrix} X \\ x_{n+1}^{T} \\ x_{n+2}^{T} \\ \vdots \end{bmatrix}$. Rather than explicitly recomputing $(X^{*T}X^{*})^{-1}$, which can be costly if $p$ is large, we can use the matrix inversion lemma to update $(X^{T}X)^{-1}$ (which we already have stored from fitting the original model) with the new samples.
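The update described above can be sketched in NumPy (with hypothetical random data). Since $X^{*T}X^{*} = X^{T}X + X_{\text{new}}^{T}X_{\text{new}}$ is a rank-$m$ correction of $X^{T}X$ when $m$ new rows arrive, the Woodbury identity $(A + UV)^{-1} = A^{-1} - A^{-1}U(I + VA^{-1}U)^{-1}VA^{-1}$ with $U = X_{\text{new}}^{T}$, $V = X_{\text{new}}$ lets us update the stored inverse by solving only a small $m \times m$ system:

```python
import numpy as np

# Sketch: updating (X^T X)^{-1} with m new sample rows via Woodbury,
# using made-up random data.
rng = np.random.default_rng(1)
n, p, m = 50, 4, 3
X = rng.standard_normal((n, p))
X_new = rng.standard_normal((m, p))   # new samples stacked as rows

A_inv = np.linalg.inv(X.T @ X)        # stored from the original fit

# Woodbury with U = X_new^T, V = X_new: only an m x m inverse is needed.
K = np.linalg.inv(np.eye(m) + X_new @ A_inv @ X_new.T)
A_inv_updated = A_inv - A_inv @ X_new.T @ K @ X_new @ A_inv

# Check against the direct (costly) recomputation.
X_star = np.vstack([X, X_new])
direct = np.linalg.inv(X_star.T @ X_star)
print(np.allclose(A_inv_updated, direct))  # True
```

When $m \ll p$, the update costs $O(mp^2 + m^3)$ instead of the $O(p^3)$ of refactoring $X^{*T}X^{*}$ from scratch.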
