Algorithms – Updating SVD Decomposition After Adding One New Row

Tags: algorithms, linear algebra, matrix decomposition, numerics, svd

Suppose that I have a dense $m \times n$ matrix $\mathbf{A}$, with SVD decomposition $$\mathbf{A}=\mathbf{USV}^\top.$$ In R I can calculate the SVD as follows: svd(A).

If a new $(m+1)$-th row is added to $\mathbf A$, can one compute the new SVD decomposition based on the old one (i.e. by using $\mathbf U$, $\mathbf S$, and $\mathbf V$), without recalculating SVD from scratch?
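For concreteness, the baseline computation and the naive from-scratch recomputation after a row is appended can be sketched in Python/NumPy (the matrix sizes here are arbitrary illustrative choices; np.linalg.svd plays the role of R's svd):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))

# Baseline: thin SVD of A (analogous to svd(A) in R).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Naive approach after appending a new (m+1)-th row:
# recompute the whole SVD from scratch -- this is the
# cost an updating algorithm tries to avoid.
a = rng.standard_normal(n)
A_new = np.vstack([A, a])
U2, s2, Vt2 = np.linalg.svd(A_new, full_matrices=False)
assert np.allclose(A_new, U2 @ np.diag(s2) @ Vt2)
```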

Best Answer

Yes, one can update an SVD decomposition after adding one new row to the existing matrix.

In general this "add one row" problem formulation is known as a rank-one update. The MathOverflow link provided by @amoeba on "efficient rank-two updates of an eigenvalue decomposition" is a great first step if you want to look deeper into the matter; the first paper there provides an explicit solution to your specific question. Just to clarify what rank-one and rank-two mean so you do not get confused: if your new $A^*$ is such that

\begin{align} A^* = A - uv^T \end{align}

where $u$ and $v$ are vectors, then this is referred to as a rank-one update (or perturbation). The basics of this update are dictated by the Sherman-Morrison formula. If the perturbation has rank greater than one, i.e. \begin{align} A^* = A - UV^T, \end{align}

the Woodbury formula comes into play. If you look at these formulas you will notice that a lot of inverses are involved. You do not solve these directly: since you have already solved a great deal of their subsystems (i.e. you already have some decomposition computed), you utilize those to get faster and/or more stable estimates. (That is why people still research this field.) I have used the book "Computational Statistics" by J.E. Gentle a lot as a reference; I think Chapter 5, Numerical Linear Algebra, will set you up properly. (The uber-classic "Matrix Algebra From a Statistician's Perspective" by Harville unfortunately does not touch on rank updates at all.)
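The Sherman-Morrison identity for the rank-one perturbation above can be checked numerically; in this sketch the matrix and vectors are arbitrary illustrative data, and the diagonal shift just keeps $A$ well conditioned:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep A well conditioned
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)

# Sherman-Morrison for A* = A - u v^T:
# (A - u v^T)^{-1} = A^{-1} + (A^{-1} u v^T A^{-1}) / (1 - v^T A^{-1} u)
denom = 1.0 - v @ A_inv @ u
A_star_inv = A_inv + np.outer(A_inv @ u, v @ A_inv) / denom

# Agrees with inverting the perturbed matrix directly.
assert np.allclose(A_star_inv, np.linalg.inv(A - np.outer(u, v)))
```

The point of the formula is that once $A^{-1}$ (or a decomposition of $A$) is available, the update costs only matrix-vector products instead of a fresh $O(n^3)$ inversion.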

Looking at the statistics/application side of things, rank-one updates are common in recommender systems, because one may have thousands of customer entries, and recomputing the SVD (or any given decomposition, for that matter) each time a new user registers or a product is added or removed is quite wasteful (if not unattainable). Usually recommender-system matrices are sparse, which makes the algorithms even more efficient. An accessible first paper is the "Fast online SVD revisions for lightweight recommender systems" manuscript by M. Brand. For dense matrices, I think that looking at papers from Pattern Recognition and Image Processing can get you quite far towards an actual algorithm to use. For example, the papers:

  1. Incremental learning of bidirectional principal components for face recognition (2009) by Ren and Dai,
  2. On incremental and robust subspace learning (2003) by Li et al.,
  3. Sequential Karhunen-Loeve basis extraction and its application to images (2000) by Levey and Lindenbaum,
  4. Incremental Learning for Robust Visual Tracking (2007) by Ross et al.

all seem to tackle the same problem at their core: new data comes in and we need to update our representation accordingly, and fast. Notice that these matrices are not symmetric or even square. Another work by M. Brand also addresses this problem (see the paper "Fast low-rank modifications of the thin singular value decomposition" (2006); this is also mentioned in the MO link given at the beginning of the post). There are a lot of great papers on the subject, but most tend to be heavily mathematical (e.g. the Benaych-Georges and Nadakuditi paper "The singular values and vectors of low rank perturbations of large rectangular random matrices" (2012)), and I do not think they will help you get to a solution soon. I would suggest you keep your focus on the Image Processing literature.
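To make the row-append case concrete, here is a sketch in Python/NumPy of the single-row special case of Brand's thin-SVD modification; the function name and the small-core construction are this sketch's own. The idea is to express the enlarged matrix through a small $(k+1)\times(k+1)$ core matrix and take the SVD of that core, rather than of the full matrix:

```python
import numpy as np

def svd_append_row(U, s, Vt, a):
    """Update a thin SVD A = U @ diag(s) @ Vt after appending row a to A.

    Sketch of the row-append special case of Brand's (2006) low-rank
    SVD modification; the helper name is ours, not the paper's.
    """
    V = Vt.T
    k = s.size
    w = V.T @ a                  # coordinates of a in the current row space
    r = a - V @ w                # component of a outside the row space
    rho = np.linalg.norm(r)
    j = r / rho if rho > 1e-12 else np.zeros_like(r)

    # Small (k+1) x (k+1) core matrix whose SVD carries the update:
    # [A; a^T] = [U 0; 0 1] K [V j]^T  with  K = [[diag(s), 0], [w^T, rho]].
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = np.diag(s)
    K[k, :k] = w
    K[k, k] = rho

    Uk, sk, Vkt = np.linalg.svd(K)

    # Rotate the enlarged left/right bases by the small SVD factors.
    U_aug = np.block([[U, np.zeros((U.shape[0], 1))],
                      [np.zeros((1, k)), np.ones((1, 1))]])
    V_aug = np.column_stack([V, j])
    return U_aug @ Uk, sk, (V_aug @ Vkt.T).T

rng = np.random.default_rng(2)
m, n = 4, 6
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

a = rng.standard_normal(n)
U2, s2, V2t = svd_append_row(U, s, Vt, a)
A_new = np.vstack([A, a])
assert np.allclose(A_new, U2 @ np.diag(s2) @ V2t)
```

Only the small core is decomposed from scratch, so when $k \ll \min(m, n)$ (e.g. a truncated SVD is kept) the update is far cheaper than a full recomputation.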

Unfortunately I have not come across any R implementations for rank-one updates routines. The answer on "Updatable SVD implementation in Python, C, or Fortran?" from the Computational Science SE gives a number of MATLAB and C++ implementations that you may want to consider. Usually R, Python, etc. implementation are wrappers around C, C++ or FORTRAN implementations.