[Math] Listing applications of the SVD

applications, applied-mathematics, big-list, linear-algebra, singular-values

The SVD (singular value decomposition) is taught in many linear algebra courses, where its importance tends to be taken for granted. Having helped teach such a course, I feel I should be able to give students better motivation for the SVD, which has led me to ask what its applications actually are.

I have considered the following applications, but I am sceptical of some of the ones I list:

  • Low-rank approximation in image compression (sketched after this list). Aren't images usually compressed using an algorithm like JPEG, which is based on the discrete cosine transform (DCT) and not on low-rank approximation? Basically, is this something that's used in competitive image compression algorithms?

  • Low-rank approximation in recommender systems (sketched after this list). The problem to solve is that you have a matrix $M$ whose rows are products and whose columns are users, and whose entries tell you what rating each user gave to each product. Notice that for many use-cases, some of the entries of the matrix will be missing. Picture those missing entries as taking on the special value "?" (that is, a question mark); I am almost certain that you can't just use a default value like $0$. Supposedly, given a low-rank approximation $M \approx U\Sigma V^T$, the rows of $U$ represent products, and rows with a high cosine-similarity represent similar products. ("Cosine-similarity" is simply the cosine of the angle between two vectors.) The problem is that you can't take the SVD of a matrix with missing values. This is well known among practitioners, who have developed approaches to low-rank approximation of matrices with missing entries that don't directly involve the SVD (and I'm not sure how they even could use it).

  • Finding the nearest orthogonal matrix to a given matrix in machine vision (sketched after this list). The objective is, given a matrix $M$, to find the orthogonal matrix $\Omega$ for which $\|M - \Omega\|$ is minimised. For some matrix norms, this problem can be solved optimally (in the sense of finding the optimal $\Omega$) using the SVD. This is potentially useful in machine vision for turning a matrix which ought to be orthogonal, but which isn't because of various sources of error, into one that is orthogonal. That in turn helps with reconstructing the position of a camera in space from photographs taken with that camera.

  • Principal component analysis (PCA) (sketched after this list). The objective is to reduce the dimensionality of a dataset in order to plot the data or use it in machine learning algorithms that might not work well with high-dimensional data. The idea is that you imagine your data is drawn from a multivariate normal distribution; what PCA does is infer that underlying distribution and approximate it with a lower-dimensional multivariate normal. Note that PCA is not scale-invariant: if one of the variables in the sample is rescaled, the output of PCA can change. In many cases in which PCA is used (most cases at the moment?), the dataset is not actually drawn from a normal distribution, and I don't know how these deviations from normality affect the suitability of PCA.

  • Linear regression (sketched after this list). My understanding is that the SVD can be used to find Moore-Penrose pseudoinverses, and Moore-Penrose pseudoinverses can in turn be used to fit linear regression models. It's worth pointing out that linear regression is usually done alongside regularisation in order to prevent overfitting. One form of regularised linear regression is called ridge regression, and there is indeed a way to fit a ridge regression model using the SVD; some properties of ridge regression can be studied this way.
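
To make some of the items above concrete, here are a few minimal numpy sketches; all matrices, numbers and parameter choices in them are invented purely for illustration. First, rank-$k$ approximation of an image. I'm not claiming this is competitive with DCT-based codecs, only showing what the low-rank idea amounts to:

```python
import numpy as np

def rank_k_approximation(image, k):
    """Best rank-k approximation of a grey-scale image (in the Frobenius norm)."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A random 256x256 array standing in for a real image; a rank-20 approximation
# stores 20*(256 + 256 + 1) numbers instead of 256*256.
image = np.random.rand(256, 256)
approx = rank_k_approximation(image, k=20)
print(np.linalg.norm(image - approx) / np.linalg.norm(image))  # relative error
```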
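
Next, the cosine-similarity idea for products. To sidestep the missing-entry problem described above, this toy ratings matrix is fully observed, which is exactly the unrealistic assumption I'm worried about:

```python
import numpy as np

# Toy ratings matrix: rows are products, columns are users.
M = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
product_vectors = U[:, :k] * s[:k]   # one row per product in the latent space

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Products 0 and 1 should come out as similar, products 0 and 2 as dissimilar.
print(cosine_similarity(product_vectors[0], product_vectors[1]))
print(cosine_similarity(product_vectors[0], product_vectors[2]))
```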
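
Next, the nearest orthogonal matrix. For the Frobenius norm the solution is classical: if $M = U\Sigma V^T$, then $\Omega = UV^T$:

```python
import numpy as np

def nearest_orthogonal(M):
    """Orthogonal matrix minimising ||M - Omega|| in the Frobenius norm."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

# A rotation perturbed by noise, standing in for an estimated camera matrix.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
noisy = R + 0.05 * np.random.randn(2, 2)

Omega = nearest_orthogonal(noisy)
print(Omega @ Omega.T)            # approximately the identity
print(np.linalg.norm(Omega - R))  # close to the original rotation
```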
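
Next, PCA via the SVD of the centred data matrix; the principal directions are the right singular vectors:

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components via the SVD."""
    Xc = X - X.mean(axis=0)                       # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # principal directions
    scores = Xc @ components.T                    # low-dimensional coordinates
    explained_variance = s[:n_components] ** 2 / (len(X) - 1)
    return scores, components, explained_variance

# Synthetic data: 200 points in 5 dimensions, most of the variance in 2 directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
scores, components, variance = pca(X, n_components=2)
print(variance)
```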
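
Finally, ridge regression. Writing the design matrix as $X = U\Sigma V^T$ (thin SVD, with more rows than columns) gives the closed form $\hat\beta_\lambda = V\,\operatorname{diag}\!\left(\tfrac{\sigma_i}{\sigma_i^2+\lambda}\right) U^T y$, which makes the shrinkage of directions with small singular values explicit:

```python
import numpy as np

def ridge_via_svd(X, y, lam):
    """Ridge solution beta = V diag(s/(s^2 + lam)) U^T y via the thin SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = s / (s ** 2 + lam)       # shrinks directions with small singular values
    return Vt.T @ (d * (U.T @ y))

# Synthetic regression problem with more rows than columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

print(ridge_via_svd(X, y, lam=0.0))   # approximately the least-squares solution
print(ridge_via_svd(X, y, lam=10.0))  # shrunk towards zero
```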

I'd love to see applications in pure mathematics or physics as well. The above summarises my investigations into SVD thus far.

Best Answer

I teach a course on Applied Linear Algebra, intended for engineers, where the final project is always to give a presentation on an application of linear algebra in the student's field of study. Since I am not an engineer myself, I often learn new things here, but there is the risk that my level of understanding is no better than that of an undergraduate engineer.

With this caveat, one of the common topics my students talk about is image identification. You have, say, $M$ reference photos of people's faces, each of them $K \times K$ grey-scale pixels and positioned similarly in the camera frame. You get a new photo and you want to identify which face it is most like.

The naive thing to do is to treat each of the $M$ images as a vector with $K^2$ entries, and find the reference photo in $K^2$-dimensional space which is closest to your photo. The trouble is that you will be thrown off by random noise introduced by the photography process: you will get the face which is closest to your photo in the directions of random variation, and not in the directions in which faces naturally vary.

A better idea is to use the SVD of the $K^2 \times M$ matrix whose columns are your reference photos to find a lower-dimensional subspace $V$ of $K^2$-dimensional space such that all of your reference photos are close to $V$. Then project all the reference photos, and the input photo, onto $V$ before finding the closest one. This way, the reference photo you find will be close to the input photo in the ways in which faces vary, while the random noise is projected away.

This is called the eigenfaces method. It can of course be applied to images other than faces, for example in optical character recognition.
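
Here is a rough sketch of the projection step, with randomly generated "photos" standing in for real reference images; the numbers ($M = 50$ photos, $32 \times 32$ pixels, $10$ eigenfaces) are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, r = 50, 32, 10   # 50 reference photos, 32x32 pixels, keep 10 eigenfaces

# Columns are the flattened, mean-subtracted reference photos.
photos = rng.normal(size=(K * K, M))
mean_face = photos.mean(axis=1, keepdims=True)
A = photos - mean_face

U, s, Vt = np.linalg.svd(A, full_matrices=False)
eigenfaces = U[:, :r]              # orthonormal basis for the subspace V

def project(x):
    """Coordinates of a flattened photo in the eigenface subspace."""
    return eigenfaces.T @ (x - mean_face.ravel())

new_photo = rng.normal(size=K * K)
coords = project(new_photo)
reference_coords = eigenfaces.T @ A   # coordinates of every reference photo
closest = np.argmin(np.linalg.norm(reference_coords - coords[:, None], axis=0))
print("closest reference photo:", closest)
```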
