[Math] Change of basis for image compression (toy example)

change-of-basiscompressionlinear algebralinear-transformations

In linear algebra course, prof. Strang uses JPEG image compression as an example of basis change (lecture 31). I would like to make sure that I understood the concept.

For simplicity, let's assume that we deal with 1-dimensional 8-pixels image vector (greyscale). In the original (uncompressed) form we represent the image using standard basis (see below). So, in the original form, for the image of 8 pixels we will save 8 coefficients that are just greyscale values. In the compressed form, instead of standard basis we represent our image using some other basis (Fourier, DCT etc). Now, if we use all 8 coefficients in this new basis we can lossless reconstruct the original image. Critically, some of these coefficients have small values so we can drop them without compromising too much image quality in reconstruction. So, we can save only let's say 4 coefficients, achieving 50% compression rate.

Does this makes sense?

If someone can also explain how this logic is generalized to 2D I will appreciate. At the end, we probably do not want to make 1D column vector from 2D because we loose information about proximal pixels in X direction. For example, suppose I have 100×100 matrix. If I concatenate by 2D data into 1D, the values at position 1 and 2 as well as 1 and 101 will likely have similar values. Is the basis constructed in such way that it somehow takes this into account?

Thanks!

enter image description here

Best Answer

The idea is to think of a $(m \times n)$-grayscale image as an $(m\times n)$-matrix. In the standard basis for the vector space of $(m\times n)$-matrices (matrices with a $1$ in the $(i,j)$-position and $0$ everywhere else), the coefficient of each basis element contains the same amount of information, namely it gives us the value of a single pixel. The idea of image-compression algorithms, which is mind-boggling really, is to choose a basis with respect to which the coefficients for the first few basis vectors are much more important than the coefficients of subsequent basis elements. So the first few coefficients might give you 99% of the information you want about the image, and the remaining coefficients just 1% of what you care about.

A Fourier transform is an example- the eye isn't sensitive to pixel changes beyond a certain frequency, and keeping just the first few Fourier coefficients (what counts as "first few" depends on where you set your thresholds) is an essentially lossless compression. JPEG is much the same. Note that real-life JPEG isn't truly lossless- it leaves awful artifacts that are a pain in the neck for image processing, because the higher Fourier coefficients, while their effect is barely visible to the human eye, can throw-off image processing algorithms. See this link for an example of some such artifacts.

Rather than JPEG, an easier image-compression example to think about (for me at least, because I hate JPEG artifacts) is based on the Singular Value Decomposition (SVD) of a matrix. By considering pixel values as real numbers, you take the SVD of the image, and throw away coefficients of all eigenvectors with corresponding eigenvalues under some threshold. This is called Principal Component Analysis (PCA). You can thus encode your image using only a small collection of vectors (eigenvectors) and their corresponding eigenvalues, leading to substantial compression. And the compressed image will look much the same as the image that you started with.

For a nice explanation with examples, see these slides.

Related Question