[Math] SVD [Singular Value Decomposition] on Transformation Matrix


$\operatorname{svd}(T) = U \Sigma V^T$

I understand the meaning of each and every term here, and why the SVD is important.

But I am failing to interpret this equation through the lens of linear algebra.

When I learnt linear algebra, one thing was common to all sources: view a matrix as basis vectors (or as a transformation matrix):

$T v = \lambda u$

$T$ = transformation matrix
$v$ = some vector to be transformed by applying $T$

$u$ = transformed unit vector
$\lambda$ = scale of the transformation

But I am failing to relate this when it comes to SVD.

$T = U \Sigma V^T$

(Note: I am not looking for an answer about what the SVD means or what each term means.
I am looking for an answer to precisely the following confusion.)

To be more precise:

Represent our data as a transformation matrix:

our data = $m \times n$ matrix = $T$

Apply the transformation matrix $T$ (that is, our data) to some vector:

$T \cdot (\text{some vector}) = (\text{new rotated unit vector}) \cdot (\text{scaling factor})$

We get the same effect as above by applying 3 different transformations (rotation, scaling, rotation), denoted by:

$T = U \Sigma V^T \qquad \text{(the SVD)}$

So it means that on any vector $v$ we can apply either $T$ (our data matrix) or the 3 transformations ($U \Sigma V^T$), and we get the same effect.
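This equivalence is easy to check numerically. A minimal NumPy sketch (the matrix $T$ and vector $v$ here are made up for illustration):

```python
import numpy as np

# A made-up 3x2 matrix T and a test vector v.
T = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0]])
v = np.array([1.0, -2.0])

# Thin SVD: T = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(T, full_matrices=False)

# Apply T directly, or the three factors in sequence:
# rotate with V^T, scale with Sigma, rotate with U.
direct = T @ v
staged = U @ (np.diag(S) @ (Vt @ v))

assert np.allclose(direct, staged)
```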

So far so good, as long as we see the above operations only from the transformation perspective.

Now suddenly, we change the whole perspective. It is no longer a transformation perspective.

As per the new perspective, the $\Sigma$ matrix also has another meaning, apart from being a scaling matrix: its singular values suggest which axis has the highest variation, onto which to project our data.

This is so confusing. We used our data as a transformation matrix $T$ and decomposed it into 3 different matrices.
This is okay, no issues.

Now we are saying that we can project our data (the same data we used as the transformation matrix) onto one of these decomposed matrices.

I am unable to match these 2 perspectives, and that is my problem.

Best Answer

The essence of your question, as far as I understand it, is this.

Suppose that $T = U\Sigma V^T$ is an SVD of an $m \times n$ data matrix. How does the geometric interpretation of the SVD connect with the statistical interpretation of $\Sigma$, wherein the singular values suggest which axis has the highest variation to project our data?

I will assume that each column of $T$ represents a single data point. To use the rows of $T$ in this way instead, simply transpose $T$ and apply the same analysis.

First of all, we should establish how it is that the transformation $T$ "encodes" the data in question. This relationship is simple: if $e_i$ denotes the $i$th standard basis vector of $\Bbb R^n$ ($i$th column of the size $n$ identity matrix), then the $i$th data-point is $Te_i$. So, $T$ is a transformation that takes the points $e_1,e_2,\dots,e_n$, each of which lies on the unit sphere in $\Bbb R^n$, and maps them to the data points $Te_1,Te_2,\dots,Te_n$.
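As a quick sanity check of this "encoding", here is a sketch in NumPy (the data matrix is made up for illustration):

```python
import numpy as np

# Made-up 2x4 data matrix: each column is one data point in R^2.
T = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.0, 1.0,  3.0, 2.0]])

n = T.shape[1]
for i in range(n):
    e_i = np.eye(n)[:, i]        # i-th standard basis vector of R^n
    # T e_i recovers the i-th data point, i.e. the i-th column of T.
    assert np.allclose(T @ e_i, T[:, i])
```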

The SVD shows what $T$ does to the unit sphere $S^{n-1}$: the column $v_1$ of $V$ corresponding to the largest singular value $\sigma_1$ is the direction of the sphere that is "stretched" to the greatest extent, and its image $Tv_1 = \sigma_1 u_1$ is the longest semi-axis of the ellipsoid $T(S^{n-1})$. Because the points $T e_1,\dots, T e_n$ are the outputs corresponding to points on the sphere, they "move along" with the sphere, and so the "spherical" cloud of points is stretched to produce the "ellipsoidal" cloud of points corresponding to our data. The fact that $v_1$ is the direction in which the sphere is stretched the most corresponds to the fact that $u_1$, the direction onto which $v_1$ is mapped, is the direction along which we find the most variation in the data cloud.
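To see the statistical side of this numerically, here is a sketch under one assumption not spelled out above: the made-up point cloud (stored as columns of $T$) is mean-centered, since "variation" is usually measured about the mean. The top left singular vector $u_1$ is then compared with the top eigenvector of $TT^T$, which is proportional to the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 50 points in R^2 as columns, stretched along e_1,
# then mean-centered so "most variation" has its usual statistical sense.
X = np.array([[3.0, 0.0],
              [0.0, 0.5]]) @ rng.normal(size=(2, 50))
T = X - X.mean(axis=1, keepdims=True)

U, S, Vt = np.linalg.svd(T, full_matrices=False)
u1 = U[:, 0]                      # image direction of v_1 under T

# u_1 is the top eigenvector of T T^T (proportional to the covariance),
# i.e. the direction of largest variation of the column cloud.
w, vecs = np.linalg.eigh(T @ T.T)
top = vecs[:, np.argmax(w)]

# Same direction, up to sign.
assert np.isclose(abs(top @ u1), 1.0)
```

This is exactly the link between the SVD and principal component analysis: the left singular vectors of a centered data matrix are its principal directions.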