A matrix represents a linear transformation that rotates, scales, and shears whatever you feed into it; for example, the coordinates of a square might come out as a parallelogram. An important fact is that (once bases are fixed) there is a one-to-one correspondence between real matrices and linear transformations: if you can think of a linear transformation, there is a way to write it as a matrix, and vice versa.
SVD is based on a theorem that says any matrix $\mathbf A$ can be written in the form $\mathbf{U\Sigma V}^T$, where $\mathbf U$ and $\mathbf V$ are orthogonal (rotations, possibly combined with a reflection) and $\mathbf \Sigma$ is a diagonal matrix that only scales. So any linear transformation can be broken down into three steps: rotate first, then stretch/scale (not necessarily by the same amount in all directions; you could stretch the x-axis twice as much as the y-axis), then rotate again.
For instance, to transform a square into a parallelogram, you could rotate clockwise by some angle $\theta$, scale the axes by different factors, then rotate counter-clockwise by another angle $\varphi$ (in general the two angles differ; the SVD finds both rotations and the scale factors for you).
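To see this numerically, here is a minimal numpy sketch (the shear matrix $\mathbf A$ below is my own illustrative example): it decomposes a shear, which maps the unit square to a parallelogram, into rotate, scale, rotate.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # a shear: maps the unit square to a parallelogram

U, s, Vt = np.linalg.svd(A)   # A = U @ diag(s) @ Vt

print(s)                      # two different scale factors: the stretch is anisotropic
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: rotate (Vt), scale (s), rotate (U)
```

Note that `U` and `Vt` here are different rotations, matching the remark above that the two angles generally differ.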
Points 1 and 2 are related in the following way: a projection (point 2) is a 'simplified' transformation. Suppose you had a transformation that changes a 1x1 square into a 10x0.1 rectangle. A projection would simply say that this transformation changes the square into a 10x0 'rectangle' (which is a line). This is dimensionality reduction: your 2-dimensional square is projected onto a 1-dimensional line. If you took the SVD of this transformation, $\mathbf U$ and $\mathbf V$ would be identity matrices, and $\mathbf \Sigma$ would be a diagonal matrix (as it always is) with entries 10 and 0.1.
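A hedged numpy sketch of exactly this example (the cut-off of $1$ below is an arbitrary choice for illustration): the SVD of a diagonal matrix needs no rotations, and zeroing the small singular value collapses the rectangle onto a line.

```python
import numpy as np

A = np.diag([10.0, 0.1])       # stretches the 1x1 square into a 10 x 0.1 rectangle
U, s, Vt = np.linalg.svd(A)

print(s)                                                      # [10.   0.1]
print(np.allclose(U, np.eye(2)), np.allclose(Vt, np.eye(2)))  # True True: no rotations

s[s < 1.0] = 0.0               # pretend the small scale factor is zero (arbitrary cut-off)
P = U @ np.diag(s) @ Vt        # now maps the square onto a line: a rank-1 'projection'
print(P)                       # [[10.  0.], [ 0.  0.]]
```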
The key to understanding the dimensionality-reduction part is to set the rotations aside: by the SVD theorem, the rotations can be factored out and applied before or after, so all you need to track is how things scale along the different axes. The SVD strips away the rotation: a matrix that turns a square into a parallelogram can be seen as something that scales a square into a rectangle, sandwiched between two rotations. If one direction scales to a small value (relative to the others), you can pretend it scales to zero, which, in the context of transformations, is a projection that approximates the original transformation.
To summarise the answer to your question: when your transformation is pure scaling and one of the scale factors is relatively small, you can replace that smallest factor with zero, and this gives you a projection. SVD tells you that every transformation can be expressed as a scaling between two rotations, and the idea of dimensionality reduction is to replace that scaling with a projection.
'Selecting the right axes' refers to the rotation: you only want to 'project away' a direction once you are sure that, by first rotating your shape (or data), you lose as little as possible; a sketch of this on data follows.
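For data rather than shapes, 'selecting the right axes' is what the right singular vectors in $\mathbf V$ do. A small sketch with synthetic data of my own (this is essentially PCA computed via the SVD): rotate the points onto the singular axes first, then keep only the coordinate along the dominant one.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([5.0, 0.5])   # long, thin cloud of points
theta = np.pi / 6                                      # hide the long axis by rotating
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X @ R.T

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(s)                 # one large, one small singular value

X_1d = Xc @ Vt[0]        # rotate, then keep one coordinate: the 1-D projection
                         # along the axis that loses as little as possible
```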
The topic of low rank approximation is sprinkled throughout Math SE:
Low-rank Approximation with SVD on a Kernel Matrix
Matrix values increasing after SVD, singular value decomposition
The singular value spectrum may span several orders of magnitude. It seems natural that the contributions from the larger values are more important. Numerically, it is difficult to tell whether a small singular value is genuine or simply machine noise from computing a singular value that is exactly $0$. This requires a threshold to determine which singular values are discarded.
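For what it's worth, numpy's `matrix_rank` uses the default threshold $\sigma_{\max}\cdot\max(m,n)\cdot\varepsilon_{\text{machine}}$; the sketch below (the matrix is my own construction, made exactly rank $8$) applies the same rule by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 8)) @ rng.normal(size=(8, 40))   # exact rank 8 by construction

s = np.linalg.svd(A, compute_uv=False)
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps      # numpy's default threshold
print(np.sum(s > tol))        # 8: the values below tol are treated as numerical zeros
```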
Let's look at the SVD in detail.
Singular Value Decomposition
Every matrix
$$
\mathbf{A} \in \mathbb{C}^{m\times n}_{\rho}
$$
has a singular value decomposition of the form
$$
\begin{align}
\mathbf{A} &=
\mathbf{U} \, \Sigma \, \mathbf{V}^{*} \\
%
&=
% U
\left[ \begin{array}{cc}
\color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}}
\end{array} \right]
% Sigma
\left[ \begin{array}{cccc|ccc}
\sigma_{1} & 0 & \dots & & & \dots & 0 \\
0 & \sigma_{2} \\
\vdots && \ddots \\
& & & \sigma_{\rho} \\\hline
& & & & 0 & \\
\vdots &&&&&\ddots \\
0 & & & & & & 0 \\
\end{array} \right]
% V
\left[ \begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} \\
\color{red}{\mathbf{V}_{\mathcal{N}}}^{*}
\end{array} \right] \\
%
& =
% U
\left[ \begin{array}{cccccccc}
\color{blue}{u_{1}} & \dots & \color{blue}{u_{\rho}} & \color{red}{u_{\rho+1}} & \dots & \color{red}{u_{m}}
\end{array} \right]
% Sigma
\left[ \begin{array}{cc}
\mathbf{S}_{\rho\times \rho} & \mathbf{0} \\
\mathbf{0} & \mathbf{0}
\end{array} \right]
% V
\left[ \begin{array}{c}
\color{blue}{v_{1}^{*}} \\
\vdots \\
\color{blue}{v_{\rho}^{*}} \\
\color{red}{v_{\rho+1}^{*}} \\
\vdots \\
\color{red}{v_{n}^{*}}
\end{array} \right]
%
\end{align}
$$
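A quick numerical check of this block structure (sizes chosen arbitrarily for illustration): with numpy's default `full_matrices=True`, $\mathbf U$ is $m\times m$, $\mathbf V^{*}$ is $n\times n$, and $\Sigma$ has to be padded out to $m\times n$ with the zero blocks shown above.

```python
import numpy as np

m, n, rho = 6, 4, 2
rng = np.random.default_rng(3)
A = rng.normal(size=(m, rho)) @ rng.normal(size=(rho, n))   # rank rho by construction

U, s, Vt = np.linalg.svd(A)          # full SVD: U is (6, 6), Vt is (4, 4)
Sigma = np.zeros((m, n))             # pad the diagonal out to m x n
Sigma[:n, :n] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))   # True
print(s)                                # rho large values, the rest at machine-noise level
```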
The connection to the row and column spaces follows:
$$
\begin{align}
% R A
\color{blue}{\mathcal{R} \left( \mathbf{A} \right)} &=
\text{span} \left\{
\color{blue}{u_{1}}, \dots , \color{blue}{u_{\rho}}
\right\} \\
% R A*
\color{blue}{\mathcal{R} \left( \mathbf{A}^{*} \right)} &=
\text{span} \left\{
\color{blue}{v_{1}}, \dots , \color{blue}{v_{\rho}}
\right\} \\
% N A*
\color{red}{\mathcal{N} \left( \mathbf{A}^{*} \right)} &=
\text{span} \left\{
\color{red}{u_{\rho+1}}, \dots , \color{red}{u_{m}}
\right\} \\
% N A
\color{red}{\mathcal{N} \left( \mathbf{A} \right)} &=
\text{span} \left\{
\color{red}{v_{\rho+1}}, \dots , \color{red}{v_{n}}
\right\} \\
%
\end{align}
$$
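Splitting the columns of $\mathbf U$ and $\mathbf V$ at the (numerical) rank $\rho$ hands you orthonormal bases for all four subspaces; a sketch, reusing the same kind of rank-deficient example:

```python
import numpy as np

m, n, rho = 6, 4, 2
rng = np.random.default_rng(3)
A = rng.normal(size=(m, rho)) @ rng.normal(size=(rho, n))   # rank-rho matrix

U, s, Vt = np.linalg.svd(A)
V = Vt.conj().T
U_R, U_N = U[:, :rho], U[:, rho:]    # bases for R(A) and N(A*)
V_R, V_N = V[:, :rho], V[:, rho:]    # bases for R(A*) and N(A)

print(np.allclose(A @ V_N, 0))           # A kills the N(A) directions
print(np.allclose(A.conj().T @ U_N, 0))  # A* kills the N(A*) directions
```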
What you are using is $\mathbf{S} \, \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*}$. This ignores the null space contributions in red.
A rank $\rho = 3$ approximation would look like this:
$$
\mathbf{S}_{3} \, \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} =
\left[ \begin{array}{ccc}
\sigma_{1} & 0 & 0 \\
0 & \sigma_{2} & 0 \\
0 & 0 & \sigma_{3} \\
\end{array} \right]
%
% V
\left[ \begin{array}{c}
\color{blue}{v_{1}^{*}} \\
\color{blue}{v_{2}^{*}} \\
\color{blue}{v_{3}^{*}} \\
\end{array} \right]
%
\in \mathbb{C}^{\rho \times n}
$$
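In numpy the corresponding computation looks like the sketch below (matrix and sizes are my own example); the $\mathbf{S}_{3}\,\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*}$ above is the last two factors of this product, and by the Eckart-Young theorem the spectral-norm error of the rank-$3$ truncation is exactly $\sigma_{4}$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-3 approximation of A
print(np.linalg.norm(A - A_k, 2))          # spectral-norm error ...
print(s[k])                                # ... equals sigma_4 (Eckart-Young)
```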
[Figures omitted: a sequence of Koch snowflake fractals and their singular value spectra. As the object becomes more detailed, the spectrum becomes richer.]
Finally, note that stretching a circle into an ellipse doesn't preserve angles, which is why the scaling step $\mathbf \Sigma$ is genuinely different from a rotation.