SVD – Linking right and left singular vectors

linear algebra, matrix decomposition, svd, symmetric matrices

I'm working through the textbook *Mathematics for Machine Learning* and have hit a sticking point in the SVD. For a matrix $A \in \mathbb{R}^{m \times n}$, I understand how the right and left singular vectors $V$ and $U$ are derived from the diagonalizations of $A^TA$ and $AA^T$ respectively. However, the book states:

The last step is to link up all the parts we touched upon so far. We have an orthonormal set of right-singular vectors in $V$. To finish the construction of the SVD, we connect them with the orthonormal vectors $U$. To reach this goal, we use the fact that the images of the $\mathbf v_i$ under $A$ have to be orthogonal, too. We can show this by using the results from Section 3.4. We require that the inner product between $A\mathbf v_i$ and $A\mathbf v_j$ must be 0 for $i \ne j$. For any two orthogonal eigenvectors $\mathbf v_i$, $\mathbf v_j$, $i \ne j$, it holds that:
$$ (A\mathbf v_i)^T(A\mathbf v_j) = \mathbf v_i^T(A^T A)\mathbf v_j = \mathbf v_i^T(\lambda_j \mathbf v_j ) = \lambda_j \mathbf v_i^T \mathbf v_j = 0 .$$
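For what it's worth, I can verify the claim numerically; here's my own quick NumPy check (not from the book, with an arbitrary random $A$), and the images $A\mathbf v_i$ do come out mutually orthogonal. I just don't follow the algebra above.

```python
import numpy as np

# My own sanity check, not from the book: take a random A, compute the
# eigenvectors v_i of A^T A, and confirm the images A v_i are mutually
# orthogonal, as the quoted passage claims.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))        # an arbitrary A with m=5, n=3

eigvals, V = np.linalg.eigh(A.T @ A)   # eigh: A^T A is symmetric
AV = A @ V                             # columns are the images A v_i

# Off-diagonal entries of (AV)^T (AV) should all be (numerically) zero.
gram = AV.T @ AV
print(np.allclose(gram - np.diag(np.diag(gram)), 0))  # True
```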

My question is: how does the final equation get from $\mathbf v_i^T(A^T A)\mathbf v_j$ to $\mathbf v_i^T(\lambda_j \mathbf v_j )$? How does the matrix $A^T A$ end up as a scalar value $\lambda_j$? Can somebody add a more intuitive explanation of the above?

Best Answer

Remember how the $v$s were defined: they're the eigenvectors of the square matrix $A^T A$, so for each $k = 1, 2, \ldots, n$, there's a number $\lambda_k$ with $$ (A^T A) v_k = \lambda_k v_k. $$ In particular, for $k = j$, we have $$ (A^T A) v_j = \lambda_j v_j. $$ So the matrix $A^T A$ doesn't become a scalar in general; it merely *acts as* the scalar $\lambda_j$ when applied to its own eigenvector $v_j$, which is exactly the substitution made in the chain of equalities.
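If it helps, here is a minimal NumPy sketch of that relation, using a small random $A$ of my own choosing (nothing here is from the book):

```python
import numpy as np

# Illustration: (A^T A) v_j equals lambda_j v_j, so A^T A "becomes" the
# scalar lambda_j only when applied to the eigenvector v_j.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))        # an arbitrary A for the demo
M = A.T @ A

eigvals, V = np.linalg.eigh(M)         # columns of V are the v_k
j = 2
print(np.allclose(M @ V[:, j], eigvals[j] * V[:, j]))  # True

# Hence v_i^T (A^T A) v_j = lambda_j * (v_i^T v_j) = 0 for i != j:
i = 0
print(np.isclose(V[:, i] @ M @ V[:, j], 0.0))          # True
```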

Just to be clear, you've said that they arise from "the diagonalization of $M = A^TA$," and it may not be clear that this makes them eigenvectors. Well, suppose that $$ Q^{-1} M Q = D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), $$ a diagonal matrix with numbers we'll call "$\lambda$s" on the diagonal. Multiplying through by $Q$, we get $$ MQ = QD. $$ If we call the first column of $Q$ by the name $v_1$, then equating the first column of each side says that $$ Mv_1 = Q \begin{pmatrix}\lambda_1\\0\\ \vdots \\ 0\end{pmatrix} = \lambda_1 v_1, $$ and similarly for the other columns. In other words, when we diagonalize $M$, the diagonalizing matrix $Q$ has the property that its columns (which we call $v_1, v_2, \ldots$) are each eigenvectors, with eigenvalues being the corresponding entries of the diagonal matrix $D$. (And since $M = A^TA$ is symmetric, $Q$ can be chosen orthogonal, which is why the $v_i$ form an orthonormal set in the first place.)
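And here is a quick numerical illustration of that column-reading argument, again a sketch with a hypothetical random symmetric $M$ rather than anything tied to your specific matrix:

```python
import numpy as np

# Diagonalize a random symmetric M, then check column by column that
# each column of Q is an eigenvector whose eigenvalue is the matching
# diagonal entry of D.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
M = B.T @ B                            # symmetric, hence diagonalizable

lam, Q = np.linalg.eigh(M)             # Q^{-1} M Q = D = diag(lam)
D = np.diag(lam)
print(np.allclose(np.linalg.inv(Q) @ M @ Q, D))  # True, i.e. M Q = Q D

# Equating the first columns of M Q and Q D gives M v_1 = lambda_1 v_1:
v1 = Q[:, 0]
print(np.allclose(M @ v1, lam[0] * v1))          # True
```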