If $A = U \Sigma V^T$ and $A$ is symmetric, then $V$ is the same as $U$ up to the signs of its columns. To see why, write the eigendecomposition of $A$:
$$A = W \Lambda W^T = \displaystyle \sum_{i=1}^n w_i \lambda_i w_i^T = \sum_{i=1}^n w_i \left| \lambda_i \right| \text{sign}(\lambda_i) w_i^T$$ where $w_i$ are the columns of the matrix $W$.
The left singular vectors $u_i$ are $w_i$ and the right singular vectors $v_i$ are $\text{sign}(\lambda_i) w_i$. (You can of course put the sign term with the left singular vectors instead.) The singular values $\sigma_i$ are the magnitudes of the eigenvalues $\lambda_i$.
Hence, $A = U \Sigma V^T = W \Lambda W^T$
and $$A^2 = U \Sigma^2 U^T = V \Sigma^2 V^T = W \Lambda^2 W^T$$
Note that the eigenvalues of $A^2$ are nonnegative.
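This is easy to check numerically. Here is a minimal sketch in Python/NumPy (my own illustration, not part of the argument above): build a random symmetric matrix and compare its SVD to its eigendecomposition.

```python
import numpy as np

# Build a random symmetric matrix A, generally with eigenvalues of both signs.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# Eigendecomposition and SVD.
lam, W = np.linalg.eigh(A)          # A = W diag(lam) W^T
U, sigma, Vt = np.linalg.svd(A)     # A = U diag(sigma) V^T

# Singular values are the magnitudes of the eigenvalues.
assert np.allclose(np.sort(sigma), np.sort(np.abs(lam)))

# Each column of V equals the corresponding column of U up to a sign:
# A = sum_i |lam_i| * u_i * (sign(lam_i) u_i)^T
V = Vt.T
signs = np.sum(U * V, axis=0)       # approximately +1 or -1 per column
assert np.allclose(V, U * signs)
assert np.allclose(A, U @ np.diag(sigma) @ Vt)
```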
Suppose we have a bunch of large vectors $x_1,\ldots,x_N$ stored as the columns of a matrix $X$. It would be nice if we could somehow find a small number of vectors $u_1,\ldots,u_s$ such that each vector $x_i$ is (to a good approximation) equal to a linear combination of the vectors $u_1,\ldots, u_s$. This would allow us to describe each of the (very large) vectors $x_i$ using just a small number of coefficients.
So we want to find vectors $u_1,\ldots, u_s$ such that for each $x_i$ we have
\begin{equation}
x_i \approx c_{i,1} u_1 + c_{i,2} u_2 + \cdots + c_{i,s} u_s
\end{equation}
for some coefficients $c_{i,1},\ldots, c_{i,s}$.
These $N$ equations ($i$ goes from $1$ to $N$) can be combined into one single matrix equation:
\begin{equation}
X \approx U C
\end{equation}
for some matrix $C$. (Here the columns of $U$ are $u_1,\ldots, u_s$.)
Note that the rank of $UC$ is less than or equal to $s$. So $UC$ is a low rank approximation of $X$.
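To make the payoff concrete, here is a small NumPy sketch (the data is synthetic and the basis $U$ is handed to us, purely for illustration): if each $x_i$ has $m$ entries, storing $U$ and $C$ costs $ms + sN$ numbers instead of the $mN$ numbers needed for $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, s = 1000, 200, 5

# Build vectors that really do lie near an s-dimensional subspace.
U = np.linalg.qr(rng.standard_normal((m, s)))[0]   # candidate basis u_1, ..., u_s
X = U @ rng.standard_normal((s, N)) + 0.01 * rng.standard_normal((m, N))

# Given the basis U, the best coefficients C are found column by column by
# least squares; since U has orthonormal columns this is just C = U^T X.
C = U.T @ X
print("relative error :", np.linalg.norm(X - U @ C) / np.linalg.norm(X))
print("full storage   :", m * N)          # 200000 numbers
print("compressed     :", m * s + s * N)  # 6000 numbers
```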
Here is the key fact: the SVD gives us an optimal low rank approximation of $X$! That is one of the basic facts about the SVD. That's why the SVD can be used for image compression.
If the SVD of $X$ is expressed as
\begin{equation}
X = \sum_{i=1}^N \sigma_i u_i v_i^T,
\end{equation}
where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_N$ (here we assume $X$ has at least $N$ rows, so that it has $N$ singular values),
then an optimal approximation of $X$ of rank less than or equal to $s$ is
\begin{align}
X &\approx \sum_{i=1}^s \sigma_i u_i v_i^T \\
&= U \Sigma V^T \\
&= U C
\end{align}
where $U$ is the matrix with columns $u_1,\ldots, u_s$, $\Sigma$ is the $s \times s$ diagonal matrix with $\sigma_1,\ldots,\sigma_s$ on its diagonal, $V$ is the matrix with columns $v_1,\ldots, v_s$, and $C = \Sigma V^T$.
Thus, the SVD finds an optimal $U$ for us.
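Here is a hedged NumPy sketch of that claim (the random basis is just one arbitrary competitor, chosen for illustration): project $X$ onto the top $s$ left singular vectors and onto a random $s$-dimensional basis, and compare the errors.

```python
import numpy as np

rng = np.random.default_rng(2)
m, N, s = 100, 80, 10
X = rng.standard_normal((m, N))

# Truncated SVD: keep the s largest singular values and vectors.
U_full, sig, Vt = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :s]                  # columns u_1, ..., u_s
C = np.diag(sig[:s]) @ Vt[:s]      # C = Sigma V^T
X_svd = U @ C

# For comparison: the best fit of X onto a random s-dimensional basis.
Q = np.linalg.qr(rng.standard_normal((m, s)))[0]
X_rand = Q @ (Q.T @ X)

print("error with SVD basis   :", np.linalg.norm(X - X_svd))
print("error with random basis:", np.linalg.norm(X - X_rand))
# Eckart-Young: the SVD error is the smallest possible for rank <= s.
```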
PCA takes as input vectors $x_1,\ldots,x_N$ as well as a small positive integer $s$. PCA demeans the vectors and stores them in the columns of a matrix $X$, then simply computes the SVD $X = U \Sigma V^T$ and returns the first $s$ columns of $U$ as output.
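A minimal sketch of that recipe in NumPy (the function name `pca_basis` and the toy data are my own, purely illustrative):

```python
import numpy as np

def pca_basis(vectors, s):
    """Return s principal directions for data given as the columns of `vectors`."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)    # demean: subtract the mean vector
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :s]                          # first s left singular vectors

# Example: 3-dimensional points that vary mostly along one direction.
rng = np.random.default_rng(3)
t = rng.standard_normal(500)
data = np.vstack([2 * t, -t, 0.1 * rng.standard_normal(500)]) + 5.0
print(pca_basis(data, 1))   # roughly proportional to (2, -1, 0)/sqrt(5), up to sign
```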
Best Answer
I think the best way to describe the relationship between the SVD of a matrix (I'll just use $A$) and diagonalizability, and what makes it possible for every matrix to have an SVD, is that the SVD is more closely related to the eigendecompositions of $AA^*$ and $A^*A$ (these matrices are positive semidefinite and therefore always unitarily diagonalizable) than to the eigendecomposition of $A$ itself. Notice that for $A=UDV$ we have \begin{equation} AA^*=UDVV^*D^*U^*=UD^2U^*\end{equation} and similarly we have $A^*A=V^*D^2V$. So the diagonal entries of $D$ are the square roots of the eigenvalues of $AA^*$ and $A^*A$. Furthermore, the columns of $U$ are eigenvectors of $AA^*$, and the columns of $V^*$ are eigenvectors of $A^*A$.
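This, too, is easy to verify numerically. A small NumPy check (note that `np.linalg.svd` returns $A = U\,\mathrm{diag}(d)\,V_h$, so its $V_h$ plays the role of the $V$ above):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

U, d, Vh = np.linalg.svd(A, full_matrices=False)   # A = U diag(d) Vh

# Eigenvalues of A A* and A* A are the squared singular values
# (A A* also has extra zero eigenvalues here because A is 6 x 4).
eig_AAh = np.linalg.eigvalsh(A @ A.conj().T)
eig_AhA = np.linalg.eigvalsh(A.conj().T @ A)
assert np.allclose(np.sort(eig_AhA), np.sort(d**2))
assert np.allclose(np.sort(eig_AAh)[-4:], np.sort(d**2))

# Columns of U are eigenvectors of A A*; columns of Vh* are eigenvectors of A* A.
assert np.allclose(A @ A.conj().T @ U, U * d**2)
assert np.allclose(A.conj().T @ A @ Vh.conj().T, Vh.conj().T * d**2)
```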