(I assume for the purposes of this answer that the data has been preprocessed to have zero mean.)
Simply put, the PCA viewpoint requires that one compute the eigenvalues and eigenvectors of the covariance matrix, which is the product $\frac{1}{n-1}\mathbf X\mathbf X^\top$, where $\mathbf X$ is the data matrix (with one observation per column). Since the covariance matrix is symmetric, it is diagonalizable, and its eigenvectors can be normalized so that they are orthonormal:
$\frac{1}{n-1}\mathbf X\mathbf X^\top=\frac{1}{n-1}\mathbf W\mathbf D\mathbf W^\top$
On the other hand, applying SVD to the data matrix $\mathbf X$ as follows:
$\mathbf X=\mathbf U\mathbf \Sigma\mathbf V^\top$
and attempting to construct the covariance matrix from this decomposition gives
$$
\frac{1}{n-1}\mathbf X\mathbf X^\top
=\frac{1}{n-1}(\mathbf U\mathbf \Sigma\mathbf V^\top)(\mathbf U\mathbf \Sigma\mathbf V^\top)^\top
= \frac{1}{n-1}(\mathbf U\mathbf \Sigma\mathbf V^\top)(\mathbf V\mathbf \Sigma^\top\mathbf U^\top)
$$
and since $\mathbf V$ has orthonormal columns ($\mathbf V^\top \mathbf V=\mathbf I$),
$\frac{1}{n-1}\mathbf X\mathbf X^\top=\frac{1}{n-1}\mathbf U\mathbf \Sigma\mathbf \Sigma^\top \mathbf U^\top,$
where $\mathbf \Sigma\mathbf \Sigma^\top$ is diagonal with the squared singular values $\sigma_i^2$ on its diagonal. The correspondence with the eigendecomposition above is now easy to see: the left singular vectors $\mathbf U$ play the role of the eigenvectors $\mathbf W$ (the principal directions), the square roots of the eigenvalues of $\mathbf X\mathbf X^\top$ are the singular values of $\mathbf X$, and the eigenvalues of the covariance matrix are $\sigma_i^2/(n-1)$.
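As a quick numerical check of this correspondence, here is a minimal numpy sketch (the synthetic data, the sample size $n$, and the variable names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                    # number of observations
X = rng.standard_normal((3, n))            # toy data, one observation per column
X = X - X.mean(axis=1, keepdims=True)      # center each variable

# PCA viewpoint: eigendecomposition of the covariance matrix
cov = X @ X.T / (n - 1)
eigvals, W = np.linalg.eigh(cov)           # eigenvalues in ascending order

# SVD viewpoint: decompose the data matrix directly
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The covariance eigenvalues equal the squared singular values over (n - 1),
# and the columns of U agree with the eigenvectors in W up to sign.
print(np.allclose(np.sort(s**2 / (n - 1)), eigvals))   # True
```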
In fact, using the SVD to perform PCA makes much better sense numerically than forming the covariance matrix explicitly, since forming $\mathbf X\mathbf X^\top$ can cause a loss of precision. This is detailed in books on numerical linear algebra, but I'll leave you with an example of a matrix whose SVD can be computed stably, but for which forming $\mathbf X\mathbf X^\top$ is disastrous: the Läuchli matrix
$\begin{pmatrix}1&1&1\\ \epsilon&0&0\\0&\epsilon&0\\0&0&\epsilon\end{pmatrix}^\top,$
where $\epsilon$ is a tiny number.
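To see the loss of precision concretely, here is a minimal numpy sketch with the Läuchli matrix (the particular value of $\epsilon$ is just an illustrative choice):

```python
import numpy as np

eps = 1e-9
# Läuchli matrix, stored with one observation per column as above (3 x 4)
X = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]]).T

# Singular values computed directly from X: the two small ones (= eps) survive.
print(np.linalg.svd(X, compute_uv=False))   # approx [1.73e+00, 1e-09, 1e-09]

# Eigenvalues of the explicitly formed Gram matrix: 1 + eps**2 rounds to 1.0
# in double precision, so X @ X.T becomes exactly the all-ones matrix and the
# two small eigenvalues are lost entirely.
print(np.linalg.eigvalsh(X @ X.T))          # approx [0, 0, 3]
```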
Suppose we have a bunch of large vectors $x_1,\ldots,x_N$ stored as the columns of a matrix $X$. It would be nice if we could somehow find a small number of vectors $u_1,\ldots,u_s$ such that each vector $x_i$ is (to a good approximation) equal to a linear combination of the vectors $u_1,\ldots, u_s$. This would allow us to describe each of the (very large) vectors $x_i$ using just a small number of coefficients.
So we want to find vectors $u_1,\ldots, u_s$ such that for each $x_i$ we have
\begin{equation}
x_i \approx c_{i,1} u_1 + c_{i,2} u_2 + \cdots + c_{i,s} u_s
\end{equation}
for some coefficients $c_{i,1},\ldots, c_{i,s}$.
These $N$ equations ($i$ goes from $1$ to $N$) can be combined into one single matrix equation:
\begin{equation}
X \approx U C
\end{equation}
for some matrix $C$. (Here the columns of $U$ are $u_1,\ldots, u_s$.)
Note that the rank of $UC$ is less than or equal to $s$. So $UC$ is a low rank approximation of $X$.
Here is the key fact: the SVD gives us an optimal low-rank approximation of $X$! (This is the Eckart-Young theorem, one of the basic facts about the SVD.) That's why the SVD can be used for image compression.
If the SVD of $X$ is expressed as
\begin{equation}
X = \sum_{i=1}^N \sigma_i u_i v_i^T,
\end{equation}
where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_N$,
then an optimal approximation of $X$ of rank less than or equal to $s$ is
\begin{align}
X &\approx \sum_{i=1}^s \sigma_i u_i v_i^T \\
&= U \Sigma V^T \\
&= U C
\end{align}
where $U$ is the matrix with columns $u_1,\ldots, u_s$, $\Sigma$ is the $s \times s$ diagonal matrix with entries $\sigma_1,\ldots,\sigma_s$, $V$ is the matrix with columns $v_1,\ldots,v_s$, and $C = \Sigma V^T$.
Thus, the SVD finds an optimal $U$ for us.
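As an illustration, here is a minimal numpy sketch of the truncated SVD as a rank-$s$ approximation (the random matrix and the variable names are just for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40))     # 40 column vectors of dimension 100
s = 5                                  # target rank

U_full, sigma, Vt = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :s]                      # u_1, ..., u_s
C = np.diag(sigma[:s]) @ Vt[:s, :]     # coefficient matrix C = Sigma V^T

X_approx = U @ C                       # rank-s approximation of X
print(np.linalg.matrix_rank(X_approx))           # 5

# For the Frobenius norm, the optimal rank-s error is sqrt(sigma_{s+1}^2 + ...),
# which is exactly what truncating the SVD achieves.
print(np.linalg.norm(X - X_approx))              # these two numbers agree
print(np.sqrt(np.sum(sigma[s:] ** 2)))
```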
PCA takes as input vectors $x_1,\ldots,x_N$ as well as a small positive integer $s$. It demeans the vectors, stores them in the columns of a matrix $X$, then simply computes the SVD $X = U \Sigma V^T$ and returns the first $s$ columns of $U$ as output.
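In code the whole procedure is only a few lines; here is a minimal sketch (the helper name `pca_basis` is just for illustration, not a standard library function):

```python
import numpy as np

def pca_basis(vectors, s):
    """Return the first s principal directions of the given vectors."""
    X = np.column_stack(vectors)              # one vector per column
    X = X - X.mean(axis=1, keepdims=True)     # demean
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :s]                           # first s left singular vectors
```

Projecting the demeaned data onto these columns then gives the usual PCA coefficients (scores).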
When computing the PCA of a data matrix, the eigenvectors come out orthogonal because the matrix being diagonalized (the covariance matrix, or $A^TA$ below) is symmetric by construction. In general, the eigenvectors of a matrix are not necessarily orthogonal; however, orthogonality is guaranteed for symmetric matrices.
The singular value decomposition of a matrix $A$ can be written as
$$A = U \Sigma V^T$$
where $U$ and $V$ have orthonormal columns. (In practice the SVD is usually computed directly from $A$, for the numerical reasons discussed above, but the link to PCA is easiest to see through the symmetric matrix $A^TA$.)
Multiplying out, $$A^TA = V\Sigma^T U^T U \Sigma V^T = V(\Sigma^T\Sigma) V^T,$$ where $\Sigma^T\Sigma$ is a diagonal matrix holding the squared singular values.
Equivalently, if we look at $A^TA$ as a symmetric, diagonalizable matrix, we can compute its eigendecomposition as $A^TA = W\Lambda W^T$ and we know that the eigenvectors found in the columns of $W$ are orthogonal due to symmetry.
The link between the SVD and PCA is recognizing that these are the same thing: the eigenvector matrix $W$ is, up to signs and ordering, the matrix $V$ of right singular vectors, and the eigenvalues in $\Lambda$ are the squared singular values. When $A$ is the centered data matrix, these eigenvectors are exactly the principal directions that PCA returns.
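As a quick numpy check of this identification (the random matrix is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
lam, W = np.linalg.eigh(A.T @ A)        # eigenvalues in ascending order

# The eigenvalues of A^T A are the squared singular values of A, and the
# eigenvectors in W agree with the right singular vectors V up to sign.
print(np.allclose(np.sort(s ** 2), lam))                 # True
print(np.allclose(np.abs(W[:, ::-1]), np.abs(Vt.T)))     # True (up to sign)
```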