First, consider the problem $\Sigma x = b$, where
$$
\Sigma = \pmatrix{\sigma_1\\& \ddots\\&&\sigma_r\\ &&&0\\&&&&\ddots\\&&&&&0}
$$
Note that $b$ is only in the range of $\Sigma$ if its entries $b_{r+1},\dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
$$
x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T = \Sigma^+ b
$$
From there, note that
$$
U\Sigma V^T x = b \implies\\
\Sigma (V^T x ) = U^T b
$$
By the above argument, the least squares solution for $(V^T x)$ is given by
$V^T x = \Sigma^+ U^T b$. Noting that $\|V^T x\| = \|x\|$, we can use this to conclude that $x = (V \Sigma ^+ U^T)b$ must be the least squares solution (for $x$).
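As a minimal numerical sketch of the argument above (assuming NumPy and real-valued data; the helper name `svd_pinv_solve` and the tolerance `rtol` are illustrative choices, not part of the derivation), one can form $x = V\Sigma^+U^Tb$ directly and compare it with NumPy's own least squares solver:

```python
import numpy as np

def svd_pinv_solve(A, b, rtol=1e-12):
    """Return x = V Sigma^+ U^T b (hypothetical helper, for illustration only)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Sigma^+: invert singular values above a relative threshold, leave the rest at zero.
    keep = s > rtol * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ b))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

x_svd = svd_pinv_solve(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_svd, x_ref))
```

The threshold is an assumption of this sketch: in floating point, "zero" singular values have to be decided by some cutoff.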
I hope you find this explanation sufficient.
For the full rank least squares problem, where $A \in \mathbb{K}^{m \times n},m>n=\mathrm{rank}(A)$ ($\mathbb{K}$ is the base field), the solution is $(A^T A)^{-1} A^T b$. This is a very bad way to approach the problem numerically for condition number reasons: you roughly square the condition number, so a relatively tractable problem with $\kappa=10^8$ becomes a hopelessly intractable problem with $\kappa=10^{16}$ (where we think about tractability in double precision floating point). The condition number also enters into convergence rates for certain iterative methods, so such methods often perform poorly for the normal equations.
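The squaring of the condition number can be checked on a small synthetic example. The sketch below assumes NumPy and builds a matrix with prescribed singular values; it uses $\kappa = 10^4$ rather than $10^8$ so that the squared value is still resolvable in double precision.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 5

# Build A = U diag(s) V^T with chosen singular values, so that cond(A) = 1e4.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal matrix
s = np.logspace(0, -4, n)
A = U @ np.diag(s) @ V.T

kappa_A = np.linalg.cond(A)          # 2-norm condition number of A
kappa_AtA = np.linalg.cond(A.T @ A)  # condition number of the normal equations matrix
print(kappa_A, kappa_AtA)
```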
The SVD pseudoinverse is exactly the same as the normal equations pseudoinverse, i.e. $(A^T A)^{-1} A^T$. You simply compute it using the SVD $A = U \Sigma V^T$ and simplify: since $A^T A = V \Sigma^T \Sigma V^T$ and $A^T = V \Sigma^T U^T$, the end result is
$$(A^T A)^{-1} A^T = V (\Sigma^T \Sigma)^{-1} \Sigma^T U^T.$$
This means that if I know the matrices of singular vectors $U$ and $V$, then I can transform the problem of finding the pseudoinverse of $A$ to the (trivial) problem of finding the pseudoinverse of $\Sigma$.
The above is for the full rank problem. For the rank deficient problem with $m>n>\mathrm{rank}(A)$, the LS solution is not unique; in particular, $A^T A$ is not invertible. The usual choice is to choose the solution of minimal Euclidean norm (I don't really know exactly why people do this, but you do need some criterion). It turns out that the SVD pseudoinverse gives you this minimal norm solution. Note that the SVD pseudoinverse still makes sense here, although it does not take the form I wrote above since $\Sigma^T \Sigma$ is no longer invertible either. But you still obtain it in basically the same way (invert the nonzero singular values, leave the zeros alone).
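To illustrate the minimal norm property, here is a small sketch (assuming NumPy; the matrix sizes and the cutoff `rcond=1e-10` are arbitrary choices for the example). Adding a null-space vector to the pseudoinverse solution leaves the residual unchanged but increases the norm:

```python
import numpy as np

rng = np.random.default_rng(2)
# Rank-2 matrix with m = 6 > n = 4 > rank(A): product of 6x2 and 2x4 factors.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

# Explicit cutoff so the numerically zero singular values are not inverted.
x_min = np.linalg.pinv(A, rcond=1e-10) @ b

# Right singular vectors beyond the rank span the null space of A.
_, s, Vt = np.linalg.svd(A)
z = Vt[-1]                  # A @ z is zero up to rounding
x_other = x_min + 0.5 * z   # another least squares solution

res_min = np.linalg.norm(A @ x_min - b)
res_other = np.linalg.norm(A @ x_other - b)
print(np.isclose(res_min, res_other))                    # same residual
print(np.linalg.norm(x_min) < np.linalg.norm(x_other))   # smaller norm
```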
One nice thing about considering the rank-deficient problem is that even in the full rank case, if $A$ has some singular value "gap", one can forget about the singular values below this gap and obtain a good approximate solution to the full rank least squares problem. The SVD is the ideal method for elucidating this.
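A rough sketch of this idea, assuming NumPy and a synthetic matrix with an artificial gap after the second singular value (the helper `truncated_solve` and all the numbers are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 4
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.array([3.0, 1.0, 1e-9, 1e-10])   # gap after the second singular value
A = U @ np.diag(s) @ V.T
b = A @ np.ones(n) + 1e-6 * rng.standard_normal(m)   # slightly noisy data

def truncated_solve(A, b, k):
    """Least squares solution using only the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

x_full = truncated_solve(A, b, n)    # uses all singular values
x_trunc = truncated_solve(A, b, 2)   # drops everything below the gap

print(np.linalg.norm(x_full), np.linalg.norm(x_trunc))
print(np.linalg.norm(A @ x_trunc - b))
```

In this setup the truncated solution keeps the residual at roughly the noise level, while the full solution has a much larger norm because the noise is divided by the tiny singular values.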
The homogeneous problem is sort of unrelated to least squares; it is really an eigenvector problem, which should be understood using different methods entirely.
Finally a fourth comment, not directly related to your three questions: in reasonably small problems, there isn't much reason to do the SVD. You still should not use the normal equations, but the QR decomposition will do the job just as well and it will terminate in an amount of time that you can know in advance.
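For a full rank problem, the QR route can be sketched as follows (assuming NumPy; `np.linalg.solve` is used here for simplicity, although a dedicated triangular solver such as SciPy's `solve_triangular` would exploit the structure of $R$):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))   # full column rank with probability one
b = rng.standard_normal(8)

# Reduced QR: A = Q R, where Q (8x3) has orthonormal columns and R (3x3) is upper triangular.
Q, R = np.linalg.qr(A)
# Minimizing ||Ax - b|| reduces to the square system R x = Q^T b.
x_qr = np.linalg.solve(R, Q.T @ b)

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_qr, x_ref))
```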
Let $A$ be a matrix whose columns are the coordinates of the points being fitted, relative to the centroid (that is, every column of $A$ is a point being fitted minus the coordinates of the centroid). From the Eckart-Young theorem, we find that if $A$ has singular value decomposition $A = U \Sigma V^T$, where
$$
\Sigma = \pmatrix{\sigma_1\\&\sigma_2 \\ & & \sigma_3}, \qquad \sigma_1 \geq \sigma_2 \geq \sigma_3
$$
then the coordinates of the projections of the columns onto the best fit plane are the columns of the matrix $\tilde A = U \tilde \Sigma V^T$, where
$$
\tilde \Sigma = \pmatrix{\sigma_1\\&\sigma_2 \\ & & 0}
$$
Now, let $\|A\|$ denote the Frobenius norm, which is to say that $\|A\|^2 = \sum_{i,j}|a_{ij}|^2$. We find that the square of the minimized error is given by
$$
\|A- \tilde A\|^2 = \|U(\Sigma - \tilde \Sigma)V^T\|^2 = \|\Sigma - \tilde \Sigma\|^2 = \sigma_3^2
$$
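The construction above can be tried on synthetic data. The following is a sketch under assumed inputs (NumPy, points scattered near a made-up plane $z = 0.5x - 0.2y + 1$), not a definitive fitting routine:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic points near the plane z = 0.5x - 0.2y + 1, stored as columns (3 x N).
N = 200
xy = rng.uniform(-1, 1, size=(2, N))
z = 0.5 * xy[0] - 0.2 * xy[1] + 1.0 + 0.01 * rng.standard_normal(N)
P = np.vstack([xy, z])

centroid = P.mean(axis=1, keepdims=True)
A = P - centroid                       # columns are points relative to the centroid

U, s, Vt = np.linalg.svd(A, full_matrices=False)
normal = U[:, 2]                       # left singular vector belonging to sigma_3

# Rank-2 approximation: projections of the centred points onto the best fit plane.
A_tilde = U[:, :2] @ np.diag(s[:2]) @ Vt[:2]

err_sq = np.linalg.norm(A - A_tilde, 'fro') ** 2
print(np.isclose(err_sq, s[2] ** 2))
```

The left singular vector for $\sigma_3$ serves as the plane's normal, and the plane passes through the centroid.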