For the full rank least squares problem, where $A \in \mathbb{K}^{m \times n}$ with $m>n=\mathrm{rank}(A)$ ($\mathbb{K}$ is the base field), the solution is $(A^T A)^{-1} A^T b$. Computing it this way, by forming and solving the normal equations, is a very bad approach numerically for condition number reasons: you square the condition number ($\kappa_2(A^T A) = \kappa_2(A)^2$), so a relatively tractable problem with $\kappa=10^8$ becomes a hopelessly intractable problem with $\kappa=10^{16}$ (where we think about tractability in double precision floating point). The condition number also enters into convergence rates for certain iterative methods, so such methods often perform poorly on the normal equations.
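To make the squaring concrete, here is a small NumPy sketch. The helper `matrix_with_condition_number` and the choice $\kappa = 10^3$ are illustrative assumptions, not part of the argument above; the matrix is built from prescribed singular values so that its condition number is known in advance.

```python
import numpy as np

def matrix_with_condition_number(m, n, kappa, seed=0):
    """Hypothetical helper: m x n matrix with singular values from 1 down to 1/kappa."""
    rng = np.random.default_rng(seed)
    # Orthonormal columns obtained from reduced QR of random Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(kappa), n)
    return U @ np.diag(s) @ V.T

A = matrix_with_condition_number(50, 5, kappa=1e3)

kappa_A = np.linalg.cond(A)            # 2-norm condition number of A
kappa_gram = np.linalg.cond(A.T @ A)   # condition number of the normal-equations matrix

print(f"cond(A)     = {kappa_A:.3e}")
print(f"cond(A^T A) = {kappa_gram:.3e}")
```

With a larger `kappa`, such as the $10^8$ mentioned above, the second number approaches the reciprocal of double precision machine epsilon, which is where the normal equations stop being usable.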
In the full rank case, the SVD pseudoinverse is exactly the same as the normal equations pseudoinverse, i.e. $(A^T A)^{-1} A^T$. You simply substitute the SVD $A = U \Sigma V^T$ and simplify. There is indeed a simplification; the end result is
$$(A^T A)^{-1} A^T = V (\Sigma^T \Sigma)^{-1} \Sigma^T U^T.$$
This means that if I know the matrices of left and right singular vectors $U$ and $V$, then I can transform the problem of finding the pseudoinverse of $A$ to the (trivial) problem of finding the pseudoinverse of $\Sigma$.
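As a quick numerical check of this identity, the following sketch builds the pseudoinverse from a thin SVD and compares it with the normal equations formula and with `np.linalg.pinv`. The random test matrix is an assumption made for illustration; in the thin SVD, $(\Sigma^T \Sigma)^{-1} \Sigma^T$ reduces to the diagonal matrix of reciprocal singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # illustrative matrix, full column rank with probability 1

# Thin SVD: U is 8x3, s holds the 3 singular values, Vt is V^T (3x3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

pinv_svd = Vt.T @ np.diag(1.0 / s) @ U.T      # V Sigma^+ U^T
pinv_normal = np.linalg.inv(A.T @ A) @ A.T    # (A^T A)^{-1} A^T, fine here since A is well conditioned
pinv_numpy = np.linalg.pinv(A)                # NumPy's SVD-based pseudoinverse

print(np.allclose(pinv_svd, pinv_normal), np.allclose(pinv_svd, pinv_numpy))
```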
The above is for the full rank problem. For the rank deficient problem with $m>n>\mathrm{rank}(A)$, the LS solution is not unique; in particular, $A^T A$ is not invertible. The usual convention is to take the solution of minimal Euclidean norm (I don't really know exactly why people do this, but you do need some criterion). It turns out that the SVD pseudoinverse gives you this minimal norm solution. Note that the SVD pseudoinverse still makes sense here, although it does not take the form I wrote above since $\Sigma^T \Sigma$ is no longer invertible either. But you still obtain it in basically the same way (invert the nonzero singular values, leave the zeros alone).
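The recipe "invert the nonzero singular values, leave the zeros alone" can be sketched as follows. The rank-2 test matrix and the relative cutoff `1e-10` for deciding which singular values count as zero are assumptions chosen for this example; in floating point, a "zero" singular value shows up as a tiny nonzero number, so some threshold is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative 6x4 matrix of rank 2, built as a product of thin factors.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Assumed cutoff: singular values below 1e-10 * s_max are treated as zero.
tol = 1e-10 * s[0]
keep = s > tol
s_inv = np.zeros_like(s)
s_inv[keep] = 1.0 / s[keep]          # invert the nonzero ones, leave the zeros alone

x_min = Vt.T @ (s_inv * (U.T @ b))   # pseudoinverse applied to b

# Same cutoff passed to NumPy's pseudoinverse for comparison.
x_ref = np.linalg.pinv(A, rcond=1e-10) @ b
print(int(keep.sum()), np.allclose(x_min, x_ref))
```

Even though $A^T A$ is singular here, `x_min` still satisfies the normal equations $A^T (A x - b) = 0$, so it is a genuine least squares solution.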
One nice thing about considering the rank-deficient problem is that even in the full rank case, if $A$ has some singular value "gap", one can forget about the singular values below this gap and obtain a good approximate solution to the full rank least squares problem. The SVD is the ideal method for elucidating this.
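Here is a hedged sketch of the singular value gap idea. The singular values $3, 2, 1, 10^{-10}$ and the rule "cut at the largest ratio of consecutive singular values" are assumptions made for illustration, not a standard prescription. Dropping the singular values below the gap changes the matrix by only $\sigma_{k+1}$ in the 2-norm, while avoiding the huge solution component that $1/\sigma_{k+1}$ would produce.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 12, 4
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
# Full rank, but with a large gap before the last singular value (illustrative values).
A = U0 @ np.diag([3.0, 2.0, 1.0, 1e-10]) @ V0.T
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Hypothetical heuristic: keep everything above the largest ratio s[i] / s[i+1].
k = int(np.argmax(s[:-1] / s[1:])) + 1

coeffs = (U.T @ b) / s
x_full = Vt.T @ coeffs               # exact least squares solution via the SVD
x_trunc = Vt[:k].T @ coeffs[:k]      # solution using only the k leading singular triplets

A_k = (U[:, :k] * s[:k]) @ Vt[:k]    # rank-k truncation of A
print(k, np.linalg.norm(A - A_k, 2), np.linalg.norm(x_trunc), np.linalg.norm(x_full))
```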
The homogeneous problem is sort of unrelated to least squares; it is really an eigenvector problem, which should be understood using different methods entirely.
Finally, a fourth comment, not directly related to your three questions: in reasonably small problems, there isn't much reason to do the SVD. You still should not use the normal equations, but the QR decomposition will do the job just as well, and it will terminate in an amount of time that you can know in advance.
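A minimal sketch of the QR route for a full column rank problem, with random test data assumed for illustration: factor $A = QR$ with $Q$ having orthonormal columns, then solve the small triangular system $Rx = Q^T b$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))   # illustrative full column rank matrix
b = rng.standard_normal(10)

# Reduced QR: Q is 10x4 with orthonormal columns, R is 4x4 upper triangular.
Q, R = np.linalg.qr(A)

# np.linalg.solve is a general solver; a dedicated triangular solver
# would exploit the structure of R, but this keeps the sketch NumPy-only.
x_qr = np.linalg.solve(R, Q.T @ b)

# Reference solution from NumPy's SVD-based least squares routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_qr, x_ref))
```

Note that $A^T A$ is never formed, so the conditioning of the computation is governed by $\kappa(A)$ rather than $\kappa(A)^2$.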
$A^\dagger b$ provides a vector $x$ that minimizes $\|Ax - b\|_2$ (in the case that $A^TA$ is invertible, this minimizer is unique). Thus, the solution to your problem will be
$$
\|A(A^\dagger b) - b\|_2 = \|(AA^\dagger - I)b\|_2.
$$
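The formula can be checked numerically with a short sketch. The random $9 \times 3$ problem is an assumption for illustration; the minimal residual computed from $(AA^\dagger - I)b$ should agree with the residual of a least squares solution found by `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((9, 3))   # illustrative overdetermined system
b = rng.standard_normal(9)

A_pinv = np.linalg.pinv(A)

# Minimal residual norm via the formula ||(A A^+ - I) b||_2.
min_residual = np.linalg.norm((A @ A_pinv - np.eye(9)) @ b)

# Residual of the least squares solution computed directly.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
direct_residual = np.linalg.norm(A @ x_ls - b)

print(min_residual, direct_residual)
```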
Best Answer
Write the SVD as $A = U \Sigma V^\top$ and let $r = \mathrm{rank}(A)$. The column space of $A$ is the span of the first $r$ columns of $U$; let $U_r$ be the $m \times r$ matrix formed by these columns. Since $U_r^\top U_r = I$, the projection of $b$ onto the column space of $A$ is $\hat{b} := U_r (U_r^\top U_r)^{-1} U_r^\top b = U_r U_r^\top b$.
If $x$ is a solution to the optimization problem, then $Ax = \hat{b}$.
Thus, we can consider a second optimization problem: minimize $\|x\|_2$ subject to $Ax = \hat{b}$.
First, we check that $x := A^\dagger b$ is feasible, i.e. $A A^\dagger b = \hat{b}$. $$AA^\dagger b = U \Sigma V^\top V \Sigma^\dagger U^\top b = U \begin{bmatrix} I_{r \times r} & \\ & 0_{(m-r) \times (m-r)}\end{bmatrix} U^\top b = U_r U_r^\top b = \hat{b}.$$
Next we justify that it has minimum norm. Note that every feasible $x$ can be written as $x = A^\dagger b + z$ for some $z$ in the nullspace of $A$. The nullspace of $A$ is the span of the last $n - r$ columns of $V$. On the other hand, $A^\dagger b = V\Sigma^\dagger U^\top b$ lies in the span of the first $r$ columns of $V$. Thus $A^\dagger b$ and $z$ are orthogonal and we have $$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|z\|_2^2.$$ Thus choosing $z = 0$ minimizes $\|x\|_2$, so $A^\dagger b$ is the minimum norm solution.
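The ingredients of this argument can be checked numerically. In the sketch below, the sizes $m = 7$, $n = 5$, $r = 3$ and the random data are assumptions for illustration; the rank is fixed by construction, so the first $r$ singular triplets are used directly instead of estimating the rank from the singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 7, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r by construction
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)      # full SVD: U is m x m, Vt is n x n
U_r = U[:, :r]
b_hat = U_r @ (U_r.T @ b)        # projection of b onto the column space of A

# A^+ b assembled from the first r singular triplets.
x_dagger = Vt[:r].T @ ((U_r.T @ b) / s[:r])

# A nullspace vector: a combination of the last n - r columns of V.
z = Vt[r:].T @ rng.standard_normal(n - r)
x = x_dagger + z                 # another feasible point

print(np.allclose(A @ x_dagger, b_hat), np.allclose(A @ x, b_hat))
print(np.linalg.norm(x_dagger), np.linalg.norm(x))
```

Both points reproduce $\hat{b}$, but the one with $z \neq 0$ is strictly longer, in line with the Pythagorean identity above.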