I agree with motivations A and B you gave in the text. Even more disappointingly, I would say that one simply chooses a Hilbert space instead of another because it works. So one chooses the least possible regularity and the simplest way to encode boundary conditions.
In the case of the Dirichlet problem, for example:
$$\tag{(D)} \begin{cases} -\Delta u = f & \Omega \\ u= 0 & \partial \Omega\end{cases},$$
a solution $u$, whatever it is, must be something that realizes
$$b(u, v)=0,\quad \forall v \in \text{some test function space}, $$
where
$$b(u, v)=\int\left(-\Delta u - f\right)v\, dx, $$
whenever this makes sense. Turns out that, if we require $u, v\in H^1_0(\Omega)$, then $b$ takes on a super-nice form, the Lax-Milgram's theorem kicks in, everything goes on smoothly and our lives are beautiful. What if we had taken $H^2_0$ instead? Well, in this case we would have had trouble because, even in the simplest case $f=0$, the quadratic form
$$b(u,u)=\int \lvert \nabla u \rvert^2\, dx $$
is not coercive, because it cannot control second derivatives. So the Lax-Milgram's theorem doesn't apply and our lives are miserable.
(A last remark which may possibly contradict everything above. As far as I know, there is an abstract theory of linear operators and quadratic forms on Hilbert spaces which, among other things, proclaims that $H^1_0$ is the "right" domain for the quadratic form $b(u,u)$ when $L$ is the Laplacian. If you really are interested in this you could look for the keywords "form domain of self-adjoint operators" or "Friedrichs extension". I am sure that those things are treated in Reed & Simon's Methods of Modern Mathematical Physics and in Zeidler's Applied Functional Analysis.)
EDIT: This answer is related to the last remark. The book by Davies explains this abstract theory IMHO very clearly.
P.S.: The dichotomy "our lives are beautiful / our lives are miserable" is a citation of J.L. Vázquez.
Question 1: In numerical PDE, Lax-Milgram lemma's requirement does not have to be met. Coercivity basically means the problem you got is invertible. There is a weaker version of coercivity which is called Babuska–Brezzi inf-sup condition, it is for mixed formulation of FEM. Another one is called Fredholm alternative, it is when both the coercivity and inf-sup condition fail. Though I doubt what you need are these right now. In your case, you wanna deliver a well-posed FEM problem for Robin boundary condition, the Lax-Milgram lemma has to be satisfied.
Question 2:
(1) For Dirichlet boundary value problem. In the continuous space, say if $-\Delta u= f$ in $\Omega$, $u|_{\partial \Omega}=g$. We could write $u= u_0 + u_g$, where $-\Delta u_0= f$ in $\Omega$, $u_0|_{\partial \Omega}=0$, and $-\Delta u_g= 0$ in $\Omega$, $u_g|_{\partial \Omega}=g$. In the variational problem, the test function is chosen to be 0 on the boundary, if we solve
$$
(\nabla u,\nabla v) = (f,v)
$$
for any $v\in H^1_0$, what we get is $u_0$. We don't know what the boundary is like solving the variational problem because the test function $v$ is always 0 there. However, we know what the exact $u$'value is on boundary, so we are good here.
In finite elements, we force the value of the FEM solution $u_h\in V_h$ on the boundary nodes to be equal to the exact value of the boundary data. The test function $v$ is chosen to be zero on the boundary, i.e.,$v\in V_h\cap H^1_0$ for Poisson equation. Lax-Milgram lemma's conditions are satisfied for both $H^1_0$ and $V_h \cap H^1_0$. The problem is well-posed.
To answer your question, we don't put any constraints in the interior nodes (or say Degrees of Freedom associated to interior nodes), we just solve the variational problem setting test function $v=0$ on the boundary. What we get is the finite elements approximation $u_{0,h}$ to $u_0$. Then we let $u_{g,h}(V) = g(V)$ for any boundary node $V$, and $u_{g,h}(V')=0$ for any interior nodes $V'$, this is the discrete version $u_g$. The equations they satisfy are:
$$
(\nabla u_{0,h},\nabla v) = (f,v)\quad \forall v\in V_h\cap H^1_0, \text{ and } u_{0,h}|_{\partial \Omega} = 0
$$
$$
(\nabla u_{g,h},\nabla v) = 0\quad \forall v\in V_h\cap H^1_0, \text{ and } u_{g,h}|_{\partial \Omega} = \Pi_h g
$$
where $\Pi_h g$ is the interpolation of $g$ on boundary which is defined by imposing the nodal value like we just did. And your final finite elements solution is $u_h = u_{0,h} + u_{g,h}$.
(2) For Pure Neumann boundary problem, the solution is unique up to be a constant, we have to consider the problem in a slightly different space: $\{v\in H^1: \displaystyle \int_{\Omega} v = 0\} = H^1/\mathbb{R}$. Thanks to Poincare inequality, Lax-Milgram lemma's conditions are again met. For FEM problem, we don't force any boundary condition for the test function, rather we just solve the variational equation (When solving the linear system, there are various ways to deal with the modulus of constant part). In this sense, we say that the Neumann boundary condition is satisfied "weakly".
Last Question:
Now for Robin boundary condition, say your equation is:
$$
-\Delta u = f \quad \text{in } \Omega
$$
with Robin boundary condition for all the boundary $\partial \Omega$:
$$
\alpha u +\frac{\partial u}{\partial n} = g
$$
where $\alpha$ is a positive constant ($\alpha$ has to be positive a.e. on the boundary), and $\partial u/\partial n = \nabla u\cdot n$ is the normal derivative on boundary. Multiply the equation by a test function $v\in H^1$,
$$
-\int_{\Omega} \Delta u\,v = \int_{\Omega} fv
$$
and perform integration by parts:
$$\int_{\Omega} \nabla u\cdot \nabla v - \int_{\partial \Omega}(\nabla u\cdot n )v= \int_{\Omega} fv
$$
plug in the Robin boundary data:
$$\int_{\Omega} \nabla u\cdot \nabla v - \int_{\partial \Omega}(g-\alpha u)v= \int_{\Omega} fv
$$
Rearrange:
$$\int_{\Omega} \nabla u\cdot \nabla v + \int_{\partial \Omega}\alpha u v= \int_{\Omega} fv + \int_{\partial \Omega}gv
$$
The left hand side $a(u,v) = \displaystyle \int_{\Omega} \nabla u\cdot \nabla v + \int_{\partial \Omega}\alpha u v$, right hand side is the $L(v)$. The proof of continuities of the right hand side and the first term in $a(u,v)$ is natural, just following the proof for Dirichlet and Neumann boundary problems. For second term on the left we would like to use the trace inequality:
$$
\int_{\partial \Omega}\alpha u v \leq \alpha \|u\|_{L^2(\partial \Omega)} \|v\|_{L^2(\partial \Omega)} \leq C \alpha \|u\|_{H^1(\Omega)} \|v\|_{H^1(\Omega)}
$$
thus $a(u,v) \leq C \|u\|_{H^1(\Omega)} \|v\|_{H^1(\Omega)}$.
The coercivity of $a(u,v)$ is a little bit more difficult to prove. The way I learned in FEM class is by what is called "compactness argument". The trick is using proof by contradiction: Suppose $a(u,v)$ is not coercive, then (here comes the compactness argument) there exists a sequence $\{v_n\}\subset H^1$, such that:
$$
\|v_n\|_{H^1(\Omega)} = 1, \text{ and } a(v_n,v_n) \to 0.\tag{1}
$$
Now since $\{v_n\}$ is a bounded sequence in $H^1$ (consider the distance being induced by the $H^1$-norm), then there exists a weakly convergent subsequence $v_{n_j}\to v$ in $H^1$ and $v_{n_j}\to v$ in $L^2$-norm (thanks user33869 for pointing out my previous error).
Weak convergence is:
$$
\int_{\Omega}(v-v_{n_j})w + \int_{\Omega}\nabla(v-v_{n_j})\cdot \nabla w \to 0, \; \text{ for all } w\in H^1(\Omega).
$$
This implies that:
$$
\int_{\Omega} |\nabla v|^2 = \lim_{j\to \infty}\int_{\Omega} \nabla v_{n_j}\cdot \nabla v
\leq \lim_{j\to \infty}\left( \int_{\Omega} |\nabla v_{n_j}|^2 \right)^{1/2}
\left(\int_{\Omega} |\nabla v |^2\right)^{1/2}
$$
This is
$$
|v|_{H^1(\Omega)}
\leq \lim_{j\to \infty} |v_{n_j}|_{H^1(\Omega)}\tag{2}
$$
Using another assumption now
$$
a(v_n,v_n) = \int_{\Omega} |\nabla v_n|^2 + \int_{\partial \Omega}\alpha v_n^2 \to 0 \tag{3}
$$
This implies that $\displaystyle\int_{\Omega} |\nabla v_n|^2 \to 0$.
By (2), letting $n_j\to \infty$ yields $\displaystyle\int_{\Omega} |\nabla v|^2 = 0$. Now we have $ \|v\|_{H^1(\Omega)}=\|v\|_{L^2(\Omega)}$.
Now using $v_{n_j}\to v$ in $L^2$-norm, together with (3), $\|v\|_{L^2(\Omega)} = \lim\limits_{j\to \infty} \|v_{n_j}\|_{L^2(\Omega)}=1$
The $L^2$-norm of gradient of $v$ being zero implies $v$ is a constant, The $L^2$-norm of $v$ being 1 implies $v=c$ where the constant $c\neq 0$. However, again by (3):
$$
\int_{\partial \Omega}\alpha v^2 = \lim_{n\to \infty}\int_{\partial \Omega}\alpha v_{n_j}^2 = 0
$$
together with the positivity assumption on $\alpha$, this says $v=0$ on $\partial \Omega$. Constradition.
Therefore there does not exist such sequence satisfying (1). The negation of the statement (1) is: For all sequence in $H^1(\Omega)$, if $\|v_n \|_{H^1(\Omega)}$ is bounded, $a(v_n,v_n)$ is always bounded away from 0. This can be translated to any function $v$ in $H^1$, for we could always find a bounded sequence that goes to $v$ by the completeness of a Hilbert space. Thus the coercivity is proved.
Best Answer
The point is to find a good approximation to $u$. Céa's lemma tells you that $$ \|u-u_h\|\leq c\inf_{v\in V_h}\|u-v\|,\qquad\qquad(*) $$ that is, the error of $u_h$ is, up to a constant factor, as good as the error of the best approximation from $V_h$. So if $V_h$ has good approximation properties, then $u-u_h$ will be very small.
This also answers your second question, because if $u\in V_h$, then the right hand side of $(*)$ is $0$, and hence $u-u_h=0$.