Let $H=H_0^1\cap H^2$. A good norm to work with is $$\|u\|_H=\|\Delta u\|_2$$
As you can see here, this norm is equivalent to the usual one. Also, the inequality you are looking for is true; in fact, you have $$\int |\Delta u|^2\geq\lambda_1^2\int|u|^2,\ \forall\ u\in H,$$
where $\lambda_1$ is the first eigenvalue associated with $(-\Delta, H_0^1)$. To prove this inequality, note that
\begin{eqnarray}
\int |\nabla u|^2 &=& -\int u\Delta u \nonumber \\
&\leq& \|u\|_2\|\Delta u\|_2 \nonumber
\end{eqnarray}
From the last inequality, you can conclude by using the Poincaré inequality. Now you can easily apply Lax-Milgram.
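To spell out the concluding step (a short sketch of what the last sentence leaves implicit): combining the chain above with the Poincaré inequality in its sharp form, $\lambda_1\int|u|^2\leq\int|\nabla u|^2$, gives
$$
\lambda_1\|u\|_2^2 \leq \int|\nabla u|^2 \leq \|u\|_2\,\|\Delta u\|_2,
$$
so for $u\neq 0$ we may divide by $\|u\|_2$ to get $\lambda_1\|u\|_2\leq\|\Delta u\|_2$, and squaring yields $\int|\Delta u|^2\geq\lambda_1^2\int|u|^2$.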
If we say a solution is weak/strong/classical/viscosity, we are concerned with (at least) the following aspects:
1. How we obtain the solution.
2. The regularity of the solution (how smooth this solution is: integrability, differentiability).
3. In what sense the solution satisfies the equation.
Weak solution:
- We can obtain the solution via the Ritz-Galerkin formulation: find the minimizer of the following quadratic functional in an appropriate Hilbert space,
$$
\mathcal{F}(u) = \frac{1}{2}\int_{\Omega} |\nabla u|^2 - \int_{\Omega} fu.
$$
- Smoothness depends on the data on the right-hand side: if $f\in H^{-1}$, then $u\in H^1$; if $f\in L^2$, then $u\in H^2_{loc}$. Moreover, if $\Omega$ is $C^{1,1}$, we have an $H^2$-solution $u$ globally.
- The solution satisfies the equation in the sense of distributions (see the following explanation).
Why "weak":
The term "weak" normally refers to aspects 2 and 3: the solution $u$ is only in $H^1$ in the most general setting, which means $u$ is only (weakly) differentiable once, while $-\Delta$ involves second partial derivatives. A strong solution, by contrast, does have two derivatives: normally, when we say $u$ is a strong solution, we mean that $u$ has $W^{2,p}$-regularity (please refer to Gilbarg and Trudinger). The weak solution satisfies the equation only in the "weak" formulation
$$
\int_{\Omega} \nabla u \cdot \nabla v \, dx =
\int_{\Omega} fv \, dx
\quad \forall v \in V, \tag{1}
$$
where $V$ is a certain Sobolev space.
There are two ways to get this weak form. The first is to write down the condition the minimizer of $\mathcal{F}(u)$ must satisfy: if $u$ is a minimizer, then
$$
\left.\frac{d}{d\epsilon} \mathcal{F}(u+\epsilon v)\right|_{\epsilon=0} = 0 \quad \forall v \in V,
$$
and the weak form of Euler-Lagrange equation is (1).
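To make that computation explicit (a one-line sketch): expanding the functional,
$$
\mathcal{F}(u+\epsilon v)=\mathcal{F}(u)+\epsilon\left(\int_{\Omega}\nabla u\cdot\nabla v - \int_{\Omega} fv\right)+\frac{\epsilon^2}{2}\int_{\Omega}|\nabla v|^2,
$$
so the derivative at $\epsilon=0$ is exactly the left-hand side of (1) minus its right-hand side, and setting it to zero for every $v\in V$ gives (1).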
The second is to multiply the original equation by a test function and then integrate by parts. The intuition behind this is the Riesz representation theorem (at least to me it makes sense); we have:
$$
\langle (-\Delta)u,v\rangle = l_u(v) = (u,v)_{V},
$$
that is, we pass from the differential operator $-\Delta$ to a linear functional $l_u$, and then to a representation via the inner product $(\cdot ,\cdot)_V$. The inner product $(\cdot ,\cdot)_V$ on this Hilbert space $V$ is the left-hand side of (1), provided we make the test function space satisfy a zero boundary condition (we can use the Poincaré inequality to prove its equivalence to the standard $H^1$-inner product). If you have taken any numerical PDE course on finite elements, the professor will have introduced the Lax-Milgram theorem, and Lax-Milgram relies on Riesz.
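For the second route, the integration by parts is (sketched; $v$ is a test function vanishing on $\partial\Omega$):
$$
\int_{\Omega} fv = \int_{\Omega}(-\Delta u)\,v = \int_{\Omega}\nabla u\cdot\nabla v - \int_{\partial\Omega}\frac{\partial u}{\partial n}\,v\,dS = \int_{\Omega}\nabla u\cdot\nabla v,
$$
since the boundary term vanishes when $v=0$ on $\partial\Omega$; this is precisely (1).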
Why the weak form is useful in the finite element method:
Short answer: the weak form is very handy in that it helps us formulate a linear equation system which can be solved by a computer!
Long answer: the essence of the Galerkin-type approach is that we exploit the fact that the infinite-dimensional Hilbert space has a basis $\{\phi_i\}_{i=1}^{\infty}$, so we can expand $u$ in this basis:
$$
u = \sum_{i=1}^{\infty} u_i\phi_i,
$$
where each $u_i$ is a number. Plugging this back into (1) and letting the test function $v$ run through all the $\phi_j$ (same functions, different subscript) gives:
$$
\int_{\Omega} \nabla (\sum_{i=1}^{\infty} u_i\phi_i) \cdot \nabla \phi_j \, dx =\sum_{i=1}^{\infty} u_i \int_{\Omega} \nabla \phi_i \cdot \nabla \phi_j \, dx =
\int_{\Omega} f\phi_j \, dx
\quad \forall j =1,2,\ldots. \tag{2}
$$
We have an infinite dimensional linear equation system:
$$
AU = F,
$$
where $A_{ji} = \displaystyle\int_{\Omega} \nabla \phi_i \cdot \nabla \phi_j \, dx$, $U_i = u_i$, and $F_j = \displaystyle\int_{\Omega} f\phi_j\, dx$.
The finite element method essentially chooses a finite-dimensional subspace $V_h\subset V$ (it may not actually be a subspace; see the discontinuous Galerkin method), so that we approximate the solution in this finite-dimensional space $V_h$! The summation in (2) no longer has an infinite upper limit; instead there are finitely many $\phi_i$, and $v$ runs from $\phi_1$ to $\phi_N$, so the linear system generated is still $AU = F$, but this time it has only $N$ equations, and we can use a computer to solve it.
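To make the last paragraph concrete, here is a minimal sketch (my own illustration, not part of the answers above) of the discrete system $AU = F$ for the 1D model problem $-u''=f$ on $(0,1)$ with $u(0)=u(1)=0$, using piecewise-linear hat functions $\phi_1,\ldots,\phi_N$ on a uniform mesh; the "lumped" load $F_j\approx h\,f(x_j)$ is an assumed simplification of the exact integral.

```python
import numpy as np

# Sketch: assemble and solve the Galerkin system AU = F for
#   -u'' = f on (0,1),  u(0) = u(1) = 0,
# with N interior piecewise-linear hat functions on a uniform mesh
# of width h = 1/(N+1).
def solve_poisson_fem(f, N):
    h = 1.0 / (N + 1)
    x = np.linspace(h, 1.0 - h, N)          # interior nodes
    # A_ji = int_0^1 phi_i' phi_j' dx  ->  (1/h) * tridiag(-1, 2, -1)
    A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    # F_j = int_0^1 f phi_j dx, approximated here by the lumped rule h*f(x_j)
    F = h * f(x)
    U = np.linalg.solve(A, F)               # solve the N-equation system
    return x, U

x, u = solve_poisson_fem(lambda t: np.ones_like(t), 50)
exact = x * (1.0 - x) / 2.0                 # exact solution for f = 1
```

For $f\equiv 1$ the exact solution is $u(x)=x(1-x)/2$, and with linear elements on this problem the finite element solution coincides with it at the nodes (up to roundoff).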
Best Answer
I agree with motivations A and B you gave in the text. Even more disappointingly, I would say that one simply chooses a Hilbert space instead of another because it works. So one chooses the least possible regularity and the simplest way to encode boundary conditions.
In the case of the Dirichlet problem, for example: $$\tag{D} \begin{cases} -\Delta u = f & \Omega \\ u= 0 & \partial \Omega,\end{cases}$$ a solution $u$, whatever it is, must be something that realizes $$b(u, v)=0,\quad \forall v \in \text{some test function space}, $$ where $$b(u, v)=\int\left(-\Delta u - f\right)v\, dx, $$ whenever this makes sense. It turns out that, if we require $u, v\in H^1_0(\Omega)$, then $b$ takes on a super-nice form, the Lax-Milgram theorem kicks in, everything goes on smoothly, and our lives are beautiful. What if we had taken $H^2_0$ instead? Well, in this case we would have had trouble because, even in the simplest case $f=0$, the quadratic form $$b(u,u)=\int \lvert \nabla u \rvert^2\, dx $$ is not coercive on $H^2_0$, because it cannot control second derivatives. So the Lax-Milgram theorem doesn't apply and our lives are miserable.
(A last remark which may possibly contradict everything above. As far as I know, there is an abstract theory of linear operators and quadratic forms on Hilbert spaces which, among other things, proclaims that $H^1_0$ is the "right" domain for the quadratic form $b(u,u)$ when the operator is the Laplacian. If you really are interested in this you could look for the keywords "form domain of self-adjoint operators" or "Friedrichs extension". I am sure that those things are treated in Reed & Simon's Methods of Modern Mathematical Physics and in Zeidler's Applied Functional Analysis.)
EDIT: This answer is related to the last remark. The book by Davies explains this abstract theory IMHO very clearly.
P.S.: The dichotomy "our lives are beautiful / our lives are miserable" is a citation of J.L. Vázquez.