To wrap up what Theo and Jonas already said: two (real or complex) Hilbert spaces are isomorphic as Hilbert spaces if and only if they have orthonormal bases of the same cardinality. So every statement that makes use of the Hilbert space structure only has the same truth value for one space as for the other.
But a concrete Hilbert space may have more structure than the Hilbert space structure alone. When you look at the statement "A reproducing kernel Hilbert space is a Hilbert space in which the evaluation functional..." then the "evaluation functional" part presupposes that the Hilbert space under consideration has (real- or complex-valued) functions as elements. This is additional structure that some Hilbert spaces have and others do not.
The space $L^2[0, 1]$, for example, consists of equivalence classes of functions rather than functions, and the "evaluation functional" cannot be well defined because its value depends on the representative of an equivalence class $[f]$: for any $x \in [0, 1]$ and any $y \in [-\infty, \infty]$, every equivalence class contains a representative $f$ with $f(x) = y$, since changing a function at a single point does not change its class. One can also define an abstract complex Hilbert space simply as the closed span of an orthonormal basis $(e_n)_{n \in \mathbb{N}}$; here the term "evaluation functional" makes no sense at all, because the elements of this Hilbert space are not functions.
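For a concrete instance of this: the zero function and the indicator function of a single point define the same element of $L^2[0, 1]$, $$f = 0, \qquad g = \mathbf{1}_{\{1/2\}}, \qquad [f] = [g],$$ yet $f(1/2) = 0$ while $g(1/2) = 1$, so "the value at $1/2$" is not a property of the equivalence class.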
On the other hand, the Hardy space of the unit disk consists of holomorphic functions, so the evaluation functional is well defined. One can prove that it is also continuous, but the proof makes use of the fact that the elements of this Hilbert space are holomorphic functions on the unit disk, which, as I said before, is additional structure that happens to exist for the Hardy space.
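To make the continuity concrete (using the standard power-series description of the Hardy space $H^2$, which the paragraph above does not spell out): for $f(z) = \sum_{n \geq 0} c_n z^n$ with $\|f\|^2 = \sum_{n \geq 0} |c_n|^2 < \infty$ and $|z_0| < 1$, the Cauchy–Schwarz inequality gives $$|f(z_0)| \leq \Big(\sum_{n \geq 0} |c_n|^2\Big)^{1/2} \Big(\sum_{n \geq 0} |z_0|^{2n}\Big)^{1/2} = \frac{\|f\|}{\sqrt{1 - |z_0|^2}},$$ so evaluation at $z_0$ is a bounded, hence continuous, linear functional.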
All of these examples are isomorphic as separable complex Hilbert spaces, but this isomorphism does not say anything about any structure that may exist beyond the Hilbert space structure.
Beginning with the answer to your second problem, suppose that $f \in H$, where $H$ is the reproducing kernel Hilbert space. Let $S$ be the subspace spanned by the kernel functions $k(x_i, \cdot)$. Then by the theory of Hilbert spaces, $f$ can be written as $f = f_S + f_P$, where $$f_S(x) = \sum_{i=1}^n a_i k(x_i, x)$$ and $f_P$ is orthogonal to $S$. Moreover, by the Pythagorean theorem, $$\| f \|^2 = \| f_S \|^2 + \| f_P \|^2.$$ In particular, this tells us that $\|f\| > \|f_S\|$ whenever $f_P \neq 0$.
Now consider $f(x_i)$, which by the reproducing property can be written as $$f(x_i)=\langle f, k(x_i, \cdot) \rangle = \langle f_S, k(x_i, \cdot) \rangle + \langle f_P, k(x_i, \cdot) \rangle = \langle f_S, k(x_i, \cdot) \rangle = f_S(x_i),$$ since $\langle f_P, k(x_i, \cdot) \rangle = 0$.
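This also pins $f_S$ down concretely (a supplementary remark of mine, assuming the functions $k(x_i, \cdot)$ are linearly independent): writing $f_S = \sum_j a_j k(x_j, \cdot)$, the conditions $f_S(x_i) = f(x_i)$ become the linear system $$\sum_{j=1}^n k(x_i, x_j)\, a_j = f(x_i), \qquad i = 1, \dots, n,$$ i.e. $Ka = (f(x_1), \dots, f(x_n))^T$, with $K$ the matrix that appears again below.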
Thus for every $f$ we have $$\sum_{i=1}^n L(y_i, f(x_i) + b) = \sum_{i=1}^n L(y_i, f_S(x_i) + b)$$
Hence, whenever $f_P \neq 0$, $$F[f] = \lambda \| f\|^2 + \sum_{i=1}^n L(y_i, f(x_i) + b) > \lambda \| f_S\|^2 + \sum_{i=1}^n L(y_i, f_S(x_i) + b) = F[f_S],$$
and this holds for every $f \in H$. This means that any minimizer of $F$ must lie in the subspace $S$, i.e. it must be a linear combination of the kernel functions.
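To see this conclusion in action, here is a minimal numerical sketch (my own illustration, not part of the original argument): with the squared loss $L(y, t) = (y - t)^2$, no offset $b$, and a Gaussian kernel — all choices made here for concreteness — the minimizer of $F$ is a linear combination $\sum_i \alpha_i k(x_i, \cdot)$ whose coefficients solve a linear system.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix k(x, z) = exp(-gamma * (x - z)^2)."""
    d2 = (X[:, None] - Z[None, :]) ** 2
    return np.exp(-gamma * d2)

# Toy 1-D data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=20)
y = np.sin(x) + 0.1 * rng.standard_normal(20)

lam = 0.1                    # regularization weight lambda
K = rbf_kernel(x, x)         # matrix (k(x_i, x_j))_{i,j}

# Restricted to f = sum_j alpha_j k(x_j, .), F[f] becomes the quadratic
# lam * alpha^T K alpha + |y - K alpha|^2; one minimizer is:
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)

def f_S(t):
    """The minimizer, a linear combination of kernel functions."""
    return rbf_kernel(np.atleast_1d(t), x) @ alpha

print(f_S(0.5))
```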
As for the first question, quadratic terms resembling $w^T A w$ appear through what is known as the Gram matrix, which is built from the kernel: $$K = \left( k(x_i,x_j) \right)_{i,j=1}^n.$$ It is straightforward to prove that this matrix is symmetric and positive semi-definite, since if $a = (a_1, a_2, \dots, a_n)$ then $$a^T K a = \left\langle \sum_{i=1}^n a_i k(x_i, \cdot), \sum_{j=1}^n a_j k(x_j, \cdot)\right\rangle=\left\|\sum_{i=1}^n a_i k(x_i, \cdot)\right\|^2 \geq 0.$$
This gives us our first hint at how to recast $w^T A w$ in the language of reproducing kernel Hilbert spaces.
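As a quick numerical sanity check of this identity (again my own sketch, reusing the kernel and data from the snippet above):

```python
# Reuses rbf_kernel, x, and rng from the previous snippet.
K = rbf_kernel(x, x)                          # Gram matrix (k(x_i, x_j))_{i,j}

assert np.allclose(K, K.T)                    # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-10   # positive semi-definite

# a^T K a is the squared RKHS norm of sum_i a_i k(x_i, .),
# so it is nonnegative for every coefficient vector a.
a = rng.standard_normal(len(x))
print(a @ K @ a)                              # >= 0 up to round-off
```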
Take for instance $$A = \operatorname{diag}(a_1, a_2, \dots, a_n),$$ where each $a_i > 0$. Then $$w^T A w = \sum_{i=1}^n a_i w_i^2.$$
Now imagine replacing $w$ with $f$ by setting $w_i = f(x_i)$ for each $i$. Then $$\sum_{i=1}^n a_i w_i^2 = \sum_{i=1}^n a_i f(x_i)^2.$$
By the same reasoning as above, $$\sum_{i=1}^n a_i f(x_i)^2 = \sum_{i=1}^n a_i f_S(x_i)^2,$$ so we may add this term to the loss function and still be guaranteed that a minimizer will be a linear combination of kernel functions.
So, in short, you may introduce the term you want into your loss function, keeping in mind that $w = (f(x_1), f(x_2), \dots, f(x_n))$.
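For what it's worth, the same substitution works for a general symmetric positive semi-definite $A$, not just a diagonal one: $$w^T A w = \sum_{i,j=1}^n A_{ij}\, f(x_i) f(x_j) = \sum_{i,j=1}^n A_{ij}\, f_S(x_i) f_S(x_j),$$ where the second equality again uses $f(x_i) = f_S(x_i)$, so the minimizer is still a linear combination of kernel functions.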
Best Answer
If by "imposing the norm constraint, $|f|=1$, corresponds to an orthogonal projection onto the direction selected in reproducing kernel Hilbert space" you (or the paper) mean "projecting orthogonally from any point to a point with fixed norm is a closest-point projection", then I think this is true of all Hilbert spaces.
Write the original vector as $v = a + b$, where $a$ lies in the subspace and $b = v - a$ is the residual of the projection. By way of contradiction, assume $\langle a, b \rangle \neq 0$, so that $|v|^2 = |a|^2 + |b|^2 + 2 \langle a, b \rangle$. If you draw a picture here, it should be easy to see that in this case you can pick a new $a'$ in the subspace close to $a$ (for instance $a' = (1+t)a$ for a small real $t$ of the same sign as $\langle a, b \rangle$), writing $v = a' + b'$ with $b' = v - a'$, such that $$|b'|^2 = |b|^2 - 2t\langle a, b \rangle + t^2 |a|^2 < |b|^2,$$ which means $a$ was not a closest-point projection. The only way to avoid this contradiction is $\langle a, b \rangle = 0$. QED
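A small numerical illustration of this argument (my own sketch, projecting onto a one-dimensional subspace of $\mathbb{R}^5$):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(5)          # the original vector
u = rng.standard_normal(5)
u /= np.linalg.norm(u)              # unit vector spanning the subspace

a = (v @ u) * u                     # closest point to v in span{u}
b = v - a                           # residual of the projection

# The residual of the closest-point projection is orthogonal to a:
print(a @ b)                        # ~ 0 up to round-off

# Any perturbation a' = (1 + t) a within the subspace is farther from v:
for t in (0.1, -0.1):
    assert np.linalg.norm(v - (1 + t) * a) >= np.linalg.norm(b)
```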