[Math] Doubts on Reproducing Kernel Hilbert Spaces and orthogonal decomposition

fa.functional-analysis, hilbert-spaces, kernels

I'm a CS student trying to learn RKHS theory in order to understand the passages made in this paper.
Among the references I'm using are "On the Mathematical Foundations of Learning" and "Learning with Kernels".
I think I've got a good grasp of the relevant theory, but there's still something that bugs me.

As far as I understand, an RKHS is a Hilbert space $H\subseteq \mathbb{C}^X$, where $X$ is a generic set of objects, with inner product $\langle f,g\rangle=\int_X\int_X \alpha(x')\beta(x)k(x,x')\,dx\,dx'$ for $\alpha,\beta \in \mathbb{C}^X$, $f=\int_X\alpha(x')k(\cdot,x')\,dx'$ and $g=\int_X\beta(x')k(\cdot,x')\,dx'$, such that $H=\overline{\mathrm{span}\{k_x \mid x\in X\}}$ and $\langle f,k_x\rangle=f(x)$, where $k_x=k(\cdot,x)$ for all $x\in X$ and $k$ is the (Mercer) kernel of $H$. Let's call this the "continuous-whole" definition.

However, given $D=\{x_1,x_2,\dots,x_n\}\subset X$, a subspace $H_D\subset H$ could be defined by restricting the definition above to $H_D=\overline{\mathrm{span}\{k_{x_i} \mid x_i\in D\}}=\{f\in \mathbb{C}^X \mid f=\sum_{i=1}^{n} \alpha_i k(\cdot,x_i),\ \alpha_i\in \mathbb{C}\}$, and thus define $\langle f,g\rangle_{H_D}=\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\beta_j k(x_i,x_j)$, where $g=\sum_{j=1}^{n}\beta_j k(\cdot,x_j)$. Let this be the "discrete-finite" definition.
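Since the "discrete-finite" inner product is just a quadratic form in the Gram matrix, here is a minimal numerical sketch of it (the Gaussian kernel and the concrete points and coefficients are my own choices, purely for illustration):

```python
import numpy as np

# Sketch of the "discrete-finite" definition with an assumed Gaussian kernel.
def k(x, y, gamma=1.0):
    return np.exp(-gamma * np.subtract.outer(x, y) ** 2)

D = np.array([0.0, 0.5, 1.3])        # D = {x_1, ..., x_n}
K = k(D, D)                          # Gram matrix K_ij = k(x_i, x_j)

alpha = np.array([1.0, -2.0, 0.5])   # f = sum_i alpha_i k(., x_i)
beta = np.array([0.3, 0.7, -1.0])    # g = sum_j beta_j k(., x_j)

# <f, g>_{H_D} = sum_i sum_j alpha_i beta_j k(x_i, x_j) = alpha^T K beta
print(alpha @ K @ beta)

# Reproducing property on D: <f, k_{x_j}> = sum_i alpha_i k(x_i, x_j) = f(x_j)
print(K @ alpha)                     # the values f(x_1), ..., f(x_n)
```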

Assuming this is correct, the references tend to define $H$ as being both spanned by $\{k_x \mid x \in X\}$ and endowed with the inner product $\langle \cdot,\cdot\rangle_{H_D}$, which would be possible if and only if $\overline{\mathrm{span}\{k_x \mid x\in X\}}=\overline{\mathrm{span}\{k_{x_i} \mid x_i\in D\}}$, and that seems bogus to me.

Then, regarding the article, there's a passage of which I'm unsure. Let $H$ be an RKHS, i.e. a Hilbert space such that every point-evaluation functional is bounded, which would entail the "continuous-whole" definition. Then $H$ can be orthogonally decomposed as $H=H_D\oplus H_D^\bot$, where $H_D^\bot \perp H_D$ means $\langle f,g\rangle=0$ for all $f\in H_D$ and $g\in H_D^\bot$. The paper says that $H_D^\bot=\{g \mid g(x_i)=0\ \forall x_i\in D\}$. Starting from the definition of orthogonality, I carried out the following proof:
For all $f \in H_D$ and $g \in H_D^\bot$, $\langle f,g \rangle=\int_X\int_X \alpha(x')\beta(x)k(x,x')\,dx\,dx'=\int_X \beta(x)f(x)\,dx=\sum_{i=1}^{n}\alpha_i\int_X \beta(x)k(x,x_i)\,dx=\sum_{i=1}^{n} \alpha_i g(x_i)=0$ for all $\alpha_i \in \mathbb{C}$, which forces $g(x_i)=0$ for all $i\in\{1,\dots,n\}$; here I applied the reproducing property and the definition of $g$, in this order. Is this correct?
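For comparison, the same conclusion can be reached with the reproducing property alone, avoiding the integral representation (a sketch assuming real scalars, so no conjugates are needed):

$$\langle f,g\rangle=\Big\langle \sum_{i=1}^{n}\alpha_i k_{x_i},\, g\Big\rangle=\sum_{i=1}^{n}\alpha_i\langle k_{x_i},g\rangle=\sum_{i=1}^{n}\alpha_i\, g(x_i),$$

and this vanishes for every choice of the $\alpha_i$ exactly when $g(x_i)=0$ for all $i$.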

EDIT: I know the definition via the Riesz representation theorem; however, I'm referring to the definition I found in "Learning with Kernels" (p. 36):

"Definition 2.9 (Reproducing Kernel Hilbert Space) Let $X$ be a nonempty set and by $H$ a Hilbert space of functions $f :X\rightarrow\mathbb{R}$ . Then $H$ is called
a reproducing kernel Hilbert space endowed with the dot product $\langle .,. \rangle$ (and the norm $||f|| : \sqrt \langle f ,f\rangle $ ) if there exists afunction $k :X\times X \rightarrow \mathbb{R}$ with the following properties:

  1. $k$ has the reproducing property
    $\langle f, k(x,. )\rangle= f(x)$ for all $f\in H$ ;

  2. $k$ spans $H$ , i.e. $H=\overline{\mathrm{span}\{k(x,.)| x \in X\}}$ where $\overline{X}$ denotes the completion of the set $X$"

To my understanding, this definition rests on the theorems characterising RKHSs in terms of Mercer kernels (see p. 35 of "On the Mathematical Foundations of Learning").
The approach they take, as far as I've understood, is inside-out: they start from an $L^p$ space $H=\overline{\mathrm{span}\{k(x,\cdot) \mid x \in X\}}$, with $k$ a (Mercer) kernel, on which they define a suitable inner product so as to obtain a Hilbert space that is also an RKHS. They define a Mercer kernel to be a function $k: X^2\rightarrow \mathbb{R}$ that gives rise to a positive definite Gram matrix $K_{ij}=k(x_i,x_j)$ for every finite $D=\{x_1,x_2,\dots,x_n\}\subset X$.

Best Answer

To answer your second question:

$H_D = \operatorname{span}\{k_x : x \in D\}$ is a finite-dimensional (and hence closed) subspace of $H$. A function $f \in H$ is orthogonal to each $k_x$, $x \in D$, precisely when $\langle f, k_x \rangle = f(x) = 0$. In other words, $H_D^\perp = \{ f \in H : f(x_i) = 0\ \forall x_i \in D \}$, and of course $H = H_D \oplus H_D^\perp$.
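A numerical illustration of this decomposition (a sketch; the Gaussian kernel and the test function $f$ are my own choices): the coefficients $c$ of the projection $P_D f=\sum_i c_i k_{x_i}$ are fixed by orthogonality, $\langle f-P_Df,\,k_{x_j}\rangle=f(x_j)-(Kc)_j=0$, i.e. $Kc=(f(x_1),\dots,f(x_n))^T$.

```python
import numpy as np

# Sketch of H = H_D (+) H_D^perp: project f onto H_D = span{k_{x_i}}
# and check that the residual vanishes at every point of D.
def k(x, y):
    return np.exp(-np.subtract.outer(x, y) ** 2)   # assumed Gaussian kernel

D = np.array([0.0, 0.5, 1.3])          # x_1, ..., x_n
z = np.array([-0.7, 0.9])              # centers defining a test f in H
w = np.array([1.5, -0.8])              # f = sum_m w_m k(., z_m)

f = lambda x: k(x, z) @ w              # evaluate f anywhere
K = k(D, D)                            # Gram matrix on D
c = np.linalg.solve(K, f(D))           # orthogonality condition K c = f(D)

proj = lambda x: k(x, D) @ c           # P_D f, the H_D component of f
resid = lambda x: f(x) - proj(x)       # the H_D^perp component of f

print(np.allclose(resid(D), 0.0))      # True: resid(x_i) = 0 for x_i in D
print(resid(np.array([2.0])))          # generally nonzero away from D
```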
