Solved – Understanding the reproducing property of RKHS

kernel trick, machine learning, svm

I am currently trying to learn about reproducing kernel Hilbert spaces (RKHS) and would like to gain some intuition about the reproducing property. The RKHS is defined via a kernel $k(x,x')$ that maps $\mathbb{R}^N\times\mathbb{R}^N\rightarrow\mathbb{R}$, for $x,x'\in\mathbb{R}^N$. If we fix one of the kernel's arguments at a certain $x$, we write this as $k(x,\cdot)=k_x(\cdot)$. Now Wikipedia states the reproducing property as

$$f(x)=\langle f,k_x\rangle$$

I would like to understand what this means intuitively. In a paper I am currently studying, the authors use the reproducing property to obtain the following formula:

$$M(x)=\int{M(x')k(x,x')}dx'$$

where they assume that the function $M$, with $M(x)\in\mathbb{R}^Y$, lies in an RKHS. I'm not quite certain how they obtain this result. Here is my attempt at an explanation: another StackExchange answer mentions the 'integral form of the inner product', which it denotes as

$$\langle f,g\rangle=\int f(x)\,g(x)\,dx$$

which looks a lot like

$$\langle f,k_x\rangle=\int f(x')\,k(x,x')\,dx'$$

Is all of this correct so far? If I now move away from the exact integral above and instead use a Monte Carlo estimate with a set of samples $x_i$, $i=1,\dots,S$, this would mean:

$$\hat{M}(x)\approx\frac{1}{S}\sum_{i=1}^{S}M(x_i)\,k(x,x_i)$$

Or, more intuitively: does this mean that if I know $M(x_i)$ at locations $x_i$, I could recover $M(x)$ simply by evaluating the kernel $k(x,x_i)$ (at least in the limit $S\rightarrow\infty$, where $\hat{M}(x)=M(x)$)?
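To make this concrete for myself, here is a minimal numerical sketch of that reading (my own toy setup, not the paper's: a 1-D test function $M$, a normalised Gaussian kernel with small bandwidth $\sigma$ so that $k(x,\cdot)$ acts like an approximate delta function, and uniform samples on $[0,1]$ so that the plain average estimates the integral):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (not from the paper): a smooth 1-D test function M and a
# normalised Gaussian kernel with small bandwidth, so that k(x, .) acts
# like an approximate delta function on [0, 1].
def M(x):
    return np.sin(2 * np.pi * x) + 0.5 * x**2

def k(x, xp, sigma=0.02):
    return np.exp(-(x - xp) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = 0.4                               # query point, away from the boundary of [0, 1]
S = 1_000_000                         # number of Monte Carlo samples
xs = rng.uniform(0.0, 1.0, size=S)    # uniform samples, so the volume factor is 1

M_hat = np.mean(M(xs) * k(x, xs))     # (1/S) * sum_i M(x_i) k(x, x_i)

print(f"M(x)     = {M(x):.3f}")
print(f"M_hat(x) = {M_hat:.3f}")
```

The two values come out close, and the gap shrinks as the bandwidth decreases and $S$ grows; of course this only illustrates the 'kernel as approximate delta' reading of the integral formula, not the RKHS inner product in general.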


Follow-up question: In the paper, the authors then use a set of $S\in\mathbb{N}$ Monte Carlo samples to 'generate a finite Hilbert space' and approximate the gradient of $M(x)$ in $\mathbb{R}^N$:

$$\nabla_xM(x)\approx\frac{1}{S}\sum_{i=1}^{S}M(x_i)\,\nabla_xk(x,x_i)$$

How do they arrive at this result? My guess at what was done: they replaced $M(x)$ with $\nabla_xM(x)$ and then moved the nabla operator onto the kernel, and, independently, approximated the integral with Monte Carlo samples. Is this what happened here? If so, why may one freely move the nabla operator inside like that? If not, what is going on here instead?
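For reference, the only consistent derivation I can come up with (please correct me if this is not what the authors actually do) is to differentiate both sides of the integral identity: since only $k(x,x')$ depends on $x$, the gradient can be moved inside the integral (differentiation under the integral sign, assuming $k$ is smooth enough), and the integral is then replaced by the same Monte Carlo sum:

$$\nabla_x M(x)=\nabla_x\int M(x')\,k(x,x')\,dx'=\int M(x')\,\nabla_x k(x,x')\,dx'\approx\frac{1}{S}\sum_{i=1}^{S}M(x_i)\,\nabla_x k(x,x_i)$$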

Best Answer

You're overthinking this: the result with $M$ is just a restatement of the reproducing property. Using Wikipedia's notation, the reproducing property says that for each $x\in\mathcal X$ (the domain) there is an element $k_x \in \mathcal H$ such that $$ \langle f, k_x\rangle = f(x) $$ for every $f\in\mathcal H$.

Writing out the inner product as an integral w.r.t. a measure $\mu$ (I'm using Lebesgue integrals, so this covers both sums and the usual integrals), we have $$ f(x) = \langle f, k_x\rangle = \int_{\mathcal X} f(x')\,k_x(x')\,\mathrm d\mu(x') = \int_{\mathcal X} f(x')\,k(x,x')\,\mathrm d\mu(x'), $$ so plugging in $f=M$ gives exactly what they state.
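If it helps to see the sums case concretely, here is a minimal sanity check (a toy example of mine, nothing specific to the paper): take $\mathcal X=\{0,\dots,n-1\}$ with the counting measure and the identity kernel $k(x,x')=\mathbf 1\{x=x'\}$, so that $\mathcal H=\mathbb R^n$ with the standard inner product and $k_x$ is the $x$-th standard basis vector. The reproducing property then holds exactly as a finite sum:

```python
import numpy as np

# Toy finite-domain check: counting measure on {0, ..., n-1} with the
# identity kernel k(x, x') = 1{x = x'}.  Then H = R^n with the standard
# inner product, k_x is the x-th standard basis vector, and
# <f, k_x> = sum_x' f(x') k(x, x') = f(x) holds exactly.
n = 5
K = np.eye(n)                                # kernel matrix
f = np.array([2.0, -1.0, 0.5, 3.0, 7.0])     # an arbitrary "function" on the domain

x = 3
k_x = K[x]                                   # k(x, .) as a vector
print(f[x], f @ k_x)                         # both print 3.0
```

This is of course a degenerate example, but it makes the 'evaluate $f$ at $x$ by pairing it with $k_x$' mechanics explicit.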
