Solved – Gaussian process prior


I have an assignment and I'm a bit confused by the terminology of "prior" in the context of Gaussian processes.

Let $f$ be the function of interest, and let the matrix $\textbf{X}$ collect $N$ $M$-dimensional input vectors.

According to the description I have been provided, we can formulate the prior over the output of the function f using a Gaussian Process in the following way:

$p(f \mid \textbf{X}, \boldsymbol{\theta}) = \mathcal{N}(\textbf{0}, k(\textbf{X}, \textbf{X}))$

where $\boldsymbol{\theta}$ denotes the hyperparameters of the kernel $k$.
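To make the prior concrete, here is a minimal sketch of drawing sample functions from it, assuming a squared-exponential (RBF) kernel; the function name `rbf_kernel` and the specific hyperparameter values are my own illustrative choices, not from the assignment:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: variance * exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50)[:, None]   # N = 50 inputs, M = 1
K = rbf_kernel(X, X)                  # prior covariance k(X, X)
K += 1e-9 * np.eye(len(X))            # small jitter for numerical stability

# Draw 3 functions from the prior p(f | X, theta) = N(0, k(X, X))
samples = rng.multivariate_normal(np.zeros(len(X)), K, size=3)
print(samples.shape)  # (3, 50)
```

Note that the inputs $\textbf{X}$ enter only through the covariance matrix: they say *where* the function is evaluated, and the draws are made before any outputs are observed.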

What troubles me is that this has been called a prior but it depends on the data $\textbf{X}$.

Shouldn't the prior be independent of the data, expressing our belief about some property of the function through the kernel?

Is the notation/terminology used wrong or is there something I haven't understood in the notation?

Best Answer

A Gaussian process is a prior distribution on some unknown function $\mu(x)$ (in the context of regression): you assign the GP a priori, without exact knowledge of the true $\mu(x)$. The dependence on $\mathbf{X}$ enters only through the kernel evaluations $k(\mathbf{x},\mathbf{x'})$; conditioning on $\mathbf{X}$ merely specifies the input locations at which the prior over function values is written down. No observed outputs are used, so it is still a prior. Learning the GP, and thus the hyperparameters $\boldsymbol\theta$, is what actually uses the data.

It is worth noting that prior knowledge may drive the selection, or even the engineering, of kernel functions $k(\mathbf{x},\mathbf{x'})$ for the particular model at hand.

If using a fully Bayesian formulation (e.g. fitting with MCMC rather than maximum likelihood), one may incorporate additional prior knowledge on the hyperparameters $\boldsymbol\theta$ when such knowledge is available.
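The fully Bayesian route can be sketched as follows: the quantity an MCMC sampler would target is the unnormalised log posterior over hyperparameters, which is the GP log marginal likelihood plus a log prior on $\boldsymbol\theta$. This is a sketch under my own assumptions (zero-mean GP, RBF kernel, and a hypothetical log-normal prior on the lengthscale); all function names are illustrative:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def log_marginal_likelihood(X, y, lengthscale, noise=0.1):
    """log p(y | X, theta) for a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(X, X, lengthscale) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2.0 * np.pi))

def log_posterior(X, y, lengthscale):
    """Unnormalised log p(theta | y, X) = log p(y | X, theta) + log p(theta)."""
    # Hypothetical log-normal(0, 1) prior on the lengthscale (constants dropped)
    log_prior = -0.5 * np.log(lengthscale)**2 - np.log(lengthscale)
    return log_marginal_likelihood(X, y, lengthscale) + log_prior

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
for ell in (0.1, 1.0, 10.0):
    print(ell, log_posterior(X, y, ell))
```

An MCMC sampler (or a MAP optimiser) would then explore `log_posterior` over $\boldsymbol\theta$, rather than maximising the marginal likelihood alone.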
