# Solved – Equivalence of Gaussian process and Bayesian linear regression by inspecting the covariance matrix

covariance-matrix · gaussian-process · linear-algebra

I'm aware that a Gaussian process is equivalent to Bayesian linear regression for the kernel $$K(x_i,x_j) = x_i x_j$$ (assume scalar $$x$$ here). However, the proof itself didn't give me much intuition.

If I imagine sampling a function from the linear GP as sampling one point from an uncountable-dimensional Gaussian RV with covariance matrix $$K$$ (I'm aware this isn't mathematically rigorous, but bear with me), it is very unintuitive to me why all the points should lie on a line, i.e. why the function should be linear.

All I know about this "covariance matrix" $$K = xx^T$$ (where $$x$$ is a vector containing all the real numbers) is that the rank is 1, and that it's symmetric. I should be able to diagonalize it as $$K = Q^T\Lambda Q$$ with all the eigenvalues on the diagonal of $$\Lambda$$. Since the rank is 1, it should have one non-zero eigenvalue which I can force to be the top-left entry simply by permuting rows/cols of $$Q$$ and $$\Lambda$$. So now I can imagine sampling with covariance $$\Lambda$$, and then applying rotation $$Q^T$$.
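For concreteness, here is a finite-grid version of this construction (a toy check of mine; the grid size, seed, and variable names are arbitrary choices):

```python
import numpy as np

# Finite stand-in for "all the reals": 50 grid points.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
K = np.outer(x, x)                       # K(x_i, x_j) = x_i * x_j

# Symmetric and rank 1: exactly one non-zero eigenvalue.
eigvals, Q = np.linalg.eigh(K)           # K = Q diag(eigvals) Q^T
print(np.sum(eigvals > 1e-10))           # -> 1

# Sample with covariance Lambda first, then rotate by Q.
z = rng.standard_normal(50)
f = Q @ (np.sqrt(np.clip(eigvals, 0.0, None)) * z)

# Sanity check: the eigendecomposition reconstructs K.
cov_check = Q @ np.diag(eigvals) @ Q.T
print(np.allclose(cov_check, K))         # -> True
```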

If I fix $$f(0)$$ as the "first" dimension of our uncountable Gaussian, this means I can sample $$f(0)$$ from some univariate Gaussian, and then $$f(x) = 0$$ for all other $$x$$, since all the other entries of $$\Lambda$$ are 0 and the mean is 0.

This definitely doesn't look like a linear function to me — it looks like a constant function with a discontinuity at 0. Furthermore, I'm not sure how rotation $$Q$$ affects the function (surely, it doesn't correspond to rotating a plot of the function on the 2D plane).

I think I've gone wrong with the math somewhere, so the question is: is there a way to show that a rank-1 GP kernel corresponds to linear functions? And does a rank-2 kernel correspond to quadratic functions?

Judging from your question, you are considering a Bayesian linear regression model $$f(\boldsymbol{x}) = \boldsymbol{w}^{T}\boldsymbol{x}$$ with a standard Gaussian prior $$\boldsymbol{w} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{1}_{\{d \times d\}})$$. As you correctly remark, this is equivalent to the function space view, where we specify a prior directly on $$f$$ as $$f \sim \mathcal{GP}(0, K)$$ with the linear kernel $$K(\boldsymbol{x}, \boldsymbol{x}')=\boldsymbol{x}^{T}\boldsymbol{x}'$$.
Let's think about how we produce a random draw of the function evaluated on some fixed set of points that we collect into a matrix $$\boldsymbol{X}_{*} \in \mathbb{R}^{m \times d}$$.
1. $$\textit{Bayesian Linear Regression}$$: Here it is straightforward: sample one realization $$\boldsymbol{w} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{1}_{\{d \times d\}})$$ and compute the corresponding function values $$\boldsymbol{f}_{*} = \boldsymbol{X}_{*}\boldsymbol{w} \in \mathbb{R}^{m}.$$ As expected, all points lie in the hyperplane determined by $$\boldsymbol{w}$$.
2. $$\textit{Function Space View}$$: This is slightly more involved. As you correctly say, to obtain a random draw of prior function values $$\boldsymbol{f}_{*}$$ at $$\boldsymbol{X}_{*}$$, we need to sample $$\boldsymbol{f}_{*} \sim \mathcal{N}(\boldsymbol{0}, K(\boldsymbol{X}_{*}, \boldsymbol{X}_{*})) \stackrel{(d)}{=}\mathcal{N}(\boldsymbol{0}, \boldsymbol{X}_{*}\boldsymbol{X}_{*}^{T}).$$ The question is now: $$\textit{why should those samples lie in a plane?}$$ As you remarked, the covariance $$\boldsymbol{C}=\boldsymbol{X}_{*}\boldsymbol{X}_{*}^{T}$$ is degenerate: its rank is $$k=\min(d, m)$$. Let us assume $$m>d$$; otherwise the $$m$$ points trivially lie in a plane of dimension $$m-1 \le d-1$$. So we have $$k=d$$. As a consequence of the low-rank structure, we do not get the "full" $$m$$-dimensional randomness, but only the push-forward of a lower-dimensional random variable.

   Let's make this low-dimensional randomness explicit before giving the final answer. Since $$\boldsymbol{C}$$ is symmetric positive semi-definite, we can diagonalize it as $$\boldsymbol{C} = \boldsymbol{V}\boldsymbol{\Lambda}\boldsymbol{V}^{T}$$, where $$\boldsymbol{V} \in \mathbb{R}^{m \times m}$$ is orthogonal and $$\boldsymbol{\Lambda} \in \mathbb{R}^{m \times m}$$ is diagonal with non-negative entries, of which only the first $$d$$ are non-zero. Take a random variable $$\boldsymbol{\tilde{z}} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{1}_{\{d \times d\}})$$ and pad it with $$m-d$$ zeros to get $$\boldsymbol{z} \in \mathbb{R}^{m}$$. We can then write $$\boldsymbol{f}_{*}\stackrel{(d)}{=} \boldsymbol{V}\boldsymbol{\Lambda}^{\frac{1}{2}}\boldsymbol{z},$$ which follows from the formula $$\text{cov}(\boldsymbol{A}\boldsymbol{z}) = \boldsymbol{A}\,\text{cov}(\boldsymbol{z})\,\boldsymbol{A}^{T}$$ and the closedness of the Gaussian family under linear transformations. We thus see that the randomness stems only from the $$d$$-dimensional random variable $$\boldsymbol{\tilde{z}}$$.
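The function-space draw can be sketched numerically as follows (the sizes $$m, d$$, the seed, and all variable names are my own choices; note that `numpy.linalg.eigh` sorts eigenvalues in ascending order, so the non-zero ones come last rather than first):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 100, 3
X = rng.standard_normal((m, d))      # fixed evaluation points, one per row

C = X @ X.T                          # covariance of f_* under the linear kernel
eigvals, V = np.linalg.eigh(C)       # C = V diag(eigvals) V^T, ascending order
print(np.sum(eigvals > 1e-8))        # rank is d -> 3

# Pad a d-dimensional standard normal with m - d zeros,
# aligning the non-zero entries with the non-zero eigenvalues.
z = np.zeros(m)
z[-d:] = rng.standard_normal(d)
f_star = V @ (np.sqrt(np.clip(eigvals, 0.0, None)) * z)   # V Lambda^{1/2} z

# f_star lies in the column space of X: projecting onto it loses nothing.
P = X @ np.linalg.pinv(X)            # orthogonal projector onto col(X)
print(np.allclose(P @ f_star, f_star))  # -> True
```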
Now to the final answer: by the same facts as above, $$\boldsymbol{f}_{*} \stackrel{(d)}{=}\boldsymbol{X}_{*}\boldsymbol{\tilde{z}},$$ since $$\text{cov}(\boldsymbol{X}_{*}\boldsymbol{\tilde{z}}) = \boldsymbol{X}_{*}\boldsymbol{X}_{*}^{T} = \boldsymbol{C}$$. We hence see that every draw of $$\boldsymbol{f}_{*}$$ lives in the $$d$$-dimensional column space of $$\boldsymbol{X}_{*}$$, with $$\boldsymbol{\tilde{z}}$$ playing exactly the role of $$\boldsymbol{w}$$ in the weight-space view.
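To see the two views agree in distribution, one can also compare the Monte Carlo covariance of weight-space draws $$\boldsymbol{X}_{*}\boldsymbol{w}$$ against $$\boldsymbol{X}_{*}\boldsymbol{X}_{*}^{T}$$; a rough check with arbitrary sizes of my choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d, n_draws = 5, 2, 200_000
X = rng.standard_normal((m, d))

W = rng.standard_normal((n_draws, d))   # rows: independent draws of w ~ N(0, I)
F = W @ X.T                              # rows: f_* = X_* w for each draw

# Monte Carlo estimate of cov(f_*); should be close to X X^T.
emp_cov = F.T @ F / n_draws
print(np.max(np.abs(emp_cov - X @ X.T)))  # small sampling error
```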