Solved – Gaussian Process: Using partitions of a Cholesky decomposition solution for conditional deduction

cholesky decomposition, covariance, gaussian process, regression

If I define a GP over observed values, $y$, of a sensor reading over time, $t$ (for simplicity assuming a discrete time series, e.g. readings every 5 minutes), as:

$y=f(t)+\epsilon$
where $t=[1 \dots N]$ is time and $\epsilon \sim \mathcal{N}(0,\sigma_{y}^2)$ is zero-mean Gaussian noise.

Then a GP model for $y$ is:
$y(t)\sim\mathcal{GP}\Big(0,\,K(t,t)+\sigma_{y}^2I\Big).$

If $L$ is the Cholesky decomposition of the above covariance matrix $K = K(t,t)+\sigma_{y}^2I$ (size $N \times N$), and $S$ is an $N \times 1$ vector computed as:

$S=L^{T}\backslash(L\backslash y)$

(a computationally efficient way of computing $S=K^{-1}y$ using the Cholesky factor $L$).
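For concreteness, here is a minimal NumPy/SciPy sketch of this step (the backslash above is MATLAB-style notation for a triangular solve; the squared-exponential kernel, its length scale, the noise level, and the random $y$ below are hypothetical placeholders, not part of the original question):

```python
# Minimal sketch of S = L^T \ (L \ y) in NumPy/SciPy.  The kernel, length
# scale, sigma_y, and the random y below are hypothetical placeholders
# chosen only to make the example self-contained.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
N = 50
t = np.arange(1, N + 1, dtype=float)

Kff = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 10.0 ** 2)  # kernel K(t, t)
sigma_y = 0.1
K = Kff + sigma_y ** 2 * np.eye(N)        # the covariance that gets factorised
y = rng.normal(size=N)                    # stand-in for the sensor readings

L = cholesky(K, lower=True)               # K = L @ L.T
S = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)

# Same result as the direct (but slower, less stable) solve K^{-1} y:
assert np.allclose(S, np.linalg.solve(K, y))
```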

My question concerns the case where $S$ and $L$ are already computed and I want to find $S_h$ and $L_h$ for only the first $h$ inputs, i.e. $t=[1 \dots h]$, where $h < N$.

I know that given $L$, $L_h$ can be computed by partitioning $L$ as:

$L_h=\begin{pmatrix}
L_{1,1} & \cdots & L_{1,h}\\
\vdots & \ddots & \vdots\\
L_{h,1} & \cdots & L_{h,h}
\end{pmatrix},$ i.e. the $h \times h$ top-left square partition of $L$.
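This partition property is easy to check numerically; the snippet below continues the hypothetical sketch above (it reuses `K`, `L`, `cholesky`, and the imports from that snippet):

```python
# The top-left h x h block of L is itself the Cholesky factor of the
# top-left h x h block of K (continuing the sketch above).
h = 20
L_h = L[:h, :h]
assert np.allclose(L_h @ L_h.T, K[:h, :h])
assert np.allclose(L_h, cholesky(K[:h, :h], lower=True))
```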

However, would the following be true? (My math knowledge is a little rusty.)

$S_h=L_h^T \backslash (L_h \backslash y_h),$
implying that
$S_h=S(1 \dots h)$, i.e. a sub-vector of $S$?

The reason I ask is that I want to use these to deduce the following without any additional computation:

$p(f_h|y_h;\theta)$

where

$f_h = f[1 \dots h]$ and $y_h = y[1 \dots h]$,

and the model is trained on the full $y[1 \dots N]$.
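For reference, assuming the standard GP regression identities (this formula is not from the original post), and writing $\tilde K_h$ for the top-left $h \times h$ block of the noise-free kernel $K(t,t)$ and $K_h$ for the corresponding block of the noisy covariance $K$, the posterior over the latent values at the first $h$ inputs is

$$p(f_h \mid y_h;\theta) = \mathcal{N}\!\big(\tilde K_h K_h^{-1} y_h,\; \tilde K_h - \tilde K_h K_h^{-1} \tilde K_h\big),$$

so the posterior mean is $\tilde K_h S_h$ with $S_h = K_h^{-1} y_h$, which is exactly why a cheap way of obtaining $S_h$ from $S$ would be useful.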

This is my first post, so I might not have described the problem clearly enough; please bear with me. I would really appreciate your help and will try to clarify anything that is unclear.

Best Answer

So $S_h = L_h^T\backslash(L_h\backslash y_h)$ is true. However, from the question it looks like you are actually asking whether $S_h = \big(L^T\backslash(L\backslash y)\big)_h$, and that is not true.

Super simple example:

$K = \left( \begin{matrix} 1 & 0.5 \\ 0.5 & 1 \end{matrix}\right)$, $y = \left(\begin{matrix} 3 \\ 4 \end{matrix}\right)$

$L =\left( \begin{matrix} 1 & 0 \\ 0.5 & 0.866 \end{matrix}\right)$

$L^T \backslash (L \backslash y) = \left( \begin{matrix} 1.33 \\ 3.33 \end{matrix}\right)$

So $\big(L^T \backslash (L \backslash y)\big)_h = 1.33$ for $h=1$.

However, with $K_h = 1$ and $y_h = 3$ we get $L_h = 1$ and $L_h^T \backslash (L_h \backslash y_h) = 3$.

In short, you can reuse $L_h$ as the top-left partition of $L$, but you have to perform the triangular solves with the Cholesky factor again.
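A few lines of NumPy/SciPy (not from the original answer) reproduce the counterexample:

```python
# Numerical check of the 2x2 counterexample above.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

K = np.array([[1.0, 0.5],
              [0.5, 1.0]])
y = np.array([3.0, 4.0])

L = cholesky(K, lower=True)   # [[1, 0], [0.5, 0.866]]
S = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
print(S)                      # approx [1.333, 3.333]

h = 1
L_h = L[:h, :h]
S_h = solve_triangular(L_h.T, solve_triangular(L_h, y[:h], lower=True), lower=False)
print(S_h)                    # [3.0] != S[:h], so the sub-vector shortcut fails
```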
