Gaussian process smoothing with eigen-decomposition

eigenvalues-eigenvectors, gaussian, matrix-decomposition

From the book 'Gaussian Processes for Machine Learning', the authors discuss smoothing in terms of the eigen-decomposition. The prediction $f$ at the training points can be written as
$$f=K(K+\sigma_n^2I)^{-1}y$$
where $K$ is the covariance matrix and $\sigma_n^2$ is the noise variance. Note that these are the predictions at the training points only.

The authors then take the eigen-decomposition of $K$, $K=\sum^n_{i=1}\lambda_iu_iu_i^T$, where $n$ is the number of training points, and expand the response vector in the eigenvectors as $y=\sum^n_{i=1}\gamma_iu_i$. They then give the formula
$$f=\sum^n_{i=1}\frac{\gamma_i\lambda_i}{\lambda_i+\sigma_n^2}$$

I don't see how the outer products $u_iu_i^T$ cancel out. I tested the result numerically and it is indeed correct. Does anyone know the details?
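For reference, this is roughly the check I ran (a toy numpy sketch of my own, not from the book; the squared-exponential kernel and the data are arbitrary):

```python
import numpy as np

# Toy data: an arbitrary positive-definite kernel matrix and responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)           # squared-exponential kernel on the training inputs
y = rng.normal(size=20)
sigma2 = 0.1                                # noise variance sigma_n^2

# Direct formula: f = K (K + sigma_n^2 I)^{-1} y
f_direct = K @ np.linalg.solve(K + sigma2 * np.eye(20), y)

# Eigen-decomposition version: gamma_i = u_i^T y, f = sum_i gamma_i lambda_i / (lambda_i + sigma_n^2) u_i
lam, U = np.linalg.eigh(K)                  # columns of U are the eigenvectors u_i
gamma = U.T @ y
f_eig = U @ (gamma * lam / (lam + sigma2))

print(np.allclose(f_direct, f_eig))         # True
```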

Best Answer

The version that I found has a different formula for $f$. Equation (2.36) in Section 2.6 (Smoothing, Weight Functions and Equivalent Kernels) states the following:

\begin{equation} f = \sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{\lambda_i + \sigma_n^2}u_i. \end{equation} This makes sense, as $K(K + \sigma_n^2I)^{-1}$ is a matrix and $y$ is a vector, so $f$ should be a vector. I will show how to get this formula.


Since $K$ is a real covariance matrix, it is symmetric and positive-definite, and symmetry guarantees (by the spectral theorem) that it is orthogonally diagonalizable. This means that the $u_i$ in the expression \begin{equation} K = \sum_{i=1}^{n}\lambda_i u_i u_i^{\mathsf{T}} \end{equation} form an orthonormal basis of $\mathbb{R}^n$.
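If you want to see this concretely, here is a small numpy illustration (my own toy example, not from the book):

```python
import numpy as np

# Any real symmetric K is orthogonally diagonalizable (spectral theorem).
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
K = A @ A.T + 5 * np.eye(5)                 # symmetric positive definite

lam, U = np.linalg.eigh(K)                  # eigenvalues lam, eigenvectors as columns of U
print(np.allclose(U.T @ U, np.eye(5)))      # the u_i are orthonormal: True
print(np.allclose(K, (U * lam) @ U.T))      # K = sum_i lam_i u_i u_i^T: True
```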

\begin{eqnarray} f &=& K(K + \sigma_n^2I)^{-1}y\\ &=& \left(\sum_{i=1}^{n}\lambda_i u_i u_i^{\mathsf{T}}\right)\left(\sum_{j=1}^{n}\lambda_j u_j u_j^{\mathsf{T}} + \sigma_n^2 I\right)^{-1}\left(\sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}\right)\\ &=& \left(\sum_{i=1}^{n}\lambda_i u_i u_i^{\mathsf{T}}\right)\sigma_n^{-2}\left( \sigma_n^{-2}\sum_{j=1}^{n}\lambda_j u_j u_j^{\mathsf{T}} +I\right)^{-1}\left(\sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}\right) \end{eqnarray}

Recall the Maclaurin (geometric) series for $\frac{1}{1-x}$: \begin{equation} \frac{1}{1 - x} = \sum_{m=0}^{\infty}x^m~~\textrm{for}~~|x| < 1. \end{equation} Then \begin{equation} \frac{1}{1 + x} = \sum_{m=0}^{\infty}(-x)^m~~\textrm{for}~~|x| < 1. \end{equation} The analogous formula for square matrices (a Neumann series) is \begin{equation} (I + A)^{-1} = \sum_{m=0}^{\infty}(-A)^m~~\textrm{for}~~\lVert A\rVert < 1, \end{equation} or if $A^N = O$, the zero matrix, for some $N$. This latter condition cannot hold for a positive-definite matrix.
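As a quick sanity check on the matrix version (a numpy sketch; the matrix is made up and scaled so its spectral norm is below 1):

```python
import numpy as np

# Neumann series check: (I + A)^{-1} = sum_m (-A)^m when ||A|| < 1.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
A *= 0.4 / np.linalg.norm(A, 2)             # scale so the spectral norm is 0.4

partial = np.zeros((4, 4))
term = np.eye(4)                            # (-A)^0
for m in range(200):                        # accumulate the first 200 terms
    partial += term
    term = term @ (-A)

print(np.allclose(partial, np.linalg.inv(np.eye(4) + A)))  # True
```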

Let's assume that this will work for $(\sigma_n^{-2}K + I)^{-1}$: \begin{equation} \left(I + \sigma_n^{-2}K\right)^{-1} = \sum_{m=0}^{\infty}(-\sigma_n^{-2})^mK^m. \end{equation}

A valuable thing to notice about orthogonally diagonalizable matrices such as $K$ is that \begin{equation} K^m = \left(\sum_{i=1}^{n}\lambda_i u_iu_i^{\mathsf{T}}\right)^m = \sum_{i=1}^{n}\lambda_i^m u_iu_i^{\mathsf{T}}. \end{equation} I'll let you verify that. Use the fact that the $u_i$ form an orthonormal basis.
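If you prefer a numerical spot-check of that identity, here is one (again with a made-up symmetric positive-definite $K$):

```python
import numpy as np

# Check K^m = sum_i lam_i^m u_i u_i^T for an orthogonally diagonalizable K.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
K = A @ A.T + np.eye(5)                     # symmetric positive definite
lam, U = np.linalg.eigh(K)

m = 4
K_power = np.linalg.matrix_power(K, m)
spectral = (U * lam**m) @ U.T               # sum_i lam_i^m u_i u_i^T
print(np.allclose(K_power, spectral))       # True
```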

Now we can write $f$ as \begin{eqnarray} f &=& \sigma_n^{-2} \color{red}{K}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\color{red}{K^m}\right]y\\ &=& \sigma_n^{-2}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\color{red}{K^{m+1}}\right]y\\ &=& \sigma_n^{-2}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\color{red}{\sum_{i=1}^{n}\lambda_i^{m+1}u_i u_i^{\mathsf{T}}}\right]\sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}\\ &=& \sigma_n^{-2}\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\left[\color{red}{\sum_{i=1}^{n}\lambda_i^{m+1}u_i}\sum_{\ell=1}^{n}\gamma_{\ell}\color{red}{u_i^{\mathsf{T}}}u_{\ell}\right] \end{eqnarray}

Since the $u_i$ are orthonormal, \begin{equation} u_i^{\mathsf{T}}u_{\ell} = \delta_{i,\ell}, \end{equation} the Kronecker delta function. When we sum over $\ell$, the only term that "survives" will be the term for which $\ell = i$.

\begin{eqnarray} f &=& \sigma_n^{-2}\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\left[\sum_{i=1}^{n}\lambda_i^{m+1}u_i\sum_{\ell=1}^{n}\gamma_{\ell}\delta_{i,\ell}\right]\\ &=& \sigma_n^{-2}\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\left[\sum_{i=1}^{n}\gamma_i\lambda_i^{m+1}u_i\right]\\ &=& \sigma_n^{-2}\sum_{i=1}^{n}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\lambda_i^{m}\right]\lambda_i\gamma_i u_i\\ &=& \sigma_n^{-2}\sum_{i=1}^{n}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2}\lambda_i)^m\right]\lambda_i\gamma_i u_i\\ &=& \sigma_n^{-2}\sum_{i=1}^{n}\left[(1 + \sigma_n^{-2}\lambda_i)^{-1}\right]\lambda_i\gamma_i u_i\\ &=& \sigma_n^{-2}\sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{1 + \sigma_n^{-2}\lambda_i}u_i\\ &=& \sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{\sigma_n^{2} + \lambda_i}u_i \end{eqnarray}


Upon questioning from @yiming-che, I realized there is an easier approach, which again relies on $K$ being orthogonally diagonalizable. First, note that since the $u_i$ form an orthonormal basis for $\mathbb{R}^n$, we have $I = \sum_{i=1}^{n}u_iu_i^{\mathsf{T}}$. This gives us a simpler expression for $I + \sigma_n^{-2}K$.

\begin{eqnarray} \left(I + \sigma_n^{-2}K\right)^{-1} &=& \left(\sum_{i=1}^{n}u_iu_i^{\mathsf{T}} + \sum_{i=1}^{n}\sigma_n^{-2}\lambda_i u_iu_i^{\mathsf{T}}\right)^{-1}\\ &=& \left(\sum_{i=1}^{n}(1 + \sigma_n^{-2}\lambda_i)u_iu_i^{\mathsf{T}}\right)^{-1} \end{eqnarray}

I claim that the last line can be re-written as \begin{equation} \left(\sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)u_iu_i^{\mathsf{T}}\right)^{-1} = \sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_iu_i^{\mathsf{T}}. \end{equation} Let's prove it by multiplying the original matrix and the suspected inverse.

\begin{eqnarray} &&\left[\sum_{j=1}^{n}\left(1 + \sigma_n^{-2}\lambda_j\right)u_ju_j^{\mathsf{T}}\right]\left[\sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_iu_i^{\mathsf{T}}\right]\\ &=& \sum_{i=1}^{n}\sum_{j=1}^{n}\left(1 + \sigma_n^{-2}\lambda_j\right)\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_j~\underbrace{u_j^{\mathsf{T}}u_i}_{\delta_{i,j}}~u_i^{\mathsf{T}}\\ &=& \sum_{i=1}^{n}\sum_{j=1}^{n}\left(1 + \sigma_n^{-2}\lambda_j\right)\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_j\delta_{i,j}u_i^{\mathsf{T}}\\ &=& \sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_iu_i^{\mathsf{T}}\\ &=& \sum_{i=1}^{n}u_iu_i^{\mathsf{T}}\\ &=& I. \end{eqnarray} This works as long as $1 + \sigma_n^{-2}\lambda_i \neq 0$ for each $i$. Plugging this spectral form of the inverse into $f = \sigma_n^{-2}K\left(I + \sigma_n^{-2}K\right)^{-1}y$ and using $u_i^{\mathsf{T}}u_{\ell} = \delta_{i,\ell}$ once more recovers \begin{equation} f = \sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{\lambda_i + \sigma_n^2}u_i \end{equation} without invoking any infinite series.
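And a numerical check of the inverse identity (a toy $K$ and an arbitrary noise level, my own sketch):

```python
import numpy as np

# Check (sum_i (1 + lam_i/sigma^2) u_i u_i^T)^{-1} = sum_i (1 + lam_i/sigma^2)^{-1} u_i u_i^T.
rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
K = A @ A.T + np.eye(6)                     # symmetric positive definite
lam, U = np.linalg.eigh(K)
sigma2 = 0.5                                # noise variance sigma_n^2

coeff = 1.0 + lam / sigma2                  # 1 + sigma_n^{-2} lam_i
M = (U * coeff) @ U.T                       # sum_i (1 + sigma_n^{-2} lam_i) u_i u_i^T
M_inv_spectral = (U / coeff) @ U.T          # sum_i (1 + sigma_n^{-2} lam_i)^{-1} u_i u_i^T

print(np.allclose(M, np.eye(6) + K / sigma2))          # M is indeed I + sigma_n^{-2} K: True
print(np.allclose(M_inv_spectral, np.linalg.inv(M)))   # the claimed inverse matches: True
```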
