Positive semi-definiteness of a modified RBF kernel

cholesky decomposition, covariance, gaussian, linear algebra, positive definite

Let's say I have an imaging space of 100×100 pixels and I want to build a covariance matrix using an RBF (Gaussian) kernel. In other words, if the covariance matrix is $C\in\mathbf R^{10000\times10000}$, then its $(i,j)$th element is calculated using the RBF kernel,
$$C_{i,j} = \exp{\left(-\frac{\lvert{r_{i}-r_{j}}\rvert_2^2}{2\sigma^2}\right)}$$

where $r_i$ and $r_j$ are the vector locations of the $i$th and $j$th pixels, respectively, and $\lvert \cdot \rvert_2^2$ is the squared Euclidean ($\ell_2$) norm.
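For concreteness, here is a minimal sketch of how I build the matrix (shown on a smaller grid than 100×100 so the dense matrix stays manageable; the grid size and $\sigma$ are placeholder values):

```python
import numpy as np

# Placeholder grid size and length scale (the real problem uses a 100 x 100 grid).
n = 30
sigma = 2.0

# Pixel locations r_i as 2-D coordinates, flattened to shape (n*n, 2).
xx, yy = np.meshgrid(np.arange(n), np.arange(n))
r = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)

# Squared Euclidean distances |r_i - r_j|_2^2 between all pairs of pixels.
sq_dists = ((r[:, None, :] - r[None, :, :]) ** 2).sum(axis=-1)

# RBF (squared-exponential) covariance matrix.
C = np.exp(-sq_dists / (2.0 * sigma ** 2))
```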

After constructing the covariance matrix, I want to perform a Cholesky decomposition on it. Since the covariance matrix is positive semi-definite, the Cholesky decomposition exists, but the problem is that its time complexity is $O(n^3)$, so it doesn't scale well to a large number of pixels.
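For reference, the dense factorization step looks like this (continuing from the sketch above; the tiny diagonal jitter is my own addition, since the squared-exponential matrix is so ill-conditioned that the factorization can fail numerically without it):

```python
# Dense Cholesky factorization, O(n^3) in the matrix dimension.
# The jitter term is an assumption/workaround: the squared-exponential matrix
# is nearly singular, so an unmodified factorization may fail in floating point.
jitter = 1e-8
L = np.linalg.cholesky(C + jitter * np.eye(C.shape[0]))
```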

The trick I want to implement is to set the "weak" covariances to zero, making the covariance matrix a sparse symmetric matrix, on which Cholesky decomposition is easier to perform. In other words, the modified RBF kernel would look like
$$
C_{i,j} =
\begin{cases}
\exp{\left(-\frac{\lvert{r_{i}-r_{j}}\rvert_2^2}{2\sigma^2}\right)}, & \text{if } \lvert{r_{i}-r_{j}}\rvert_2^2 < \epsilon \\
0 & \text{otherwise}
\end{cases}
$$

where $\epsilon$ is determined by the desired sparseness of the covariance matrix.
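Continuing from the sketch above, the truncation step looks roughly like this (the cutoff $\epsilon$ below is an arbitrary illustrative choice):

```python
from scipy import sparse

# Zero out "weak" covariances: entries whose squared distance exceeds epsilon.
epsilon = (3.0 * sigma) ** 2          # illustrative cutoff only
C_trunc = np.where(sq_dists < epsilon, C, 0.0)

# Store the result as a sparse symmetric matrix.
C_sparse = sparse.csc_matrix(C_trunc)
print(f"non-zero entries: {C_sparse.nnz} of {C_trunc.size}")
```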

I've implemented this in Python, but it seems that setting the low-covariance entries to zero makes the covariance matrix no longer positive semi-definite, and I can't figure out why. I've looked at the eigenvalues of the modified covariance matrix, and indeed some of them are negative, which makes Cholesky decomposition impossible.
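The eigenvalue check is essentially this (again continuing from the sketch above):

```python
# Eigenvalues of the truncated (symmetric) matrix; any negative eigenvalue
# means the matrix is no longer positive semi-definite and Cholesky fails.
eigvals = np.linalg.eigvalsh(C_trunc)
print("smallest eigenvalues:", eigvals[:5])
print("any negative?", bool((eigvals < 0).any()))
```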

Is there any good explanation for this? Thank you.

Best Answer

First: congratulations, you checked! Many people just do this truncation without checking.

But why would you expect this modified "kernel" to be positive definite? It is well known that the eigenvalues of the squared-exponential kernel decrease rapidly towards zero as the size of the matrix increases. That makes it quite plausible that even small perturbations, such as zeroing out some small entries, can produce negative eigenvalues, especially given the cumulative effect of zeroing out many small entries. In fact, I am somewhat surprised that you did not already run into trouble with the original matrix: numerical inaccuracies alone can yield negative eigenvalues in a theoretically positive definite matrix.

Positive definiteness is subtle and easily destroyed by modification, as you found out. This is further evidenced by the fact that it is actually quite difficult to come up with radial basis functions that have compact support. But there are some; they are called "Wendland functions" or "Wendland kernels". Arguably the simplest of these is the "tent" or "hat" kernel $$ K(x,y) = \max\left( 1 - \frac 1 \gamma r(x,y),\,0\right)^l$$ which is positive definite for $l\ge \lfloor \frac d 2\rfloor +1$, where $r$ is the Euclidean distance between $x$ and $y$ in $\mathbb{R}^d$ and $\gamma$ is a scaling factor. Wendland's book (Chapter 9) or this link provide more information.
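For illustration, here is a self-contained sketch of that tent kernel on a pixel grid like yours (the grid size, $\gamma$, and $l$ are example values only):

```python
import numpy as np

def tent_kernel(dists, gamma, l):
    """Compactly supported 'tent'/'hat' kernel K = max(1 - r/gamma, 0)^l.

    The kernel is exactly zero for r >= gamma, so the covariance matrix is
    sparse by construction; it is positive definite on R^d for
    l >= floor(d/2) + 1, i.e. l >= 2 for pixel locations in 2-D.
    """
    return np.maximum(1.0 - dists / gamma, 0.0) ** l

# Example pixel grid, analogous to the question's setup (values for illustration).
n = 30
xx, yy = np.meshgrid(np.arange(n), np.arange(n))
r = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
dists = np.sqrt(((r[:, None, :] - r[None, :, :]) ** 2).sum(axis=-1))

C = tent_kernel(dists, gamma=6.0, l=2)

# The smallest eigenvalue should be nonnegative up to round-off error.
print("min eigenvalue:", np.linalg.eigvalsh(C).min())
```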