Efficient product of vector with inverse of positive definite circulant matrix plus real diagonal matrix

circulant-matrices, gaussian, inverse, matrices, positive-semidefinite

Question:

Hi! Would anyone happen to know an efficient way to compute:

$$
u^\top [C + D]^{-1},
$$

where:

  • $C$ is a square matrix, and is real, positive definite, and circulant
  • $C$ is too large to fit in computer memory, and is instead defined by a single row $c$ (or its Fourier transform $\tilde c$)
  • $D$ is a real-valued diagonal matrix (and is not proportional to the identity matrix)
  • $u$ is a vector

(If $D$ were also circulant, there would be a fast solution via the fast Fourier transform. But $D$ is not circulant in my use case.)

"Efficient" here is deliberately vague, meaning "something I could actually compute on a laptop". Something less than $\mathcal O(n^3)$ would be nice. But, more important is the memory constraint: for the problem I'm working on it's just not possible to explicitly construct a matrix with size $C$, so $\mathcal O(n^2)$ space requirement just isn't feasible. I'd settle for an algorithm that only needs $\mathcal O(n)$ or $\mathcal O(n \log(n))$ memory ($n$ here is $\approx 10,000$), and takes at most a few minutes to run.

Context:

I'm working with a spatially-extended Gaussian process on a regular square grid. I have observations $y_i(\mathbf x_i)$ at each grid point $\mathbf x_i$, and I evaluate the posterior mean $\mu$ at these same grid points. The prior kernel is translationally invariant on a periodic domain and is evaluated on a uniformly-spaced grid, so the prior covariance $C$ is a real, positive-definite circulant matrix. The measurement noise model consists of independent observations at each grid point $\mathbf x_i$ with noise variance $\sigma^2_i$ (i.e. the measurement noise is inhomogeneous), so the observation covariance $D$ is a real-valued diagonal matrix. In this case, the posterior covariance $\Sigma$ is given by

$$
\begin{aligned}
\Sigma &= C - C [C+D]^{-1} C \\
&= [C^{-1} + D^{-1}]^{-1}.
\end{aligned}
$$
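
As a quick numerical sanity check of the two forms above, here is a dense toy-sized sketch (the construction of $C$ here is hypothetical; the real problem never forms these matrices explicitly):

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
n = 8                                          # toy size only
spec = rng.uniform(1.0, 2.0, n)                # positive spectrum -> PD
spec = 0.5 * (spec + np.roll(spec[::-1], 1))   # enforce spec[k] == spec[n-k]
c = np.fft.ifft(spec).real                     # real symmetric kernel row
C = circulant(c)                               # real, positive definite, circulant
D = np.diag(rng.uniform(0.5, 1.5, n))          # inhomogeneous noise variances

S1 = C - C @ np.linalg.solve(C + D, C)
S2 = np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(D))
assert np.allclose(S1, S2)                     # the two forms agree
```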

I want the marginal variance (or covariance) of the posterior along the vectors $a_1$ and $a_2$. This is

$$
\begin{aligned}
\sigma^2_{a_1,a_2} = a_1^\top \Sigma a_2
\end{aligned}
$$
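
Expanding the first form of $\Sigma$ above makes the bottleneck explicit (a small derivation step, using the symmetry of $C$):

$$
\sigma^2_{a_1,a_2} = a_1^\top C\, a_2 - (C a_1)^\top [C+D]^{-1} (C a_2),
$$

so everything reduces to circulant matrix-vector products plus a single product with $[C+D]^{-1}$.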

The grid is large, so $C$ (and therefore $C+D$) can't fit in memory. Thankfully, matrix-vector products can be quickly computed using the FFT if the matrix is circulant, so terms like $u^\top C$ are fast. If there were also a fast way to compute $u^\top [C + D]^{-1}$, then $\sigma^2_{a_1,a_2}$ could be computed quickly.
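
As a concrete illustration, here is a minimal sketch of the FFT-based product (the names `circulant_matvec` and `c_fft` are mine, not from any particular library; it assumes $C$ is defined by its first row/column $c$, which coincide since $C$ is symmetric):

```python
import numpy as np

def circulant_matvec(c_fft, x):
    """C @ x for circulant C, in O(n log n) time and O(n) memory.
    c_fft = np.fft.fft(c) is precomputed once from the defining row c."""
    return np.fft.ifft(c_fft * np.fft.fft(x)).real
```

Since $C$ is symmetric, $u^\top C$ is just `circulant_matvec(c_fft, u)` read as a row vector.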

Conclusion:

I suspect that there might be no easy/fast way to compute this via ordinary matrix operations and the FFT, but perhaps there is some clever iterative algorithm that converges toward the right answer with acceptable time and space complexity.

Alternatively, since the underlying problem is related to a spatial Gaussian process, if the spatial correlations decay to zero at a certain distance, I could approximate the solution by working on smaller, localized domains. But something more formal (and maybe even exact) would be welcome.

Best Answer

Calculating $[C + D]^{-1} u$ can be re-cast as solving for $x$ in the linear system $[C + D]\, x = u$. Although the matrix $C + D$ is too large to construct explicitly, one can define a quadratic error function whose minimizer is the solution $x$:

$$ \epsilon = \tfrac 1 2 \| [C+D] x - u \|^2 $$

We can find $x$ by minimizing this error via gradient descent. The gradient in $x$ is:

$$ \nabla_x \epsilon = [C+D]^\top \left( [C+D] x - u \right) $$

This can be computed using the FFT with relatively limited memory requirements. If $\tilde x = \mathcal F x$ is the Fourier transform of $x$, and $\tilde c$ is the Fourier transform of the kernel defining $C$, then:

$$ [C+D] x = Cx + Dx = \mathcal F^{-1} [\tilde c \circ \tilde x] + \operatorname{diag}[D]\circ x, $$

where $\circ$ denotes element-wise multiplication and $\operatorname{diag}[D]$ is the vector of diagonal entries of $D$. Define the residual vector $\nu = [C+D] x - u$. The gradient is then $\nabla_x \epsilon = [C+D]^\top \nu = [C+D] \nu$, where the transpose can be dropped because $C$ (real, positive definite) and $D$ (diagonal) are both symmetric. This matrix-vector product can also be computed via the FFT as above.
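
A minimal sketch of this iteration, where the step size, tolerance, and iteration cap are illustrative choices rather than tuned values (`c_fft` holds $\tilde c$ and `d` holds the diagonal of $D$):

```python
import numpy as np

def apply_CpD(c_fft, d, x):
    """(C + D) @ x: circulant part via FFT, diagonal part elementwise."""
    return np.fft.ifft(c_fft * np.fft.fft(x)).real + d * x

def solve_CpD(c_fft, d, u, tol=1e-8, max_iter=50_000):
    """Gradient descent on 0.5 * ||(C+D)x - u||^2.
    Each step costs O(n log n) time and O(n) memory."""
    # C + D is symmetric PD: its largest eigenvalue is at most
    # max(c_fft) + max(d), so 1/(that)^2 is a safe step size.
    lr = 1.0 / (np.abs(c_fft).max() + d.max()) ** 2
    x = np.zeros_like(u, dtype=float)
    for _ in range(max_iter):
        nu = apply_CpD(c_fft, d, x) - u      # residual  nu = (C+D)x - u
        grad = apply_CpD(c_fft, d, nu)       # gradient  (C+D)^T nu = (C+D) nu
        if np.linalg.norm(grad) < tol:
            break
        x -= lr * grad
    return x
```

Since $C+D$ is symmetric positive definite, the same FFT-based product could instead be wrapped in a `scipy.sparse.linalg.LinearOperator` and handed to `scipy.sparse.linalg.cg`; conjugate gradients typically converges in far fewer iterations, while the sketch above stays closest to the derivation here.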

This might not be especially fast, but in cases where the system matrix (here $C+D$) can be decomposed into a sum of special matrices with compact representations (e.g. Toeplitz, circulant, diagonal, sparse, etc.), it provides a space-efficient alternative that does not require explicitly writing down a large linear system.
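
Tying this back to the question, the marginal (co)variance could then be assembled entirely from these pieces (reusing the hypothetical `circulant_matvec` and `solve_CpD` from the sketches above):

```python
Ca1 = circulant_matvec(c_fft, a1)                    # C a1
Ca2 = circulant_matvec(c_fft, a2)                    # C a2
sigma2 = a1 @ Ca2 - Ca1 @ solve_CpD(c_fft, d, Ca2)   # a1^T Sigma a2
```

This uses only FFTs, element-wise operations, and vectors, so the whole computation stays within $\mathcal O(n \log n)$ time per iteration and $\mathcal O(n)$ memory.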
