Normal Distribution – Marginalizing High-Dimensional Multivariate Gaussian Distributions

marginal-distribution, multivariate distribution, normal distribution

I have an 11-dimensional multivariate Gaussian whose covariance matrix has non-zero values in every element. My goal is to marginalize it down to 4 dimensions, but I'm running into computational issues with the 7 integrals over the 11-D Gaussian, in both run time and accuracy: I have a lot of runs to make, and the results need to be accurate.

I've seen in various sources that to marginalize a Gaussian, you simply take the subset of the covariance matrix and mean vector corresponding to the variables you want to keep. For example, if you wanted to marginalize a 4-D Gaussian over $x_2$ and $x_4$, your new covariance matrix would be:

$\Sigma'=\pmatrix{\Sigma_{11} & \Sigma_{13}\\\Sigma_{31} & \Sigma_{33}}$

This could then be used to construct a new 2-dimensional Gaussian over $x_1$ and $x_3$.
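The subset rule described above takes only a couple of lines of NumPy. This is a minimal sketch: the helper name `marginalize_gaussian` and the example mean and covariance are made up for illustration (the covariance is symmetric and positive definite), and indices are 0-based, so $x_1$ and $x_3$ are indices 0 and 2.

```python
import numpy as np

def marginalize_gaussian(mean, cov, keep):
    """Return the mean and covariance of the Gaussian marginal over the indices in `keep`.

    For a multivariate Gaussian, marginalizing out the other variables amounts to
    selecting the matching entries of the mean vector and the matching rows and
    columns of the covariance matrix; no integration is needed.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    keep = np.asarray(keep)
    return mean[keep], cov[np.ix_(keep, keep)]

# Made-up 4-D example: keep x_1 and x_3 (0-based indices 0 and 2),
# i.e. marginalize over x_2 and x_4.
mu = np.array([0.0, 1.0, 2.0, 3.0])
Sigma = np.array([
    [2.0, 0.5, 1.0, 0.3],
    [0.5, 4.0, 0.2, 0.1],
    [1.0, 0.2, 3.0, 0.4],
    [0.3, 0.1, 0.4, 5.0],
])
mu_m, Sigma_m = marginalize_gaussian(mu, Sigma, [0, 2])
print(mu_m)
print(Sigma_m)
```

`np.ix_` builds the row/column index grid, so `Sigma_m` contains exactly the entries $\Sigma_{11}, \Sigma_{13}, \Sigma_{31}, \Sigma_{33}$ from the formula above.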

Is this true? I'm not seeing how this is possible, unless of course there are no covariances between the variables you want to keep and the variables you are marginalizing over. I ran a quick example using the following 4-D covariance matrix:

$\Sigma=\pmatrix{2 &0&3&0\\0&4&0&0\\0&0&3&0\\6&0&0&5}$

If I want to marginalize again over $x_2$ and $x_4$, my new covariance matrix is then:

$\Sigma'=\pmatrix{2&3\\0&3}$

(Note that we lose all information about the 6 in the lower-left corner!) Now I can compare this new Gaussian over $x_1$ and $x_3$ with the result of integrating the full density over $x_2$ and $x_4$, rather than just taking a subset of the covariance matrix. Here are the 90% contours:

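[Figure: 90% contours over $x_1$ and $x_3$ for the subset-covariance Gaussian and the integrated marginal]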

They definitely do not agree, contrary to what some sources on the internet claim! However, if I redo the example using a covariance matrix whose only non-zero covariances are between $x_1$ and $x_3$, the results are identical.
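One way to test the subset rule against integration, as in the comparison described above, is to integrate a small joint density numerically at a few points and compare it with the density of the subset Gaussian. The sketch below assumes SciPy is available and uses a made-up 3-D Gaussian (symmetric, positive definite, non-zero covariances everywhere), integrating out $x_2$ only so that a single `scipy.integrate.quad` call suffices; it is not the original 4-D setup.

```python
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal

# Illustrative 3-D Gaussian with non-zero covariances everywhere (values made up).
mean = np.array([0.5, -1.0, 2.0])
cov = np.array([
    [2.0, 0.8, 0.6],
    [0.8, 1.5, 0.4],
    [0.6, 0.4, 1.2],
])
full = multivariate_normal(mean=mean, cov=cov)

def marginal_pdf_by_integration(x1, x3):
    """p(x1, x3) obtained by numerically integrating the joint density over x2."""
    integrand = lambda x2: full.pdf([x1, x2, x3])
    value, _abserr = integrate.quad(integrand, -np.inf, np.inf)
    return value

# Subset rule: keep entries 0 and 2 of the mean, rows/columns 0 and 2 of the covariance.
keep = [0, 2]
subset = multivariate_normal(mean=mean[keep], cov=cov[np.ix_(keep, keep)])

for point in [(0.0, 2.0), (1.5, 1.0), (-1.0, 3.5)]:
    by_integration = marginal_pdf_by_integration(*point)
    by_subset = subset.pdf(point)
    print(point, by_integration, by_subset)
```

With a valid (symmetric, positive semi-definite) covariance matrix, the two printed densities should agree to within the integration tolerance, even though every covariance is non-zero.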

So am I right in guessing that the "covariance matrix subset" rule is only valid when the only covariances are among the variables you keep?

And if so, are there any other simple methods to marginalize high-dimensional Gaussians that won't kill my run time or precision?

Thanks so much in advance!

Best Answer

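First, your 4-D matrix is not actually a covariance matrix, because it is not symmetric: a covariance matrix must satisfy $\Sigma_{ij}=\Sigma_{ji}$. That is why it looks as if you lose the $6$ in the lower-left corner; it should also appear in the upper-right corner (and likewise the $3$ at position $(1,3)$ needs a matching $3$ at position $(3,1)$). In any case, that $6$ is the covariance between $x_1$ and $x_4$, so it correctly drops out once $x_4$ is marginalized. Replicate your experiments with this in mind. Also, keep your covariance matrix positive semi-definite.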
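The two conditions mentioned above, symmetry and positive semi-definiteness, can be checked numerically. A minimal sketch, assuming NumPy; `is_valid_covariance` and its tolerance are hypothetical choices for illustration. It also shows that simply mirroring the off-diagonal entries of the question's matrix is not enough, because the mirrored matrix has a negative eigenvalue.

```python
import numpy as np

def is_valid_covariance(S, tol=1e-10):
    """Hypothetical helper: True if S is symmetric and positive semi-definite."""
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):
        return False
    # eigvalsh assumes a symmetric matrix, which was checked above.
    return bool(np.linalg.eigvalsh(S).min() >= -tol)

# The 4-D matrix from the question: not symmetric.
Sigma_question = np.array([
    [2, 0, 3, 0],
    [0, 4, 0, 0],
    [0, 0, 3, 0],
    [6, 0, 0, 5],
], dtype=float)

# Mirroring the off-diagonal entries makes it symmetric, but it is still not PSD:
# the (x_1, x_3) block [[2, 3], [3, 3]] has determinant 2*3 - 3*3 = -3 < 0.
Sigma_mirrored = np.array([
    [2, 0, 3, 6],
    [0, 4, 0, 0],
    [3, 0, 3, 0],
    [6, 0, 0, 5],
], dtype=float)

print(is_valid_covariance(Sigma_question))   # False: not symmetric
print(is_valid_covariance(Sigma_mirrored))   # False: has a negative eigenvalue
print(np.linalg.eigvalsh(Sigma_mirrored).min())
```

Running such a check before any marginalization experiment rules out comparisons that are driven by an invalid input matrix rather than by the method.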

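By the definition of the multivariate Gaussian, every subset of its component random variables is itself multivariate Gaussian, and you can construct that subset's mean and covariance exactly as you read in your sources (or here): take the corresponding entries of the mean vector and the corresponding rows and columns of the covariance matrix.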
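As a sanity check of this claim in the setting of the question (11 dimensions down to 4), one can compare the subset mean and covariance with sample estimates from draws of the full Gaussian. This is a sketch under stated assumptions: the dense covariance is randomly generated rather than the asker's, the kept indices are arbitrary, and the sample size of 200,000 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dense 11-D covariance: A @ A.T is symmetric PSD, and adding a multiple
# of the identity makes it strictly positive definite.
d = 11
A = rng.standard_normal((d, d))
cov = A @ A.T / d + 0.5 * np.eye(d)
mean = rng.standard_normal(d)

# Indices of the 4 variables to keep (0-based; chosen arbitrarily for illustration).
keep = np.array([0, 3, 7, 10])

# Subset rule: no integration, just indexing.
mean_marg = mean[keep]
cov_marg = cov[np.ix_(keep, keep)]

# Monte Carlo check: sample the full 11-D Gaussian and keep only the chosen columns.
samples = rng.multivariate_normal(mean, cov, size=200_000)
kept = samples[:, keep]
emp_mean = kept.mean(axis=0)
emp_cov = np.cov(kept, rowvar=False)

# Both differences should be small (sampling error only).
print(np.abs(emp_mean - mean_marg).max())
print(np.abs(emp_cov - cov_marg).max())
```

The marginal itself costs only an indexing operation, so none of the 7 integrals is needed per run; the sampling step is there only to check the result.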