GMM with full and diagonal covariances

matrices, probability distributions, statistics, stochastic-analysis, stochastic-calculus

I have a Gaussian mixture model (GMM), that is, a distribution whose probability density function is a weighted sum of Gaussian probability density functions:
\begin{equation}
p(X)=\sum_{i=1}^k \omega_i\mathcal{N}(X,\mu_i,\Sigma_i)=\sum_{i=1}^k \omega_ip_i(X),
\end{equation}

where $k$ is the number of components, $\mathcal{N}(X,\mu_i,\Sigma_i)$, $i=1,\dots,k$, are Gaussian densities with mean vectors $\mu_i$ and covariance matrices $\Sigma_i$, and $\omega_i \ge 0$, $i=1,\dots,k$, are weights with $\sum_{i=1}^k \omega_i=1$.

The covariance matrices $\Sigma_i$, $i=1,\dots,k$, are full, i.e. they have non-zero off-diagonal (correlation) elements.
How can I approximate this GMM by a GMM whose components all have diagonal covariance matrices? I understand that the weighted sum will need more components, but each of them will be diagonal.
Page 2 of the following paper states (without proof) that this is possible:

https://www.ll.mit.edu/mission/cybersec/publications/publication-files/full_papers/0802_Reynolds_Biometrics-GMM.pdf

"It is also important to note that because the component Gaussian are
acting together to model the overall feature density, full covariance
matrices are not necessary even if the features are not statistically
independent. The linear combination of diagonal covariance basis Gaussians
is capable of modeling the correlations between feature vector elements.
The effect of using a set of M full covariance matrix Gaussians can be
equally obtained by using a larger set of diagonal covariance Gaussians. "

But how can this be done, and how do the computational costs of the two cases compare? Is it faster to compute with more components if they are all diagonal?
Thank you.

Best Answer

I don't know if this helps, but the same claim is made in

http://download.springer.com/static/pdf/237/art%253A10.1155%252FS1110865704310024.pdf?originUrl=http%3A%2F%2Fasp.eurasipjournals.springeropen.com%2Farticle%2F10.1155%2FS1110865704310024&token2=exp=1480612221~acl=%2Fstatic%2Fpdf%2F237%2Fart%25253A10.1155%25252FS1110865704310024.pdf*~hmac=37cc80cf0cee60b0efd6e74cc177540e8b4d1bc30c6e29a5771edc5a3e092ff9 (p. 435)

The exact passage is:

While the general model form supports full covariance matrices, that is, a covariance matrix with all its elements, typically only diagonal covariance matrices are used. This is done for three reasons. First, the density modeling of an Mth-order full covariance GMM can equally well be achieved using a larger-order diagonal covariance GMM.

with the explanation being:

GMMs with M > 1 using diagonal covariance matrices can model distributions of feature vectors with correlated elements. Only in the degenerate case of M = 1 is the use of a diagonal covariance matrix incorrect for feature vectors with correlated elements.
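
The correlation part of this statement can be checked with the standard formula for the covariance of a mixture, $\Sigma=\sum_{i}\omega_i\left(\Sigma_i+(\mu_i-\mu)(\mu_i-\mu)^T\right)$ with $\mu=\sum_i\omega_i\mu_i$. Even when every $\Sigma_i$ is diagonal, the term built from the component means is in general a full matrix. Below is a minimal sketch in Python with NumPy. The function name `mixture_moments` and the two-component example values are made up for illustration and do not come from either paper. It only shows that a diagonal-covariance GMM can have correlated coordinates; it is not a procedure for converting a given full-covariance GMM into a diagonal one.

```python
import numpy as np

def mixture_moments(weights, means, diag_vars):
    """Mean and covariance of a GMM whose components have diagonal covariances.

    weights:   shape (k,), non-negative, summing to 1
    means:     shape (k, d), component means
    diag_vars: shape (k, d), per-coordinate variances of each component
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    diag_vars = np.asarray(diag_vars, dtype=float)

    mean = weights @ means                # overall mean, shape (d,)
    centered = means - mean               # component means relative to overall mean

    # Within-component part: weighted sum of diagonal matrices, still diagonal.
    within = np.diag(weights @ diag_vars)
    # Between-component part: weighted outer products of the centered means,
    # which is where off-diagonal entries come from.
    between = (centered * weights[:, None]).T @ centered
    return mean, within + between

# Hypothetical example: two diagonal components placed along the line x1 = x2.
weights = [0.5, 0.5]
means = [[-1.0, -1.0], [1.0, 1.0]]
diag_vars = [[0.25, 0.25], [0.25, 0.25]]

mean, cov = mixture_moments(weights, means, diag_vars)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(cov)
print(corr)  # about 0.8
```

In this example the overall covariance is $\begin{pmatrix}1.25 & 1\\ 1 & 1.25\end{pmatrix}$, so the two coordinates have correlation of about 0.8 although each component is diagonal. With a single component ($M=1$) the between-component term vanishes and the covariance stays diagonal, which matches the degenerate case mentioned in the quote.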