I have a Gaussian mixture model (GMM) whose probability density function is a weighted sum of Gaussian densities:
\begin{equation}
p(X)=\sum_{i=1}^k \omega_i\mathcal{N}(X,\mu_i,\Sigma_i)=\sum_{i=1}^k \omega_ip_i(X),
\end{equation}
where $k$ is the number of components, $\mathcal{N}(X,\mu_i,\Sigma_i)$, $i=1,\dots,k$, are Gaussian densities
with mean vectors $\mu_i$ and covariance matrices $\Sigma_i$,
and $\omega_i$, $i=1,\dots,k$, are weights satisfying $\sum_{i=1}^k \omega_i=1$.
The covariance matrices $\Sigma_i$ are full, i.e., they have non-zero off-diagonal (correlation) elements.
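To make the setup concrete, here is a minimal sketch (with made-up weights, means, and covariances) of evaluating such a density $p(X)=\sum_i \omega_i\,\mathcal{N}(X,\mu_i,\Sigma_i)$ with full covariance matrices:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component GMM in 2-D with full covariance matrices.
weights = np.array([0.6, 0.4])                        # omega_i, sum to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # mu_i
covs = [np.array([[2.0, 1.2], [1.2, 1.0]]),           # Sigma_i with non-zero
        np.array([[1.0, -0.5], [-0.5, 1.5]])]         # off-diagonal elements

def gmm_pdf(x, weights, means, covs):
    """Weighted sum of Gaussian densities: p(x) = sum_i w_i N(x; mu_i, Sigma_i)."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

x = np.array([1.0, 1.0])
print(gmm_pdf(x, weights, means, covs))
```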
How can I approximate this GMM by a GMM whose components have diagonal covariance matrices? I understand that the weighted sum will then need more components, but each of them will be diagonal.
Here, on page 2, it is written (but without proof) that this is possible:
"It is also important to note that because the component Gaussian are acting together to model the overall feature density, full covariance matrices are not necessary even if the features are not statistically independent. The linear combination of diagonal covariance basis Gaussians is capable of modeling the correlations between feature vector elements. The effect of using a set of M full covariance matrix Gaussians can be equally obtained by using a larger set of diagonal covariance Gaussians."
But how can this be done, and what can be said about the computational cost in these two cases? Is it faster to use more components if they are diagonal?
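A rough way to compare the costs: evaluating one full-covariance Gaussian in dimension $d$ requires roughly $d^2$ multiply-adds for the quadratic form in the exponent (after a one-time factorization of each $\Sigma_i$), while a diagonal component needs only about $d$. A sketch of this back-of-the-envelope count (the dimensions and component counts below are illustrative assumptions, not from the question):

```python
# Rough per-evaluation cost (multiply-adds) of the quadratic form in the
# exponent; a one-time O(d^3) factorization per full covariance is ignored.
def full_cost(M, d):
    # Each full component needs a triangular solve: ~d^2 operations.
    return M * d * d

def diag_cost(M, d):
    # Each diagonal component needs only d scaled squares.
    return M * d

d = 39          # e.g. a typical speech-feature dimension (assumed here)
M_full = 16
M_diag = 64     # more diagonal components, as the quoted passage suggests
print(full_cost(M_full, d), diag_cost(M_diag, d))
```

Under these assumptions, even four times as many diagonal components cost roughly a factor $d/4$ less per density evaluation than the full-covariance mixture.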
Thank you.
Best Answer
I don't know if this helps you, but the same claim has been made in
http://asp.eurasipjournals.springeropen.com/article/10.1155/S1110865704310024 (p. 435);
the exact passage is:
with the explanation being:
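As a numerical sanity check of the claim, one can fit diagonal-covariance mixtures to samples drawn from a single correlated Gaussian and compare likelihoods; a sketch (component counts and the covariance are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Samples from a single correlated (full-covariance) Gaussian.
cov = np.array([[2.0, 1.5], [1.5, 2.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# One diagonal Gaussian cannot represent the correlation...
one_diag = GaussianMixture(n_components=1, covariance_type='diag',
                           random_state=0).fit(X)
# ...but several diagonal Gaussians laid along the principal axis can.
many_diag = GaussianMixture(n_components=8, covariance_type='diag',
                            random_state=0).fit(X)

# Average log-likelihood per sample; the 8-component diagonal mixture
# should recover most of what the full covariance captures.
print(one_diag.score(X), many_diag.score(X))
```

The mixture with more diagonal components achieves a noticeably higher log-likelihood on the correlated data, which is the effect the quoted passage describes.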