Solved – Number of free parameters in Gaussian mixture models

aic, bic, gaussian mixture distribution, model comparison, model selection

When comparing GMMs with different numbers of components (i.e., numbers of Gaussians), one penalizes the likelihood by the total number of free parameters in the mixture model. If the data lie in $D$ dimensions, then the number of free parameters for $J$ components is given by:

$J-1$: for $J$ weights which sum to one

$D$: for each mean

$D(D+1)/2$: for each covariance matrix
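Summing the three contributions over all $J$ components gives $(J-1) + JD + JD(D+1)/2$ free parameters. A minimal sketch of that count, assuming full (unconstrained symmetric) covariance matrices as in the list above; the function name `gmm_free_parameters` and its argument names are illustrative, not taken from any library:

```python
def gmm_free_parameters(n_components: int, n_dims: int) -> int:
    """Count the free parameters of a full-covariance Gaussian mixture."""
    # J weights constrained to sum to one leave J - 1 free values.
    weights = n_components - 1
    # Each component has a mean vector with D entries.
    means = n_components * n_dims
    # Each component has a symmetric D x D covariance: D(D+1)/2 distinct entries.
    covariances = n_components * n_dims * (n_dims + 1) // 2
    return weights + means + covariances


for J in (1, 2, 3):
    print(J, gmm_free_parameters(J, n_dims=2))
# 1 5
# 2 11
# 3 17
```

This is the count that enters the penalty term of AIC or BIC when comparing mixtures with different $J$.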

My question is this: assuming that I am working in a bounded region of the $D$-dimensional space and do not need infinite precision, if I discretize the space into grid points, then instead of referring to each mean location by its coordinate vector of $D$ elements, can't I just use one parameter as an index to define the location? It would work only for the mean vector, but it would nevertheless be a reduction in complexity.

Best Answer

The grid mapping you describe is a form of indexing: with finite precision, there is a fixed number of values that the mean of each component can take, so you can reduce any mean vector to a single scalar index. You can do the same thing with the covariance matrices: if the values have finite precision, then only a finite number of different matrices are possible; these can be indexed, and the $D(D+1)/2$ parameters could then be replaced with a single index. In fact, with finite precision, there are only a finite number of possible mixture models, so in principle these could all be indexed, reducing all the parameters to a single scalar.
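To make the indexing concrete, here is a small sketch of one possible mapping between a gridded mean vector and a single integer. It assumes every axis is discretized into the same number of levels; the names `coords_to_index`, `index_to_coords`, `levels`, and `n_dims` are hypothetical and chosen only for illustration.

```python
def coords_to_index(coords, levels):
    """Pack D grid coordinates (each in 0..levels-1) into one integer."""
    index = 0
    for c in coords:
        if not 0 <= c < levels:
            raise ValueError("coordinate outside the grid")
        # Treat the coordinates as digits of a base-`levels` number.
        index = index * levels + c
    return index


def index_to_coords(index, levels, n_dims):
    """Recover the D grid coordinates from the packed integer."""
    coords = []
    for _ in range(n_dims):
        index, c = divmod(index, levels)
        coords.append(c)
    # Digits come out least-significant first, so reverse them.
    return tuple(reversed(coords))


levels, n_dims = 10, 3
mean_on_grid = (4, 0, 7)
idx = coords_to_index(mean_on_grid, levels)
print(idx)                                   # 407
print(index_to_coords(idx, levels, n_dims))  # (4, 0, 7)
```

The mapping is reversible, so the single index carries all $D$ coordinates; it simply ranges over `levels ** n_dims` values instead of `levels`.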

However, the information contained in such an index must logically be the same as in the full set of parameters, so you're not really reducing the complexity of the model. If you're comparing two models using an information criterion (AIC, BIC, MDL, etc.), then reducing the precision of all the models will not change the ranking of their relative goodness-of-fit.
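The counting behind this can be written out for the mean vector. As an illustration, assume each of the $D$ axes is discretized into $G$ levels ($G$ is a symbol introduced here, not part of the question). A gridded mean then takes one of $G^D$ values, and the number of bits needed to store one index over those values equals the number needed to store the $D$ coordinates separately:

$$
\log_2\left(G^D\right) = D \log_2 G
$$

So the index is one number, but it must be specified $D$ times as finely as a single coordinate.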

Nice idea though!
