MATLAB: Why do I receive an error while trying to generate the Gaussian mixture parameter estimates from a dataset using the GMDISTRIBUTION.FIT function in Statistics Toolbox 7.1 (R2009a)?

Statistics and Machine Learning Toolbox

I am trying to generate the Gaussian mixture parameter estimates from a dataset using the GMDISTRIBUTION.FIT function in Statistics Toolbox 7.1 (R2009a) by executing the following line of code.
obj = gmdistribution.fit(testdata,2);
Here, 'testdata' is the dataset that I load from a MAT file.
However, after executing the above line of code, I receive the following error message.
??? Error using ==> gmcluster at 181
Ill-conditioned covariance created at iteration 53.
Error in ==> gmdistribution.fit at 199
[S,NlogL,optimInfo] =...
Error in ==> test_stats at 2
obj = gmdistribution.fit(testdata,2);
My intention is to fit a 2-component Gaussian mixture model to this dataset.

Best Answer

This error message is expected: in Statistics Toolbox 7.1 (R2009a), the GMDISTRIBUTION.FIT function is not able to fit a 2-component Gaussian mixture model to this dataset, because the covariance matrix of a component became ill-conditioned during the iterations.
As a general guideline for Gaussian mixture model (GMM) fitting, note that the GMDISTRIBUTION.FIT function may converge to a solution in which one or more of the components have an ill-conditioned or singular covariance matrix.
The following issues may result in an ill-conditioned covariance matrix:
1. The number of dimensions of your data is relatively high and there are not enough observations.
2. Some of the features (variables) of your data are highly correlated.
3. Some or all of the features are discrete.
4. You tried to fit a model with too many components.
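
A rough way to check a dataset for the first three issues is sketched below. The matrix X is a hypothetical stand-in for 'testdata' (its third column is deliberately made almost identical to its first), and the variable names are illustrative only. This is a diagnostic sketch, not a definitive test.

```matlab
% Hypothetical stand-in for 'testdata': 200 observations, 3 variables.
% Column 3 is nearly a copy of column 1, so two features are highly correlated.
X = randn(200,3);
X(:,3) = X(:,1) + 1e-6*randn(200,1);

% Issue 1: compare the number of observations with the number of dimensions.
[n,d] = size(X);

% Issue 2: look for off-diagonal correlations close to +1 or -1.
R = corrcoef(X);
maxOffDiag = max(max(abs(R - eye(d))));

% A very large condition number means the sample covariance is nearly singular.
condCov = cond(cov(X));

% Issue 3: a small number of distinct values in a column suggests a discrete feature.
nUnique = zeros(1,d);
for j = 1:d
    nUnique(j) = numel(unique(X(:,j)));
end

fprintf('observations: %d, dimensions: %d\n', n, d);
fprintf('largest off-diagonal correlation: %g\n', maxOffDiag);
fprintf('condition number of cov(X): %g\n', condCov);
fprintf('distinct values per column: %s\n', mat2str(nUnique));
```

If the overall covariance is already badly conditioned, the per-component covariances estimated during fitting are likely to be at least as problematic.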
In general, you can avoid getting ill-conditioned covariance matrices by taking one or more of the following precautions:
1. Pre-process your data to remove correlated features.
2. Set 'SharedCov' to true to use the same covariance matrix for every component.
3. Set 'CovType' to 'diagonal'.
4. Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
5. Try another set of initial values.
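
Precautions 2 through 5 can be sketched as follows. The data X is a hypothetical two-cluster example rather than the original 'testdata', and the regularization value 1e-5 and the number of replicates are arbitrary choices for illustration. In later releases, fitgmdist is the documented replacement for gmdistribution.fit, so the calls may need adapting there.

```matlab
% Hypothetical two-cluster data standing in for 'testdata'.
X = [randn(100,2); randn(100,2) + 4];

% Precaution 2: one covariance matrix shared by all components.
objShared = gmdistribution.fit(X, 2, 'SharedCov', true);

% Precaution 3: restrict every covariance matrix to be diagonal.
objDiag = gmdistribution.fit(X, 2, 'CovType', 'diagonal');

% Precaution 4: add a small positive number to the diagonal of each covariance.
% The value 1e-5 is an assumption; choose it relative to the scale of your data.
objReg = gmdistribution.fit(X, 2, 'Regularize', 1e-5);

% Precaution 5: try several sets of random initial values and keep the best fit.
objRep = gmdistribution.fit(X, 2, 'Replicates', 10, 'Regularize', 1e-5);

disp(objRep.mu);      % estimated component means
disp(objRep.Sigma);   % estimated component covariances
```

The options can be combined, as in the last call. Shared or diagonal covariances change the model being fitted, whereas a small 'Regularize' value keeps the full-covariance model and only nudges the estimates away from singularity.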