What should the covariance matrices and weights be when initializing EM/GMM with k-means?

expectation-maximization, gaussian mixture distribution, machine learning

It's typical to initialize EM for Gaussian mixture models (GMMs) with the result of k-means clustering. However, k-means only gives you the means (centers) of the starting GMM, whereas EM initialization usually requires a complete GMM description, including the covariance matrices and weights.

So what is a 'good' way to come up with initial covariance matrices and weights for a k-means-based GMM? Simply assign random values (subject to sum(weights) = 1)?

Best Answer

k-means also tells you which data points belong to which cluster. A good starting estimate for each component's covariance is the sample covariance of the points assigned to that cluster, and a good starting estimate for each weight is the fraction of data points assigned to that cluster.
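
As a minimal sketch of that idea (not a definitive implementation), the NumPy snippet below turns hard cluster labels into initial weights, means, and covariances. The function name gmm_init_from_kmeans, the reg ridge added to each covariance diagonal, and the choice to divide by the cluster size rather than the cluster size minus one are assumptions made for illustration; the labels array stands in for whatever your k-means routine returns.

```python
import numpy as np


def gmm_init_from_kmeans(X, labels, k, reg=1e-6):
    """Build initial GMM parameters from hard k-means assignments.

    X      : (n, d) data matrix
    labels : (n,) integer cluster index in [0, k) for each row of X
    reg    : small ridge added to each covariance diagonal (an assumption,
             used here so tiny or degenerate clusters stay invertible)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape

    weights = np.empty(k)
    means = np.empty((k, d))
    covs = np.empty((k, d, d))

    for j in range(k):
        Xj = X[labels == j]
        if len(Xj) == 0:
            raise ValueError(f"cluster {j} is empty; rerun k-means or lower k")
        # Weight: fraction of all points assigned to cluster j.
        weights[j] = len(Xj) / n
        # Mean: centroid of the points assigned to cluster j.
        means[j] = Xj.mean(axis=0)
        # Covariance: within-cluster scatter divided by the cluster size.
        diff = Xj - means[j]
        covs[j] = diff.T @ diff / len(Xj) + reg * np.eye(d)

    return weights, means, covs


# Toy usage: two synthetic blobs with hand-made labels standing in for
# the output of a k-means run.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(60, 2)),
    rng.normal(5.0, 0.5, size=(40, 2)),
])
labels = np.array([0] * 60 + [1] * 40)

weights, means, covs = gmm_init_from_kmeans(X, labels, k=2)
```

The weights sum to 1 by construction, since every point belongs to exactly one cluster. The returned arrays can then be passed to whatever EM routine you use as its starting parameters.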
