Solved – How to extract components of Gaussian mixture

gaussian mixture distribution, MATLAB

I'm trying to model a dataset as a mixture of two Gaussian distributions in MATLAB and find the Bhattacharyya distance between the two components. Using MATLAB's fitgmdist function I was able to fit the mixture and plot its density over a histogram of the data:

gmdist = fitgmdist(data, 2);
gmsigma = gmdist.Sigma;
gmmu = gmdist.mu;
gmwt = gmdist.ComponentProportion;

histogram(data, 'Normalization', 'pdf', 'EdgeColor', 'none')
x = min_val:0.0001:max_val;
xlim([min_val max_val])
hold on;
plot(x, pdf(gmdist, x'), 'k')
hold on;

However, while debugging my distance code I realized that the two individual distributions I plotted did not match their components in the mixture:

p = pdf('Normal', x, gmmu(1), gmsigma(1));
q = pdf('Normal', x, gmmu(2), gmsigma(2));
plot(x, p*gmwt(1))
hold on;
plot(x, q*gmwt(2))

My expectation was that the above code should have produced two density plots that matched their components in the plot of the original mixture model, but this is not the case. This post clued me in that the PDFs I calculated integrated to 1 individually (rather than together), but I'm unsure how to obtain the "non-integrated" components of the mixture model such that I can plot them to match the original plot.

Best Answer

There is definitely a mistake in the implementation of the decomposition. Here is my rendering of the same problem with both weighted components appearing as they should:

My R code is as follows:

hist(x,nclass=22,col="wheat2",bord=FALSE,prob=TRUE,main="")
curve(.7*dnorm(x)+.3*dnorm(x,3),col="sienna",add=TRUE,lwd=2)
curve(.3*dnorm(x,3),col="steelblue",add=TRUE,lwd=2,lty=2)
curve(.7*dnorm(x),col="steelblue",add=TRUE,lwd=2,lty=2)
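On the MATLAB side, one likely culprit is the scale parameter. This is an assumption, since the original plots are not available. gmdist.Sigma stores covariances (variances for 1-D data) in a 1-by-1-by-k array, whereas pdf('Normal', x, mu, sigma) expects a standard deviation. Taking the square root before evaluating each component should make the weighted components add up to the mixture density. Below is a minimal sketch of that fix. The numbers are hypothetical stand-ins for the fitted gmdist.mu, gmdist.Sigma and gmdist.ComponentProportion, and the normal density is written out by hand so that it runs without any toolbox:

```matlab
% Hypothetical stand-ins for gmdist.mu, gmdist.Sigma and
% gmdist.ComponentProportion (not values from the original data).
gmmu    = [0; 3];                 % k-by-1 component means
gmsigma = cat(3, 0.25, 2.25);     % 1-by-1-by-k component VARIANCES
gmwt    = [0.7 0.3];              % 1-by-k mixing weights

x = linspace(-6, 12, 1801);

% Normal density with mean m and standard deviation s, written out
% explicitly; pdf('Normal', x, m, s) takes the same standard deviation s.
npdf = @(x, m, s) exp(-0.5 * ((x - m) ./ s).^2) ./ (s * sqrt(2*pi));

sd  = sqrt(squeeze(gmsigma));             % variances -> standard deviations
p   = gmwt(1) * npdf(x, gmmu(1), sd(1));  % weighted component 1
q   = gmwt(2) * npdf(x, gmmu(2), sd(2));  % weighted component 2
mix = p + q;                              % mixture density as the weighted sum

plot(x, mix, 'k'); hold on;
plot(x, p, '--');
plot(x, q, '--');
hold off;
```

In the code from the question, this amounts to passing sqrt(gmsigma(1)) and sqrt(gmsigma(2)) to pdf('Normal', ...) instead of gmsigma(1) and gmsigma(2). Each weighted component then integrates to its mixing weight rather than to 1, and the two curves sum to the black mixture curve.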