[Math] How to do hypothesis testing on a Gaussian mixture model

statistics

I am a CS major, so please be patient if my question is not well stated.

The dataset is quantitative mass spectrometry (MS) data. By labeling the proteins of two different samples, A and B, we obtain the relative abundance (the ratio A/B) of anywhere from 100 to several thousand proteins. Alongside each ratio, we can estimate its variance from the signal intensities.

Wanted: a list of proteins whose ratios differ significantly from the set of all protein ratios.

Most proteins are unchanged between A and B, so the ratios are distributed around 1 (the log-ratios around 0). The histogram is bell-shaped with fat tails. A two-component Gaussian mixture model has been found to fit experimental noise well, and I suppose it would also work well for this data, with the two components reflecting experimental and biological noise.
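To illustrate the two-component fit mentioned above, here is a minimal sketch of expectation-maximisation (EM) for a one-dimensional two-component Gaussian mixture in plain Python. The function names `normal_pdf` and `fit_gmm2`, the starting values, the iteration count, and the synthetic data are all assumptions made for illustration, not part of the question; a library implementation would usually be preferred in practice.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm2(xs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture to the values xs with plain EM.

    Returns (weights, means, sds), each a list of length 2.
    """
    n = len(xs)
    s = sorted(xs)
    # Crude start (an assumption): both components at the median, one with the
    # interquartile-range scale, one three times wider to absorb the fat tails.
    med = s[n // 2]
    scale = (s[(3 * n) // 4] - s[n // 4]) / 1.35 or 1.0
    w, mu, sd = [0.8, 0.2], [med, med], [scale, 3.0 * scale]
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow for extreme points
            resp.append([pk / total for pk in p])
        # M-step: weighted re-estimates of weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = math.sqrt(max(var, 1e-12))
    return w, mu, sd


if __name__ == "__main__":
    # Synthetic log-ratios (made up for illustration): 90% narrow noise, 10% wide noise.
    rng = random.Random(1)
    data = [rng.gauss(0.0, 0.2) if rng.random() < 0.9 else rng.gauss(0.0, 0.8)
            for _ in range(1000)]
    weights, means, sds = fit_gmm2(data)
    print("weights", weights, "means", means, "sds", sds)
```

The narrow component can then be read as the "unchanged" population and the wide one as the fat tails.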

How can I test for significantly different ratios under such a two-component Gaussian mixture model?

Thanks for your responses!

Best Answer

Robust detection can be used to test for such outliers (here, the significantly different ratios). It is fairly easy in this case, because the Gaussian mixture model is a submodel of the ε-contamination model, and for that model there exist least favourable densities with which to perform the likelihood ratio test.
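Written out, the ε-contamination class around a nominal distribution F_0 and a two-component Gaussian mixture look as follows; taking F_0 and the contaminating H both Gaussian shows that the mixture is a member of the class. The symbols μ_0, σ_0, μ_1, σ_1 and φ (the normal density) are introduced here for illustration:

```latex
\mathcal{F}_\varepsilon(F_0) = \bigl\{\, (1-\varepsilon)\,F_0 + \varepsilon\,H \;:\; H \text{ an arbitrary distribution} \,\bigr\}, \qquad 0 \le \varepsilon < 1,

f(x) = (1-\varepsilon)\,\phi(x;\mu_0,\sigma_0^2) + \varepsilon\,\phi(x;\mu_1,\sigma_1^2)
\;\in\; \mathcal{F}_\varepsilon\bigl(\mathcal{N}(\mu_0,\sigma_0^2)\bigr) \quad \text{with } H = \mathcal{N}(\mu_1,\sigma_1^2).
```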

To find these densities, and hence the likelihood ratio test, you need to solve two nonlinear equations for two constants. These constants determine the least favourable densities, and the likelihood ratio test between those densities is the robust test.
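The two equations can be made concrete for one hypothetical setting: a nominal density p0 = N(0, 1) (for example a standardised log-ratio of an unchanged protein) against a shifted alternative p1 = N(θ, 1), each surrounded by an ε-contamination neighbourhood. For such a pair the classical least favourable densities q0, q1 (Huber, 1965) equal (1 − ε0)p0 and (1 − ε1)p1 except where the likelihood ratio l = p1/p0 falls outside an interval [c', c'']; there they are modified so that q1/q0 becomes a constant multiple of l clipped to [c', c'']. The two constants are fixed by requiring q0 and q1 to integrate to one. The sketch below solves those equations by bisection; the Gaussian location pair, θ, ε0, ε1 and the function names are assumptions, not something the answer specifies.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bisect_root(f, lo, hi, n_iter=200):
    """Root of a monotone function f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def huber_constants(theta, eps0, eps1):
    """Clipping constants (c_low, c_high) of the least favourable pair q0, q1 for
    eps-contamination neighbourhoods of p0 = N(0, 1) and p1 = N(theta, 1), theta > 0.

    Here l(x) = p1(x)/p0(x) = exp(theta*x - theta**2/2) is increasing in x, so
    {l < c} = {x < t(c)} with t(c) = (log c + theta**2/2) / theta.
    """
    def t(log_c):
        return (log_c + 0.5 * theta ** 2) / theta

    # q0 integrates to one:  (1 - eps0) * [P0(l < c'') + P1(l >= c'') / c''] = 1
    def eq_high(log_c):
        x = t(log_c)
        return Phi(x) + (1.0 - Phi(x - theta)) * math.exp(-log_c) - 1.0 / (1.0 - eps0)

    # q1 integrates to one:  (1 - eps1) * [P1(l > c') + c' * P0(l <= c')] = 1
    def eq_low(log_c):
        x = t(log_c)
        return (1.0 - Phi(x - theta)) + math.exp(log_c) * Phi(x) - 1.0 / (1.0 - eps1)

    # Solve in log c to keep the search range numerically safe; both equations
    # are monotone in c, so bisection finds the unique root.
    c_high = math.exp(bisect_root(eq_high, -50.0, 50.0))
    c_low = math.exp(bisect_root(eq_low, -50.0, 50.0))
    if c_low >= c_high:
        raise ValueError("contamination too large for this pair: need c_low < c_high")
    return c_low, c_high


if __name__ == "__main__":
    c_low, c_high = huber_constants(theta=2.0, eps0=0.05, eps1=0.05)
    # The robust test then compares clip(p1(x)/p0(x), c_low, c_high) with a threshold.
    print(c_low, c_high)
```

For this symmetric example (equal ε on both sides), the two constants come out as reciprocals of each other, which is a convenient sanity check.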

The main advantage of robust detection is that its performance is guaranteed in the presence of outliers: because the set of densities considered around the nominal density is bounded (an ε-neighbourhood), the error probabilities of the robust test anywhere in that set never exceed those it attains at the least favourable densities.

It is also known that the likelihood ratio test based on the least favourable densities is a censored (clipped) version of the original likelihood ratio test. If you don't want to deal with solving the nonlinear equations, I suggest clipping the likelihood ratio and checking the performance.
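A minimal sketch of that clipping shortcut, again assuming a hypothetical pair p0 = N(0, 1) and p1 = N(θ, 1) and several replicate measurements per protein. With a single observation, the clipped and unclipped tests make the same decisions for any threshold inside the clipping interval (clipping a monotone statistic does not reorder it), so clipping only matters when several observations are combined. The clipping bounds, threshold and contamination settings below are hand-picked assumptions, and "checking the performance" is done here by a Monte Carlo estimate of the false-alarm rate under contaminated data.

```python
import random


def llr(x, theta):
    """log[p1(x)/p0(x)] for the assumed pair p0 = N(0, 1), p1 = N(theta, 1)."""
    return theta * x - 0.5 * theta ** 2


def clipped_llr(x, theta, log_c_low, log_c_high):
    """The same log-likelihood ratio, censored to [log_c_low, log_c_high]."""
    return min(max(llr(x, theta), log_c_low), log_c_high)


def false_alarm_rate(stat, threshold, n_rep, eps, outlier_sd, n_trials=20000, seed=0):
    """Monte Carlo estimate of P(sum of stat(x_i) > threshold) under H0, where each of
    the n_rep replicates is drawn from (1 - eps) * N(0, 1) + eps * N(0, outlier_sd**2)."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_rep):
            sd = outlier_sd if rng.random() < eps else 1.0
            total += stat(rng.gauss(0.0, sd))
        alarms += total > threshold
    return alarms / n_trials


if __name__ == "__main__":
    theta = 1.0  # hypothetical shift of the alternative
    tests = {
        "plain LLR": lambda x: llr(x, theta),
        "clipped LLR": lambda x: clipped_llr(x, theta, -1.0, 1.0),  # hand-picked bounds
    }
    for name, stat in tests.items():
        rate = false_alarm_rate(stat, threshold=2.0, n_rep=3, eps=0.05, outlier_sd=10.0)
        print(f"{name}: estimated false-alarm rate {rate:.4f}")
```

Running the same simulation with data drawn under the alternative gives the detection rate, so the clipping bounds and threshold can be tuned until both rates are acceptable.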
