Solved – “Better” goodness-of-fit tests than chi squared for histogram modeling

Tags: chi-squared-test, fitting, hypothesis-testing, model-selection, poisson-distribution

I work on data from a mass spectrometer that produces billions upon billions of count histograms, and I need a good way to test whether these histograms are consistent with one or several model distributions (Gaussian, heavy-tailed, multimodal, etc.). Outliers may be present in a good fraction of the histograms, if not all. The histograms may have anywhere from $0$ to $10^6$ counts in them, and they come to us already discretized, so binning loses no information relative to the original observations.

As a naive jack-of-all-trades data analyst trained by physicists, my instinct is to do something like the following:

For each model distribution,

  • estimate its parameters by the method of moments or by nonlinear fitting with the Poisson likelihood (since this is count data, each bin count is a Poisson random variate)
  • calculate the $\chi^2$ of the data vs. the fitted distribution

Then, with the $\chi^2$ values from the several models in hand…

  • pick the model with the best $\chi^2$ value
  • if $\chi^2$ is too big (as judged against the theoretical $\chi^2$ distribution with the appropriate degrees of freedom), flag the histogram as deviating significantly from the model. (A minimal sketch of this pipeline follows the list.)
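For concreteness, here is a minimal sketch of this pipeline in Python. The Gaussian peak model, the toy histogram, and the starting values are illustrative placeholders of my own, not anything specific to our instrument:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def gaussian_model(centers, amp, mu, sigma):
    """Expected counts per bin for an (unnormalized) Gaussian peak."""
    return amp * np.exp(-0.5 * ((centers - mu) / sigma) ** 2)

def neg_poisson_loglike(params, centers, counts):
    """Negative Poisson log-likelihood, up to the data-only log(n!) term."""
    lam = np.maximum(gaussian_model(centers, *params), 1e-12)
    return np.sum(lam - counts * np.log(lam))

# Illustrative histogram; in practice the counts come from the instrument.
centers = np.arange(0.5, 50.5)
rng = np.random.default_rng(0)
counts = rng.poisson(gaussian_model(centers, 200.0, 25.0, 4.0))

# Step 1: fit the parameters by maximizing the Poisson likelihood.
fit = minimize(neg_poisson_loglike, x0=[counts.max(), 25.0, 5.0],
               args=(centers, counts), method="Nelder-Mead")
expected = gaussian_model(centers, *fit.x)

# Step 2: Pearson chi-squared against the fitted expectations.
chi2_stat = np.sum((counts - expected) ** 2 / expected)
dof = len(counts) - len(fit.x)  # bins minus fitted parameters
p_value = chi2.sf(chi2_stat, dof)
print(f"chi2 = {chi2_stat:.1f}, dof = {dof}, p = {p_value:.3g}")
```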

I was curious whether more experienced statisticians could advise me on this procedure: does it make sense, what limitations might I encounter, are there better alternatives? Here are a couple of things I'm wondering about:

  • For histograms with few counts, I feel like it makes more sense to use the Poisson likelihood / Kullback-Leibler divergence as the goodness-of-fit metric rather than the sum of squares used in the $\chi^2$ test statistic. If that's the most appropriate objective for the fitting, why not also for the test? But I don't know of any commonly used test that works this way; I googled around for Poisson histogram goodness-of-fit tests and found nothing. (A sketch of such a statistic follows this list.)
  • I have the vague sense that I should use some AIC-type penalty to account for the number of parameters in each model, but maybe that's already rolled into the $\chi^2$ degrees of freedom.
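To make the first bullet concrete, this is the kind of statistic I have in mind: the Poisson deviance, i.e. twice the log-likelihood ratio of the saturated model to the fitted model. The sketch below reuses `counts` and `expected` from the snippet above, and the AIC-style helper is my own illustration of the second bullet:

```python
import numpy as np

def poisson_deviance(counts, expected):
    """Likelihood-ratio statistic 2 * sum[ mu_i - n_i + n_i * ln(n_i/mu_i) ],
    with the convention 0 * ln(0) = 0. When the total counts match, this
    equals 2N times the Kullback-Leibler divergence between the empirical
    and model bin frequencies."""
    n = np.asarray(counts, dtype=float)
    mu = np.asarray(expected, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(n > 0, n * np.log(n / mu), 0.0)
    return 2.0 * np.sum(mu - n + log_term)

# An AIC-style score for ranking models on the same histogram:
# 2 * (number of parameters) + deviance. The data-only constants in the
# full AIC cancel when the same data are scored against each candidate.
def aic_like(counts, expected, n_params):
    return 2 * n_params + poisson_deviance(counts, expected)
```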

Best Answer

I'm going to venture an answer to my own question after some googling. One simple approach is to use a binned Poisson maximum likelihood ratio. See pp. 94–96 of these lecture notes:

http://www.hep.phy.cam.ac.uk/~thomson/lectures/statistics/FittingHandout.pdf

The likelihood ratio converges to a $\chi^2$ distribution in the large-count limit. If you're dealing with very few counts, you should instead run Monte Carlo simulations to determine the empirical distribution of the likelihood ratio under the hypothesis that your histogram really is a collection of samples from the model distribution. You can then read a $p$-value off this simulated likelihood-ratio distribution; the $\chi^2$ test is just a fast analytical approximation to this procedure, valid in the large-count limit.
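To make this concrete: for observed counts $n_i$ and fitted expectations $\mu_i$, the statistic in those notes is

$$-2\ln\lambda = 2\sum_i \left[\mu_i - n_i + n_i \ln(n_i/\mu_i)\right],$$

i.e. the Poisson deviance sketched in the question. Here is a minimal Monte Carlo calibration sketch in Python; `fit_model` is a hypothetical helper that refits the model to a counts array and returns the fitted expected counts per bin:

```python
import numpy as np

def mc_pvalue(counts, fit_model, statistic, n_sims=1000, seed=0):
    """Empirical p-value under the hypothesis that the histogram is a set
    of Poisson draws around the fitted model.

    fit_model : callable taking a counts array, refitting the model, and
                returning expected counts per bin (hypothetical helper)
    statistic : e.g. the poisson_deviance defined in the question
    """
    rng = np.random.default_rng(seed)
    expected = fit_model(counts)
    observed = statistic(counts, expected)
    null_stats = np.empty(n_sims)
    for i in range(n_sims):
        fake = rng.poisson(expected)                      # simulate under the model
        null_stats[i] = statistic(fake, fit_model(fake))  # refit each fake histogram
    # Add-one correction keeps the estimated p-value away from exactly zero.
    return (1 + np.sum(null_stats >= observed)) / (n_sims + 1)
```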

All of this is nothing more than following the "perennial philosophy" of frequentist statistics:

  1. If you think something interesting happened, you should find out how often you'd expect that thing to happen by chance before you go proclaiming to the world that it's interesting.

  2. If the $p$-value shows that chance alone is very unlikely to account for the difference between your model and your observation, then your model is probably wrong.
