Hypothesis Testing – Statistical Significance Test for Two Poisson Distributions

hypothesis-testing, nonparametric, poisson-distribution

Say that I have two Poisson distributions. They were modelled on count data.

How would I determine statistical significance between these two distributions? That is, how would I determine whether these two Poisson distributions are statistically different?

Could I apply any non-parametric test (because they don't assume anything about the distribution of the data)? A simple Google search doesn't seem to provide direct answers.

Best Answer

Note that a Poisson distribution is entirely determined by its single parameter, so a test of equality of the two mean parameters is a test of whether the distributions are the same.

Some possible tests (rough R sketches of each option follow the list):

  1. If you have two samples, each of which you treat as iid Poisson with its own parameter, and you want to test for equality of those parameters, then you can simply combine all the observations in each group into a single Poisson count (a sum of iid Poisson variables is itself Poisson).

    a. You could condition on the total count and do a test of proportions (a binomial test in exact form, or via normal approximation, or equivalently a chi-squared test). For example, this binomial test is what you get if you do poisson.test on two samples in R.

    b. You could do a likelihood ratio test.

    (There are a number of other possibilities under this option.)

  2. If you don't necessarily want to treat them as Poisson except as a rough approximation (but do treat them as iid), you would keep all the individual values.

    a. You could then do a permutation test of the means.

    b. You could do a Wilcoxon-Mann-Whitney test or even a goodness-of-fit-style test (e.g. a two-sample Kolmogorov-Smirnov test), but you will have to deal with the discreteness of the distributions.

    c. If you expect that the means won't be very small, you could perform (say) a t-test (under the null the samples should have equal variance, so it's not important whether you do the equal-variance version).

  3. If, instead of being identically distributed, the observations have known but different exposures, you could combine the observations in each group into a single count as in option 1, and also combine the exposures into a single total exposure for each group. You could then follow the approaches in 1.

  4. If the exposures are unknown but are the same within each pair of observations, you effectively have pairing. You could perform a paired permutation test -- permuting the group labels within each pair (which corresponds to putting + and - signs on each absolute pair difference of counts). You could also do a sign test, or, since under the null the differences would be symmetric, consider a signed-rank test (again properly accounting for ties).
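To make option 1a concrete, here is a rough R sketch. The data below are invented with rpois purely for illustration, and the sample sizes stand in for equal per-observation exposures; both calls carry out the same conditional binomial test.

```r
## Hypothetical example data (invented for illustration): two iid Poisson samples
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)

## Option 1a: collapse each sample to a single count and condition on the total.
## Given the grand total, the first group's total is Binomial(total, n1/(n1 + n2))
## under equal rates, so this is just a test of a binomial proportion.
poisson.test(c(sum(x1), sum(x2)), T = c(length(x1), length(x2)))

## The same conditional test done directly as an exact binomial test:
binom.test(sum(x1), sum(x1) + sum(x2),
           p = length(x1) / (length(x1) + length(x2)))
```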
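The likelihood ratio test of 1b is only a few lines by hand (same invented data as above; the chi-squared reference distribution is the usual large-sample approximation):

```r
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)   # same hypothetical data

## Option 1b: likelihood ratio test of H0: lambda1 = lambda2 (chi-squared, 1 df)
loglik  <- function(lambda, x) sum(dpois(x, lambda, log = TRUE))
lambda0 <- mean(c(x1, x2))                        # pooled MLE under the null
lrt <- 2 * (loglik(mean(x1), x1) + loglik(mean(x2), x2) -
            loglik(lambda0, x1) - loglik(lambda0, x2))
pchisq(lrt, df = 1, lower.tail = FALSE)           # approximate p-value
```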
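A permutation test of the means (option 2a) might look like the following, again with invented data; the 10000 reshuffles are an arbitrary choice and the p-value is a Monte Carlo approximation of the full permutation p-value.

```r
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)   # same hypothetical data

## Option 2a: permutation test of the difference in means.
## Under the null (identical distributions) the group labels are exchangeable,
## so we rebuild the null distribution of the statistic by reshuffling labels.
observed <- mean(x1) - mean(x2)
pooled   <- c(x1, x2)
n1       <- length(x1)
perm <- replicate(10000, {
  idx <- sample(length(pooled), n1)
  mean(pooled[idx]) - mean(pooled[-idx])
})
mean(abs(perm) >= abs(observed))                  # two-sided permutation p-value
```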
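Options 2b and 2c map onto stock R functions; the only wrinkle is the tie handling mentioned above, since count data are certain to produce ties.

```r
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)   # same hypothetical data

## Option 2b: rank-based test; exact = FALSE because ties are unavoidable with
## counts, so the normal approximation with tie correction is used.
wilcox.test(x1, x2, exact = FALSE, correct = TRUE)

## Option 2c: ordinary t-test; var.equal = TRUE is defensible because the
## variances are equal under the null, but the Welch default is also fine.
t.test(x1, x2, var.equal = TRUE)
```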
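For option 3, suppose the per-group counts and exposures below (entirely made up); summing within groups and passing the total exposures as the time base gives the same kind of conditional test as in 1a.

```r
## Option 3: known but different exposures. Sum counts and exposures within each
## group, then proceed as in option 1a with the exposures as the time base.
counts1 <- c(12, 7, 9);   expos1 <- c(2.0, 1.5, 1.0)   # group 1: counts, exposures
counts2 <- c(20, 15, 11); expos2 <- c(2.5, 2.0, 1.5)   # group 2: counts, exposures
poisson.test(c(sum(counts1), sum(counts2)), T = c(sum(expos1), sum(expos2)))
```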
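For option 4, here is one way the paired permutation, signed-rank and sign tests might look; the pair-specific exposures and rates are invented just to make the pairing matter.

```r
## Option 4: paired counts with a common (unknown) exposure within each pair.
set.seed(1)
expo <- runif(25, 0.5, 2)                          # hypothetical pair-level exposures
y1 <- rpois(25, 4 * expo); y2 <- rpois(25, 5 * expo)
d  <- y1 - y2

## Paired permutation test: flipping signs of the pair differences is the same
## as swapping the two labels within a pair.
observed <- mean(d)
perm <- replicate(10000, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
mean(abs(perm) >= abs(observed))                   # two-sided paired permutation p-value

## Rank-based alternatives mentioned above (ties/zeros handled by approximation):
wilcox.test(d, exact = FALSE)                      # signed-rank test
binom.test(sum(d > 0), sum(d != 0))                # sign test, dropping zero differences
```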
