Hypothesis Testing – Statistical Significance Test for Two Poisson Distributions

hypothesis-testing, nonparametric, poisson-distribution

Say that I have two Poisson distributions. They were modelled on count data.

How would I determine statistical significance between these two distributions? That is, how would I determine whether these two Poisson distributions are statistically different?

Could I apply any non-parametric test (because they don't assume anything about the distribution of the data)? A simple Google search doesn't seem to provide direct answers.

Best Answer

Note that a Poisson distribution is entirely determined by its single parameter, so a test of equality of the two mean parameters is a test of whether the distributions are the same.

Some possible tests (rough R sketches of each option follow the list):

  1. If you have two samples, each of which you treat as iid Poisson with its own parameter, and you want to test for equality of those parameters, then you can simply combine all the observations in each group into a single Poisson count (a sum of iid Poisson variables is itself Poisson).

    a. You could condition on the total count and do a test of proportions (a binomial test in exact form, or via normal approximation, or equivalently a chi-squared test). For example, this binomial test is what you get if you do poisson.test on two samples in R.

    b. You could do a likelihood ratio test.

    (There are a number of other possibilities under this option.)

  2. If you don't necessarily want to treat them as Poisson except as a rough approximation (but do treat them as iid), you would keep all the individual values.

    a. You could then do a permutation test of the means.

    b. You could do a Wilcoxon-Mann-Whitney test or even a goodness-of-fit-style test (e.g. a two-sample Kolmogorov-Smirnov test), but you will have to deal with the discreteness of the distributions.

    c. If you expect that the means won't be very small, you could perform (say) a t-test (under the null the samples should have equal variance, so it's not important whether you do the equal-variance version).

  3. If, instead of being identically distributed, the observations have known but different exposures, you could combine the observations in each group into a single count as in option 1, and also combine the exposures into a single total exposure for each group. You could then follow the approaches in 1.

  4. If the exposures are unknown but are the same within each pair of observations, you effectively have pairing. You could perform a paired permutation test -- permuting the group labels within each pair (which corresponds to putting + and - signs on each absolute pair difference of counts). You could also do a sign test, or, since under the null the differences would be symmetric, consider a signed-rank test (again properly accounting for ties).
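To make option 1a concrete, here is a rough R sketch. The data below are invented with rpois purely for illustration, and the sample sizes stand in for equal per-observation exposures; both calls carry out the same conditional binomial test.

```r
## Hypothetical example data (invented for illustration): two iid Poisson samples
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)

## Option 1a: collapse each sample to a single count and condition on the total.
## Given the grand total, the first group's total is Binomial(total, n1/(n1 + n2))
## under equal rates, so this is just a test of a binomial proportion.
poisson.test(c(sum(x1), sum(x2)), T = c(length(x1), length(x2)))

## The same conditional test done directly as an exact binomial test:
binom.test(sum(x1), sum(x1) + sum(x2),
           p = length(x1) / (length(x1) + length(x2)))
```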
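The likelihood ratio test of 1b is only a few lines by hand (same invented data as above; the chi-squared reference distribution is the usual large-sample approximation):

```r
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)   # same hypothetical data

## Option 1b: likelihood ratio test of H0: lambda1 = lambda2 (chi-squared, 1 df)
loglik  <- function(lambda, x) sum(dpois(x, lambda, log = TRUE))
lambda0 <- mean(c(x1, x2))                        # pooled MLE under the null
lrt <- 2 * (loglik(mean(x1), x1) + loglik(mean(x2), x2) -
            loglik(lambda0, x1) - loglik(lambda0, x2))
pchisq(lrt, df = 1, lower.tail = FALSE)           # approximate p-value
```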
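A permutation test of the means (option 2a) might look like the following, again with invented data; the 10000 reshuffles are an arbitrary choice and the p-value is a Monte Carlo approximation of the full permutation p-value.

```r
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)   # same hypothetical data

## Option 2a: permutation test of the difference in means.
## Under the null (identical distributions) the group labels are exchangeable,
## so we rebuild the null distribution of the statistic by reshuffling labels.
observed <- mean(x1) - mean(x2)
pooled   <- c(x1, x2)
n1       <- length(x1)
perm <- replicate(10000, {
  idx <- sample(length(pooled), n1)
  mean(pooled[idx]) - mean(pooled[-idx])
})
mean(abs(perm) >= abs(observed))                  # two-sided permutation p-value
```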
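Options 2b and 2c map onto stock R functions; the only wrinkle is the tie handling mentioned above, since count data are certain to produce ties.

```r
set.seed(1); x1 <- rpois(40, 3); x2 <- rpois(50, 3.6)   # same hypothetical data

## Option 2b: rank-based test; exact = FALSE because ties are unavoidable with
## counts, so the normal approximation with tie correction is used.
wilcox.test(x1, x2, exact = FALSE, correct = TRUE)

## Option 2c: ordinary t-test; var.equal = TRUE is defensible because the
## variances are equal under the null, but the Welch default is also fine.
t.test(x1, x2, var.equal = TRUE)
```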
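For option 3, suppose the per-group counts and exposures below (entirely made up); summing within groups and passing the total exposures as the time base gives the same kind of conditional test as in 1a.

```r
## Option 3: known but different exposures. Sum counts and exposures within each
## group, then proceed as in option 1a with the exposures as the time base.
counts1 <- c(12, 7, 9);   expos1 <- c(2.0, 1.5, 1.0)   # group 1: counts, exposures
counts2 <- c(20, 15, 11); expos2 <- c(2.5, 2.0, 1.5)   # group 2: counts, exposures
poisson.test(c(sum(counts1), sum(counts2)), T = c(sum(expos1), sum(expos2)))
```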
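For option 4, here is one way the paired permutation, signed-rank and sign tests might look; the pair-specific exposures and rates are invented just to make the pairing matter.

```r
## Option 4: paired counts with a common (unknown) exposure within each pair.
set.seed(1)
expo <- runif(25, 0.5, 2)                          # hypothetical pair-level exposures
y1 <- rpois(25, 4 * expo); y2 <- rpois(25, 5 * expo)
d  <- y1 - y2

## Paired permutation test: flipping signs of the pair differences is the same
## as swapping the two labels within a pair.
observed <- mean(d)
perm <- replicate(10000, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
mean(abs(perm) >= abs(observed))                   # two-sided paired permutation p-value

## Rank-based alternatives mentioned above (ties/zeros handled by approximation):
wilcox.test(d, exact = FALSE)                      # signed-rank test
binom.test(sum(d > 0), sum(d != 0))                # sign test, dropping zero differences
```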
