Poisson Test with Multiple Replicates Against Null Hypothesis Using R

hypothesis testing, poisson distribution, r, statistical significance

I feel like this should be a pretty basic question, but I can't find a clear answer.

I have multiple independent replicates of the same thing. Let's say I went and counted cars along a road between 8am and 9am, every Monday for a month. The assumption is that those counts are all sampled from the same distribution. I counted 9, 10, 8, and 9 cars.

Let's further say the City Traffic Department publishes their modeled traffic rates. They think the road should have 10 cars per hour.

How do I test whether or not my counted observations match that prediction?

For any one count (e.g. a count of 9), it's trivial using R:
poisson.test(x = 9, T = 1, r = 10)

But how do I use the repeated counts?

I did think I could take the average of the four counts. As this is a Poisson distribution, it is fully described by the mean alone. In this case the mean is 9, so the result would be the same as above.

But it seems wrong that repeating counts and increasing my sample size has no impact at all on the conclusion. Let's say the true mean is 5 cars per hour – I could count every morning all year, reach a mean of 5, and still the simple Poisson test would conclude there is not enough evidence to reject the null hypothesis that the mean is 10. Surely increasing my sample size increases power?

Best Answer

You would pass in the sum of your observations, and set the T parameter in poisson.test (the time base) to the number of one-hour counting periods you sampled, which here is one per day:

#pretend these are your data points
x <- 8:16
poisson.test(sum(x),T=length(x),r=10)

# >         Exact Poisson test
# > 
# > data:  sum(x) time base: length(x)
# > number of events = 108, time base = 9, p-value = 0.0647
# > alternative hypothesis: true event rate is not equal to 10
# > 95 percent confidence interval:
# >   9.843833 14.488079
# > sample estimates:
# > event rate 
# >         12

Increasing T, with the total count growing along with it, reduces the variance of the estimated rate. Compare what happens if we pass in the mean rate instead (with T=1):

poisson.test(mean(x),T=1,r=10)

# >         Exact Poisson test
# > 
# > data:  mean(x) time base: 1
# > number of events = 12, time base = 1, p-value = 0.5234
# > alternative hypothesis: true event rate is not equal to 10
# > 95 percent confidence interval:
# >   6.200575 20.961585
# > sample estimates:
# > event rate 
# >         12

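So by repeating the counts you gain precision in your estimate: the estimated rate is 12 in both cases, but the pooled test has a much narrower confidence interval and a smaller p-value.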
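The reason pooling works is that the sum of n independent Poisson counts with rate lambda is itself Poisson with mean n * lambda, so the total over n hours carries all the information about the rate. As a sketch applied to the four counts from the question, the pooled test could look like the following. The variable names (cars, n_days, p_values) are illustrative, and the second part is a made-up scenario in which the observed mean stays at exactly 5 cars per hour while the number of counting days grows.

```r
# Counts from the question: cars seen on four Mondays between 8am and 9am
cars <- c(9, 10, 8, 9)

# Pooled test: total events over total exposure (4 hours) against 10 cars/hour
pooled <- poisson.test(sum(cars), T = length(cars), r = 10)
pooled$estimate   # 36 / 4 = 9 cars per hour
pooled$p.value
pooled$conf.int

# Hypothetical scenario: the observed mean is always 5 cars per hour,
# but it is based on more and more days of counting
n_days <- c(1, 4, 16, 64)
p_values <- sapply(n_days, function(n) {
  poisson.test(5 * n, T = n, r = 10)$p.value
})
data.frame(n_days, p_values)
```

With the question's data the pooled estimate is still 9, so the null rate of 10 is not rejected, but the p-values in the hypothetical scenario shrink as days are added, which is the gain in power the question expected.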

Because this test uses only the overall sum of the observations, you will likely also want to check that the individual observations look approximately Poisson (e.g. if every day is exactly 10, the counts are not Poisson).
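One rough way to do such a check in base R is the classical index-of-dispersion test, which compares the sample variance to the sample mean (they should be about equal for Poisson data). This is a swapped-in textbook technique, not the method used by any particular package, and the chi-squared approximation is only approximate for small samples. The variable names below are illustrative.

```r
# Same illustrative data points as used in the answer
counts <- 8:16
n <- length(counts)

# Index of dispersion: sample variance over sample mean (about 1 for Poisson)
dispersion <- var(counts) / mean(counts)

# Dispersion statistic: approximately chi-squared with n - 1 degrees of
# freedom if the counts really are Poisson with a common rate
D <- sum((counts - mean(counts))^2) / mean(counts)
p_over  <- pchisq(D, df = n - 1, lower.tail = FALSE)  # small => overdispersed
p_under <- pchisq(D, df = n - 1, lower.tail = TRUE)   # small => underdispersed

c(dispersion = dispersion, D = D, p_over = p_over, p_under = p_under)
```

A small p_over suggests more variation than a Poisson model allows (for example, day-to-day changes in the underlying rate), while a small p_under suggests the counts are too regular, as in the "exactly 10 every day" case.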

I have a package I am working on, ptools, where you can see the Poisson fit to data using the check_pois function.

Reference

G. J. Hahn and W. Q. Meeker (1991), Statistical Intervals. A Guide for Practitioners. J. Wiley & Sons.

Code

#
# Poisson confidence intervals (symmetric, two-sided).
# `k` may be a vector of observations.
#
ci <- function(k, alpha=0.05) {
  matrix(qchisq(c(alpha/2, 1-alpha/2), rbind(2*k, 2*k+2))/2, 2)
}
#
# Simulation study of coverage.
# Takes a few seconds with n=4e5.
#
n <- 4e5
lambda <- 10^seq(-1, 3, length.out=21)
set.seed(17)
coverage <- sapply(lambda, function(lambda) {
  mean((function(x) x[1,] <= lambda & lambda <= x[2,])(ci(rpois(n, lambda))))
})
#
# Calculation of coverage.
#
lambda.calc <- 10^seq(-1, 3, length.out=4021)
x <- max(lambda.calc)
CI <- ci(k <- 0:(x + 8*sqrt(x)))
coverage.calc <- sapply(lambda.calc, function(l) {
  covers <- CI[1,] <= l & l <= CI[2,]
  sum(dpois(k, l)[covers])
})
#
# Plot of results.
#
library(ggplot2)
ggplot(data.frame(lambda=lambda, Coverage=coverage), 
       aes(lambda, Coverage)) + 
  geom_line(data=data.frame(lambda=lambda.calc, Coverage=coverage.calc), col="#a0a0a0") + 
  geom_point(color="Red") + 
  scale_x_log10() + 
  coord_cartesian(ylim=c(min(0.9499, min(coverage.calc)), 1), expand=FALSE) + 
  geom_hline(yintercept=0.95) + 
  xlab(expression(lambda)) + 
  ggtitle("Simulated Coverage Rates of 95% Two-Sided Poisson Confidence Intervals")
