Solved – How to interpret output of poisson.test

poisson distributionr

Chapter 11 of Introduction to Data Science is about Poisson distributions.

One example sample has 58638 observations out of 100000 with a value less than or equal to 10.

> sum(rpois(100000, lambda=10)<=10)
[1] 58638

I think I can read that as "a 0.58638 probability that values observed from this distribution are 10 or less".

I'm comfortable with the fact that this is close to the theoretical probability, but not exactly the same.

The book says R can show us how much variation there is around these probabilities using the R function poisson.test.

> poisson.test(58638, 100000)

    Exact Poisson test

data:  58638 time base: 1e+05
number of events = 58638, time base = 1e+05, p-value < 2.2e-16
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.5816434 0.5911456
sample estimates:
event rate 
   0.58638 

The book's explanation confuses me. I've highlighted the part I don't get.

For 95% of the samples that we could generate using rpois(), using a sample size of 100,000, and a desired mean of 10, we will get a result that lies between 0.5816434 and 0.5911456

It's confusing because I didn't tell the function that the desired mean is 10. How does it know? What if my desired mean is 3, like here?

> sum(rpois(100000, lambda=3)<=10)
[1] 99970

When I set lambda (the mean) to 3, clearly there are a lot more observation with a value less than 10. How is the test relevant here?

Best Answer

The $\lambda$-value of the Poisson distribution equals the expected value and the variance. See the definition on Wikipedia or in any good textbook. Thus, by passing lambda=10 to rpois you get Poisson distributed values with mean $\approx$ 10 and var $\approx$ 10:

set.seed(42)
x <- rpois(100000, lambda=10)
mean(x)
#[1] 10.00424
var(x)
#[1] 10.1363

The theoretical property for a value to be smaller than or equal to 10 can be calculated from the cumulative distribution function:

ppois(10, 10)
#[1] 0.5830398

Which is approximately what you get from the sample:

sum(x<=10)/1e5
#[1] 0.58321

Now let's provide the theoretical value to poisson.test:

poisson.test(round(ppois(10, 10)*1e5), 100000)

    Exact Poisson test

data:  round(ppois(10, 10) * 1e+05) time base: 1e+05
number of events = 58304, time base = 1e+05, p-value < 2.2e-16
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.5783169 0.5877921
sample estimates:
event rate 
   0.58304 

Now let's simulate:

y <- rpois(1e8, lambda=10)
y <- matrix(y, ncol=1e5)
quantile(rowSums(y<=10)/1e5, c(0.025, 0.975))
#     2.5%     97.5% 
#0.5799880 0.5861802 

I believe the confidence interval given by the test tells you the following:

For 95% of the (same-sized) samples that we could generate from the same distribution that we drew the sample from, we will get a result that lies between ...

The test doesn't know that $\lambda=10$. It estimates the distribution parameter from the sample. However, we know the real distribution.

You also need to consider that the confidence interval reported by the test is also an estimate. poisson.test is based on binom.test, where you can read:

Confidence intervals are obtained by a procedure first given in Clopper and Pearson (1934). This guarantees that the confidence level is at least conf.level, but in general does not give the shortest-length confidence intervals.

Note that the confidence interval calculated from the simulation is narrower than the interval reported by the test.

Related Question