Solved – How to interpret output of poisson.test

poisson distributionr

Chapter 11 of Introduction to Data Science is about Poisson distributions.

One example sample has 58638 observations out of 100000 with a value less than or equal to 10.

> sum(rpois(100000, lambda=10)<=10)
[1] 58638

I think I can read that as "a 0.58638 probability that values observed from this distribution are 10 or less".

I'm comfortable with the fact that this is close to the theoretical probability, but not exactly the same.

The book says R can show us how much variation there is around these probabilities using the R function poisson.test.

> poisson.test(58638, 100000)

    Exact Poisson test

data:  58638 time base: 1e+05
number of events = 58638, time base = 1e+05, p-value < 2.2e-16
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.5816434 0.5911456
sample estimates:
event rate 
   0.58638

The book's explanation confuses me. I've highlighted the part I don't get.

For 95% of the samples that we could generate using rpois(), using a sample size of 100,000, and a desired mean of 10, we will get a result that lies between 0.5816434 and 0.5911456

It's confusing because I didn't tell the function that the desired mean is 10. How does it know? What if my desired mean is 3, like here?

> sum(rpois(100000, lambda=3)<=10)
[1] 99970

When I set lambda (the mean) to 3, clearly there are a lot more observation with a value less than 10. How is the test relevant here?

Best Answer

The $\lambda$-value of the Poisson distribution equals the expected value and the variance. See the definition on Wikipedia or in any good textbook. Thus, by passing lambda=10 to rpois you get Poisson distributed values with mean $\approx$ 10 and var $\approx$ 10:

set.seed(42)
x <- rpois(100000, lambda=10)
mean(x)
#[1] 10.00424
var(x)
#[1] 10.1363

The theoretical property for a value to be smaller than or equal to 10 can be calculated from the cumulative distribution function:

ppois(10, 10)
#[1] 0.5830398

Which is approximately what you get from the sample:

sum(x<=10)/1e5
#[1] 0.58321

Now let's provide the theoretical value to poisson.test:

poisson.test(round(ppois(10, 10)*1e5), 100000)

    Exact Poisson test

data:  round(ppois(10, 10) * 1e+05) time base: 1e+05
number of events = 58304, time base = 1e+05, p-value < 2.2e-16
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.5783169 0.5877921
sample estimates:
event rate 
   0.58304

Now let's simulate:

y <- rpois(1e8, lambda=10)
y <- matrix(y, ncol=1e5)
quantile(rowSums(y<=10)/1e5, c(0.025, 0.975))
#     2.5%     97.5% 
#0.5799880 0.5861802

I believe the confidence interval given by the test tells you the following:

For 95% of the (same-sized) samples that we could generate from the same distribution that we drew the sample from, we will get a result that lies between ...

The test doesn't know that $\lambda=10$. It estimates the distribution parameter from the sample. However, we know the real distribution.

You also need to consider that the confidence interval reported by the test is also an estimate. poisson.test is based on binom.test, where you can read:

Confidence intervals are obtained by a procedure first given in Clopper and Pearson (1934). This guarantees that the confidence level is at least conf.level, but in general does not give the shortest-length confidence intervals.

Note that the confidence interval calculated from the simulation is narrower than the interval reported by the test.

Related Solutions

Solved – Comparing incidence rates

A couple thoughts:

First, your suggested comparison - the incident rate ratio between A and B - currently isn't conditioned on any covariates. Which means your number of events is 54 for Group A and 28 for Group B. That's more than enough to go with the usual large sample based Confidence Interval Methods.

Second, even if you are intending to adjust for the effect of age, rather than computing the ratio for each group, you might be better served by using a regression approach. Generally, if you're stratifying by many levels of a variable, it becomes rather cumbersome compared to a regression equation, which would give you the ratio of the rates of A and B while controlling for Age. I believe the standard approaches will still work for your sample size, though if you're worried about it, you could use something like glmperm.

Solved – Why is the Poisson distribution only defined for a positive rate

In general Poisson distribution is defined as having parameter $\lambda > 0$. For $\lambda < 1$ the smaller $\lambda$ gets, the more mass is accumulated around $0$, i.e. $\lim_{\lambda \rightarrow 0} P(K=0|\lambda) = 1$ and $\lim_{\lambda \rightarrow 0} P(K=k|\lambda) = 0$ for $k>0$ (Said, 1958). This is shown on an ugly plot below, that illustrates Poisson pmf for $\lambda = 1/10^0, ..., 1/10^{20}$ and $\lambda = 0$ as a red line (notice that it is heavily zoomed).

As noted by you and @ChristophHanck, number to the zeroth power is one and common convention is that $0^0 = 1$. On another hand $0^x = 0$ for non-zero $x$'s. Dividing zero by non-zero gives you zero. So if $\lambda = 0$ Poisson pmf simplifies to

$$ \frac{0^k \times \text{(whatever)}}{ \text{(whatever)} } $$

so it is degenerate distribution with all point mass at zero (cf. here):

$$ f(k) = \begin{cases} 1 & \text{if }k=0, \\[6pt] 0 & \text {if }k>0.\end{cases} $$

Mean and variance for Poisson distribution are equal to $\lambda$ and in this case only zero has non-zero probability, so expected value is obvious and there is no variability (variance is zero). Also R does not have any problem with $\lambda$'s defined as non-negative (rather than positive) values:

> dpois(0:10, 1e-10)
 [1]  1.000000e+00  1.000000e-10  5.000000e-21  1.666667e-31  4.166667e-42  8.333333e-53
 [7]  1.388889e-63  1.984127e-74  2.480159e-85  2.755732e-96  2.755732e-107
> dpois(0:10, 0)
 [1] 1 0 0 0 0 0 0 0 0 0 0
> dpois(0:10, -1)
 [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Warning message:
In dpois(0:10, -1) : NaNs produced

So the behavior of Poisson pmf at $\lambda=0$ is coherent with its behavior at limit, moreover there is no problem with calculating Poisson pmf form this value, but it yields a degenerate distribution.

Said, A.S. (1958). Some properties of the poisson distribution. AIChE Journal, 4(3), 290-292.

Best Answer

Related Solutions

Solved – Comparing incidence rates

Solved – Why is the Poisson distribution only defined for a positive rate

Related Question