Chapter 11 of Introduction to Data Science is about Poisson distributions.
One example sample has 58638 observations out of 100000 with a value less than or equal to 10.
> sum(rpois(100000, lambda=10)<=10)
[1] 58638
I think I can read that as "a 0.58638 probability that values observed from this distribution are 10 or less".
I'm comfortable with the fact that this is close to the theoretical probability, but not exactly the same.
The book says R can show us how much variation there is around these probabilities using the R function poisson.test
.
> poisson.test(58638, 100000)
Exact Poisson test
data: 58638 time base: 1e+05
number of events = 58638, time base = 1e+05, p-value < 2.2e-16
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
0.5816434 0.5911456
sample estimates:
event rate
0.58638
The book's explanation confuses me. I've highlighted the part I don't get.
For 95% of the samples that we could generate using rpois(), using a sample size of 100,000, and a desired mean of 10, we will get a result that lies between 0.5816434 and 0.5911456
It's confusing because I didn't tell the function that the desired mean is 10. How does it know? What if my desired mean is 3, like here?
> sum(rpois(100000, lambda=3)<=10)
[1] 99970
When I set lambda (the mean) to 3, clearly there are a lot more observation with a value less than 10. How is the test relevant here?
Best Answer
The $\lambda$-value of the Poisson distribution equals the expected value and the variance. See the definition on Wikipedia or in any good textbook. Thus, by passing
lambda=10
torpois
you get Poisson distributed values with mean $\approx$ 10 and var $\approx$ 10:The theoretical property for a value to be smaller than or equal to 10 can be calculated from the cumulative distribution function:
Which is approximately what you get from the sample:
Now let's provide the theoretical value to
poisson.test
:Now let's simulate:
I believe the confidence interval given by the test tells you the following:
The test doesn't know that $\lambda=10$. It estimates the distribution parameter from the sample. However, we know the real distribution.
You also need to consider that the confidence interval reported by the test is also an estimate.
poisson.test
is based onbinom.test
, where you can read:Note that the confidence interval calculated from the simulation is narrower than the interval reported by the test.