R – How to Understand Poisson Test

Tags: poisson distribution, poisson process, r

I want to understand the poisson.test() function:

poisson.test(x, T = 1, r = 1,
             alternative = c("two.sided", "less", "greater"),
             conf.level = 0.95)

I don't understand the parameters T and r, or which alternative I should use. I want to solve the following example and test the hypothesis that these data are Poisson.

A scientist counts the number of bacteria on a Petri dish. He knows that
for a standard Petri dish, N, the number of bacteria, follows a Poisson
distribution with parameter lambda=6.1. One dish is treated with a new bactericide and 2 bacteria are observed to survive.

Can you please help me understand poisson.test with this example? What value of T should I use? What is the rate here? Which alternative should I use?

Best Answer

This is an R function that implements an exact hypothesis test for a Poisson rate, or for the ratio of two Poisson rates. It is analogous to the ?t.test function, except that where t.test assumes the data are normally distributed (in the population), this assumes the data are counts from a Poisson distribution. The basic idea is that you have two counts from two different conditions, where you know the distributions are Poisson. From there, you can test whether the two counts differ by more than you would expect by chance alone. You only need one count per condition (perhaps surprisingly) because the Poisson distribution specifies the variance quite rigidly: it equals the mean. If the population distributions aren't actually Poisson, the test will not be valid, and with a single count per condition you cannot check that assumption. Nonetheless, the test may be useful on occasion.

With this basic framework in mind, we can interpret the arguments. x is a vector of one or two counts. If the counts arise from situations in which one condition has a greater opportunity for the event to occur, you can account for that via the T argument, which serves as an offset (cf., Should I use an offset for my Poisson GLM?). Imagine you compared the counts of bacteria from two Petri dishes, where one was twice the size of the other. The larger dish might yield larger counts without anything going on (I don't know if it actually works this way). In that case, you would want to tell the test that the dishes differed; that's what T does. In addition, you could test against some value of the rate ratio other than unity, which is what r does. You can also do a one-sample test against a specified value of the rate; r allows for that, too. Lastly, you could test that your intervention differs from the control, which would be a two-sided test (alternative = "two.sided"), or that the intervention yields larger counts ("greater"), or smaller counts ("less").
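As a rough illustration of how T and r enter a two-sample call, here is a minimal sketch. The counts and dish sizes are made-up numbers chosen only for illustration; they are not from the question.

```r
# Hypothetical data: dish A is the reference size, dish B is twice as large.
counts   <- c(12, 20)  # bacteria counted on dish A and dish B (made-up numbers)
exposure <- c(1, 2)    # relative dish sizes, passed to poisson.test() as T

# H0: (rate on A per unit size) / (rate on B per unit size) equals r = 1
res <- poisson.test(x = counts, T = exposure, r = 1,
                    alternative = "two.sided")

res$estimate  # estimated rate ratio: (12 / 1) / (20 / 2) = 1.2
res$p.value   # p-value for H0: rate ratio = 1
res$conf.int  # confidence interval for the rate ratio
```

Without T = c(1, 2), the test would compare the raw counts 12 and 20 as if both dishes offered the same opportunity for bacteria to appear.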

In your specific example, you say you want to "test the hypothesis that these data are Poisson". That would be a goodness of fit test, which isn't what poisson.test() does. It also doesn't match what the question is asking for.

The question, as stated, is asking if it's reasonable to get a value of 2 from a Poisson distribution with mean 6.1. That would be a one-sample version of the test implemented.
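A minimal sketch of that one-sample call, assuming a single dish (so T = 1) and the reference rate of 6.1 from the question. Using alternative = "less" is my reading of the problem (a bactericide should lower the count, if anything); the two-sided version is shown alongside it.

```r
# One treated dish (T = 1), 2 surviving bacteria, reference rate 6.1 per dish.
res_two  <- poisson.test(x = 2, T = 1, r = 6.1, alternative = "two.sided")
res_less <- poisson.test(x = 2, T = 1, r = 6.1, alternative = "less")

res_two$p.value         # two-sided p-value
res_less$p.value        # one-sided p-value: P(N <= 2) for N ~ Poisson(6.1)
ppois(2, lambda = 6.1)  # the same lower-tail probability, about 0.058
```

The one-sided p-value is simply the probability of seeing 2 or fewer bacteria when the true rate is still 6.1.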

Related Solutions

Solved – Writing a Monte Carlo simulation in R

Here is what I would do, in a two-step answer to make things clearer. I suppose you want to compute the annual risk of getting sick (at least once). I propose a simple bootstrapping procedure.

First, without resampling

Using your formula $r = 1- e^{-a d}$, you can compute the risk of disease $r_i$ for each of the 1000 pieces of chicken tested. You can estimate the risk $p$ of disease when eating one piece of chicken as the mean of the $r_i$’s. Here is a piece of code for that:

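# Doses: 1980 zeros followed by 20 positive values
d <- c(rep(0, 1980), c(1.158469, 2.01743,  1.896469, 1.055511, 1.263673, 1.616196,
 1.197719, 0.913197, 1.108193, 2.058633, 0.904878, 1.241663, 1.525408, 1.730925,
 1.143274, 1.200265, 1.103152, 1.465076, 1.838127, 1.162226))

a <- 0.00005          # parameter a of the formula r = 1 - exp(-a d)

R <- 1 - exp(-a * d)  # risk of disease for each piece
p <- mean(R)          # estimated risk when eating one piece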

The result is $p = 6.9 \cdot 10^{-7}$. If you estimate that the average person eats $104$ pieces of chicken a year, her/his probability of disease in a year is $1-(1-p)^{104} \simeq 104 p = 7.17 \cdot 10^{-5}$.
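The linear approximation $1-(1-p)^{104} \simeq 104 p$ can be checked numerically. This is a small sketch that plugs in the rounded value of $p$ quoted above rather than recomputing it from the data; the variable names are mine.

```r
p <- 6.9e-7  # per-piece risk, rounded value quoted in the text
n <- 104     # pieces of chicken eaten per year, as assumed in the text

exact  <- 1 - (1 - p)^n  # probability of getting sick at least once
linear <- n * p          # first-order approximation

c(exact = exact, linear = linear, difference = linear - exact)
```

The difference is of order $n^2 p^2 / 2$, which is negligible here because $p$ is so small.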

Now, let’s resample

First, the risk estimate depends on your sample of 1000 pieces of chicken. Let’s resample it:

d1 <- sample(d, 1000, replace = TRUE)
R1 <- 1 - exp(-a * d1)
p1 <- mean(R1)

Then, model the number of chicken pieces a person eats in a year as a Poisson variable, $\mathcal P(104)$.

N <- rpois(1,104)

The probability of getting sick in a year is then

p2 <- 1 - (1 - p1)^N

Put all of that in a loop of length 100000 and record the values; you’ll get a distribution of $p_1$ and $p_2$. You can plot a histogram:

[Figure: histograms of the simulated $p_1$ and $p_2$]
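The loop described above can be sketched as follows. The helper name simulate_once, the seed, and the reduced number of replicates (10000 instead of 100000, to keep the run short) are my own choices, not part of the original answer.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Same data and parameter as in the code above
d <- c(rep(0, 1980),
       1.158469, 2.01743,  1.896469, 1.055511, 1.263673, 1.616196,
       1.197719, 0.913197, 1.108193, 2.058633, 0.904878, 1.241663,
       1.525408, 1.730925, 1.143274, 1.200265, 1.103152, 1.465076,
       1.838127, 1.162226)
a <- 0.00005

# One bootstrap replicate: resample the doses, then draw a yearly consumption
simulate_once <- function(d, a, n_pieces = 1000, mean_per_year = 104) {
  d1 <- sample(d, n_pieces, replace = TRUE)  # resampled doses
  p1 <- mean(1 - exp(-a * d1))               # per-piece risk for this resample
  N  <- rpois(1, mean_per_year)              # pieces eaten in one year
  p2 <- 1 - (1 - p1)^N                       # yearly risk for this replicate
  c(p1 = p1, p2 = p2)
}

n_sim <- 10000  # the text suggests 100000; fewer replicates run faster
sims  <- t(replicate(n_sim, simulate_once(d, a)))  # n_sim x 2 matrix

summary(sims[, "p1"])
summary(sims[, "p2"])
```

Calling hist(sims[, "p1"]) and hist(sims[, "p2"]) on the result then draws the two histograms.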

You could also fit a Beta distribution to these...
