Solved – Confidence interval for Poisson distributed data

confidence interval, poisson distribution, standard deviation

I'd like to calculate the confidence interval for a variable with a lower bound and I can't seem to figure out how to do so. I've seen several similar posts, but none answered my question, at least not directly. I use a toy example here for simplicity.

To be concrete, let's consider the distribution (I chose these numbers myself so it kind of looks like a Poisson distribution),

{1,2,3,5,1,2,2,3,7,2,3,4,1,5,7,6,4,1,2,2,3,9,2,1,2,2,3}

which is shown below,

[Figure: plot of the data]

If I calculate the mean and standard deviation for this distribution I get 3.1 and 2.1 respectively, but since the distribution isn't a normal distribution this doesn't really give me the confidence interval. How would I go about calculating the confidence interval properly?

Best Answer

This answer is based on the clarification offered in comments:

I'd like to make a statement such as ... "I am 68% sure the mean is between $3.1-\sigma_-$ and $3.1+\sigma_+$", and I want to calculate $\sigma_+$ and $\sigma_-$.

I think that at least in the physics world this is called a confidence interval.

Let's take it as given that in response to my question "confidence interval for what?" you responded that you want a confidence interval for the mean (and, as made clear, not some other interval for values from the distribution).

There's one issue to clear up first - "I am 68% sure the mean is between" isn't really the usual interpretation placed on a confidence interval. Rather, it's that if you repeated the procedure that generated the interval many times, 68% of such intervals would contain the parameter.

Now to address the confidence interval for the mean.

I agree with your calculation of mean and sd of the data:

> x=c(1,2,3,5,1,2,2,3,7,2,3,4,1,5,7,6,4,1,2,2,3,9,2,1,2,2,3)
> mean(x);sd(x)
[1] 3.148148
[1] 2.106833

However, the sample mean doesn't have the same sd as the population the data were drawn from.

The standard error of the mean is $\sigma/\sqrt{n}$. We could estimate that from the sample sd (though if the data were truly Poisson, this isn't the most efficient method):

> sd(x)/sqrt(length(x))
[1] 0.4054603

If we assumed that the sample mean was approximately normally distributed (but did not take advantage of the possible Poisson assumption for the original data), and assumed that $\sigma=s$ (invoking Slutsky, in effect) then an approximate 68% interval for the mean would be $3.15\pm 0.41$.
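As a minimal sketch of that normal-approximation interval: the names `se`, `z` and `ci_normal` are my own, and I use `qnorm(0.84)` for a central 68% interval, which is very close to the multiplier of 1 used above.

```r
x <- c(1,2,3,5,1,2,2,3,7,2,3,4,1,5,7,6,4,1,2,2,3,9,2,1,2,2,3)

n  <- length(x)
se <- sd(x) / sqrt(n)            # estimated standard error of the mean

# A central 68% interval leaves 16% in each tail of the normal distribution
z <- qnorm(1 - (1 - 0.68) / 2)

# Approximate 68% interval, treating sd(x) as if it were the true sigma
ci_normal <- mean(x) + c(-1, 1) * z * se
ci_normal
```

Using a multiplier of exactly 1 instead of `z` corresponds to a coverage of about 68.3% rather than 68%, so the two versions differ only slightly.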

However, the sample isn't really large enough for Slutsky. A better interval would take account of the uncertainty in $\hat \sigma$; that is, a 68% $t_{26}$-interval for the mean would be

$3.15\pm 1.013843\times 0.41$

which is just a fraction wider.
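For reference, here is one way the multiplier and the interval could be computed. This is a sketch with my own variable names (`t_mult`, `ci_t`, `ci_check`); the cross-check uses base R's `t.test()` with `conf.level = 0.68`.

```r
x <- c(1,2,3,5,1,2,2,3,7,2,3,4,1,5,7,6,4,1,2,2,3,9,2,1,2,2,3)

n  <- length(x)
se <- sd(x) / sqrt(n)

# t quantile with n - 1 = 26 degrees of freedom, 16% in each tail
t_mult <- qt(1 - (1 - 0.68) / 2, df = n - 1)

ci_t <- mean(x) + c(-1, 1) * t_mult * se

# t.test() computes the same interval directly
ci_check <- t.test(x, conf.level = 0.68)$conf.int

t_mult
ci_t
```

The value of `t_mult` is the 1.013843 that appears in the interval above.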

Now, as for whether the sample size is large enough to apply the normal theory CI we just used, that depends on your criteria. Simulations at similar Poisson means (in particular, ones deliberately chosen to be somewhat smaller than the observed one) at this sample size suggest that using a t-interval will work quite well for similar Poisson rates and 27 observations or more.
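A simulation of that kind might look roughly like the following. The Poisson mean of 2.5, the 10000 replications and the seed are my own illustrative choices, not the settings used for the statement above.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

mu    <- 2.5                     # assumed true Poisson mean (illustrative)
n     <- 27                      # same sample size as the data
nsim  <- 10000                   # number of simulated samples
level <- 0.68

covered <- replicate(nsim, {
  y  <- rpois(n, mu)
  # half-width of the t-interval for this simulated sample
  hw <- qt(1 - (1 - level) / 2, df = n - 1) * sd(y) / sqrt(n)
  abs(mean(y) - mu) <= hw        # TRUE if the interval contains mu
})

coverage <- mean(covered)        # proportion of intervals containing mu
coverage
```

If the t-interval behaves well at this mean and sample size, `coverage` should come out close to the nominal 0.68.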

If we take account of the fact that the data are (supposedly) Poisson, we can get a more efficient estimate of the standard deviation and an interval for $\mu$, but if there's any risk the Poisson assumption could be wrong (a chance of overdispersion caused by some heterogeneity of Poisson parameters, say) then the t-interval would probably be better.

Nevertheless, we should consider that specific question, "how to get a confidence interval for the population mean of Poisson variables", but this more specific question has already been answered here on CV; for example, see the fine answers here.
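As a rough illustration of what using the Poisson assumption can look like (my own sketch, not necessarily the approach taken in those answers): under a Poisson model the variance equals the mean, so the standard error of the sample mean can be estimated as $\sqrt{\bar x/n}$. And since the sum of $n$ independent Poisson($\mu$) counts is Poisson($n\mu$), base R's `poisson.test()` can give an exact interval for $\mu$ from the total count.

```r
x <- c(1,2,3,5,1,2,2,3,7,2,3,4,1,5,7,6,4,1,2,2,3,9,2,1,2,2,3)

n     <- length(x)
level <- 0.68

# Under the Poisson model Var(X) = mu, so estimate the SE from the mean itself
se_pois <- sqrt(mean(x) / n)
ci_wald <- mean(x) + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se_pois

# Exact interval: sum(x) ~ Poisson(n * mu); T = n rescales to a per-observation rate
ci_exact <- poisson.test(sum(x), T = n, conf.level = level)$conf.int

se_pois
ci_wald
ci_exact
```

Both intervals lean entirely on the Poisson assumption; if the data are overdispersed they will tend to be too narrow.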