Solved – Comparing multiple incidence rates

epidemiology, incidence-rate-ratio, poisson distribution, r

I asked the following question on Stack Overflow yesterday:
Negative binomial function in R

Reading comment 2, I understand that I cannot use a negative binomial modeling approach and compare the betas to a reference category. (A Poisson model works, but I suspect the assumption of equal mean and variance is invalid, though I'm uncertain how I can test this with an offset.) I've googled and looked through my books but cannot find any other approach to compare multiple incidence rates.

b <- data.frame(
  s=c(1800,539,490,301),
  pop=c(2900000,1327000,880000,268000),
  reg=c("A","B","C","D")
)

summary(pois.b<-glm(s~reg,offset=log(pop),data=b,family="poisson"))

So the question is: is there any difference between the regions with regard to incidence?

Since yesterday's question was software related and today's is more statistically flavored, I figured it belonged here on Cross Validated.

EDIT: Aug 11:

Since there are no other covariates here and the numbers are large, I guess something as simple as

pairwise.prop.test(x=b$s,n=b$pop,p.adjust.method="bonferroni")

would get me a long way.

Best Answer

If you only have the four data points, I think the best way to do this is with a G^2 (likelihood-ratio) test. Start by assuming the number of cases in each region follows a binomial distribution (every person in the population has the condition with probability p). Your null hypothesis is then that p_1=p_2=p_3=p_4.

So the pooled incidence under the null is (1800+539+490+301)/(2.9m+1.327m+.88m+.268m)=0.000582.

Your expected cases in each group are 1688.7, 772.7, 512.4, and 156.1. Calculating the G^2 statistic from these, I get 192.8, which follows a chi-squared distribution with 3 degrees of freedom under the null hypothesis. That corresponds to a very low p-value, so you'd reject the null and say that yes, you can be quite confident that the incidence differs between these locations.
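
For anyone who wants to reproduce the numbers above, here is a minimal sketch in R. It assumes the case-count form of the statistic, G^2 = 2 * sum(O * log(O / E)), which ignores the non-cases; with incidences this small, that should differ only slightly from the full binomial version. The variable names (p0, expected, G2) are my own, and the final lines are a cross-check against the Poisson model from the question, not part of the original calculation.

```r
# Observed cases and populations, as given in the question
b <- data.frame(
  s   = c(1800, 539, 490, 301),
  pop = c(2900000, 1327000, 880000, 268000),
  reg = c("A", "B", "C", "D")
)

# Pooled incidence under the null hypothesis of one common rate
p0 <- sum(b$s) / sum(b$pop)

# Expected cases per region under the null
expected <- b$pop * p0

# G^2 statistic using case counts only (assumption: non-cases ignored)
G2 <- 2 * sum(b$s * log(b$s / expected))

# Degrees of freedom: number of regions minus one
df <- nrow(b) - 1

# Upper-tail p-value from the chi-squared reference distribution
p_value <- pchisq(G2, df = df, lower.tail = FALSE)

print(round(expected, 1))
print(G2)
print(p_value)

# Cross-check: the null deviance of the Poisson GLM with a log(pop)
# offset is the same likelihood-ratio statistic
fit <- glm(s ~ reg, offset = log(pop), data = b, family = poisson)
print(fit$null.deviance)
```

The null deviance reported by glm compares an intercept-only model (one common rate) against the per-region model, so it should agree with G2 up to numerical tolerance.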

In particular, the incidence in the last location (D) is considerably higher than in the other three, so it contributes heavily to the low p-value. You can repeat this analysis for the other three alone and you may get something a bit different, but that is an exercise for the reader :-)
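
As a starting point for that exercise, the same calculation can be wrapped in a small helper and rerun without region D. This is only a sketch: g2_test is a hypothetical name of my own, it uses the case-count form of G^2, and it assumes every region has at least one case (otherwise the logarithm breaks down).

```r
# Observed cases and populations, as given in the question
b <- data.frame(
  s   = c(1800, 539, 490, 301),
  pop = c(2900000, 1327000, 880000, 268000),
  reg = c("A", "B", "C", "D")
)

# Hypothetical helper: G^2 test of a common incidence across groups.
# Assumes all case counts are strictly positive.
g2_test <- function(cases, pop) {
  expected <- pop * sum(cases) / sum(pop)      # expected cases under the null
  G2 <- 2 * sum(cases * log(cases / expected)) # case-count form of G^2
  df <- length(cases) - 1                      # groups minus one
  list(G2 = G2, df = df,
       p_value = pchisq(G2, df = df, lower.tail = FALSE))
}

# Crude incidence per 100,000, to see which region stands out
rate_per_100k <- setNames(1e5 * b$s / b$pop, b$reg)
print(round(rate_per_100k, 1))

# Repeat the test on regions A, B and C only
abc <- subset(b, reg != "D")
res_abc <- g2_test(abc$s, abc$pop)
print(res_abc)
```

Note that the degrees of freedom drop to 2 once only three regions are compared.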

HTH

ETA: the DF is 3, not 1, as Yves pointed out in the comments.
