Solved – What statistical test should I use for count data?

count-data, r, spatial

I want to test whether the number of varroa mites in individual apiaries is higher in the south of England than in the north.
I'm working with count data (mites) and categorical data (location). I used a binomial GLM, but the output shows EXTREMELY high overdispersion (around 1600). Is there any way to handle this, or should I be using a different statistical test?
I used this code:

    glm(formula = Varroa ~ location, family = poisson)

This is the subsequent output (the two locations are south and north):

    Coefficients:
      (Intercept)  locationsouth
          7.73165        0.09428

    Degrees of Freedom: 1999 Total (i.e. Null);  1998 Residual
    Null Deviance:     3387000
    Residual Deviance: 3380000    AIC: 3394000

Best Answer

There are two issues here: the treatment of your independent variable and the choice of model.

For your IV (location) as people noted in the comments, using it as a categorical variable is not usually great. If you only have two locations, then it's fine, but if you have more locations, you will want to look into other ways to treat it. These might be based on why you think there are different numbers in the north and south. E.g. if you think it is due to hours of daylight, then you could use latitude; if you think it is due to temperature, you could use average temperature in a location and so on. Even if you don't have a particular reason, you will want to do something other than a purely categorical variable. One choice is to use latitude and longitude.
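As a rough sketch of the latitude idea, the example below fits a count model with latitude as a continuous predictor. The data are simulated, and the `latitude` column, the `apiaries` data frame, and the strength of the effect are all assumptions made for illustration; nothing here comes from the asker's data. It uses `glm.nb()` from the MASS package so that overdispersed counts are handled.

```r
library(MASS)  # provides glm.nb()

set.seed(2)
n <- 1000

# Hypothetical latitudes roughly spanning England (degrees north)
latitude <- runif(n, 50, 55.5)

# Assumed relationship for the simulation: fewer mites further north
mu <- exp(22 - 0.3 * latitude)

# Overdispersed counts drawn from a negative binomial distribution
Varroa <- rnbinom(n, size = 1, mu = mu)
apiaries <- data.frame(Varroa, latitude)

# Latitude enters as a continuous predictor instead of a north/south factor
fit_lat <- glm.nb(Varroa ~ latitude, data = apiaries)
coef(summary(fit_lat))
```

The `latitude` coefficient is on the log scale, so `exp()` of it gives the multiplicative change in expected mite count per degree of latitude. Longitude or average temperature could be added to the formula in the same way if those variables are available.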

Then there is your model. If you have a count dependent variable, you want a count regression. The usual starting place is Poisson regression, but overdispersion is very common (I've never had a data set that didn't have overdispersion). The usual solution there is a negative binomial regression. There are also zero-inflated versions of these models, if you have a lot of sites with no mites.
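A minimal sketch of that workflow in R might look like the following. The data are simulated stand-ins (the means, the `size` parameter, and the `bees` data frame are assumptions), since the asker's data set is not shown. It fits the Poisson model from the question, computes a Pearson dispersion statistic, and then fits a negative binomial model with `glm.nb()` from the MASS package.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
n <- 2000

# Simulated stand-in for the asker's data: two locations, overdispersed counts
location <- factor(rep(c("north", "south"), each = n / 2))
mu <- ifelse(location == "south", 2500, 2300)  # hypothetical group means
Varroa <- rnbinom(n, size = 0.5, mu = mu)
bees <- data.frame(Varroa, location)

# Poisson regression, as in the question
fit_pois <- glm(Varroa ~ location, family = poisson, data = bees)

# Pearson dispersion statistic: values far above 1 indicate overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion

# Negative binomial regression estimates an extra dispersion parameter (theta)
fit_nb <- glm.nb(Varroa ~ location, data = bees)
summary(fit_nb)

# Lower AIC suggests the better-fitting model
AIC(fit_pois, fit_nb)
```

If many apiaries have zero mites, `zeroinfl()` from the pscl package can fit zero-inflated Poisson or negative binomial models with a similar formula interface.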
