Count data and heteroscedasticity

count-data, heteroscedasticity

Why are count data characterized by heteroscedasticity? If this is a violation of the homoscedasticity assumption of the main linear models, does it mean that heteroscedasticity is less important to detect in the relevant models for count data?

Best Answer

Q1 "why [do] count data tend to be heteroscedastic"?

If we want to model counts as random, then the Poisson distribution, which is heteroscedastic because its variance equals its mean, provides a natural characterisation of what 'random counts' might usefully mean. Hence one way to ask why count data are heteroscedastic is to ask why count data might be Poisson distributed. For this there are various derivations, e.g. the 'Law of Rare Events' discussed in the link.
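The link between mean and variance can be checked with a quick simulation. The sketch below is illustrative only: the helper `poisson_draw`, the seed, the sample size and the chosen means are assumptions made for this example, not part of the original answer. It draws Poisson counts at several means using only the Python standard library and compares each sample variance with the sample mean.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    # Count how many uniforms can be multiplied together before the
    # running product drops to exp(-lam) or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)  # arbitrary seed, fixed for reproducibility
summary = {}
for lam in (1.0, 5.0, 20.0):  # illustrative means (assumed values)
    draws = [poisson_draw(rng, lam) for _ in range(20000)]
    summary[lam] = (statistics.fmean(draws), statistics.variance(draws))
    print(f"mean={summary[lam][0]:.2f}  variance={summary[lam][1]:.2f}")
```

The spread of the counts should grow in step with their mean, which is exactly what heteroscedasticity means here: observations with a larger expected count are also noisier.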

Poisson is not the only characterisation of 'random counts' that is possible, of which more below.

Q2 "is heteroscedasticity...something that [I] should be concerned about in [a] [P]oisson model if [I'm] using [a] [dependent] variable that is consider[ed] to be count data?"

If you are running a regression that assumes that your dependent variable is Poisson distributed with a mean that depends on some covariates, e.g. a Generalised Linear Model, then you are already taking into account the heteroscedasticity due to being Poisson. However...

Overdispersion

This kind of model assumes that once the covariates have determined the expected mean, the remaining variation in your data is Poisson. But if you have missed out some important variables (which most of us do, most of the time), then the true mean might still differ across values of those unseen variables, even when the variables that are in the model are the same. The counts then vary more than the Poisson assumption allows. This is referred to as overdispersion, and it is a distinct variance-related issue you will want to think about. (Actually this is only one of several mechanisms that generate overdispersion, but it's enough for now.)
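A small simulation may make the omitted-variable mechanism concrete. Everything specific in it is an assumption chosen for illustration: a hypothetical unobserved binary variable switches the true mean between 2 and 8 with equal probability, and `poisson_draw` is a helper written for this sketch. By the law of total variance, the pooled counts then have mean 5 but variance 5 + 9 = 14, so the variance-to-mean ratio should sit well above the value of 1 that a Poisson model expects.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(1)  # arbitrary seed, fixed for reproducibility
counts = []
for _ in range(20000):
    # Hypothetical unobserved covariate: it shifts the true mean between
    # 2 and 8, but it is never recorded, so the analyst sees only the count.
    hidden = rng.random() < 0.5
    counts.append(poisson_draw(rng, 8.0 if hidden else 2.0))

mean_count = statistics.fmean(counts)
var_count = statistics.variance(counts)
dispersion = var_count / mean_count  # roughly 1 for pure Poisson data
print(f"mean={mean_count:.2f}  variance={var_count:.2f}  ratio={dispersion:.2f}")
```

Each count is still Poisson given the hidden variable; the excess variance appears only because the two groups are pooled without it.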

The solution is to model the extra variation explicitly: Negative Binomial regression models are one class of models that do that.
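One standard way to see how a Negative Binomial model absorbs the extra variation is to build it as a Poisson whose mean is itself gamma distributed. The sketch below simulates that mixture rather than fitting a regression; the values `mu = 5` and `shape = 2`, the seed and the helper `poisson_draw` are assumptions made for this example. Under this construction the variance is mu + mu^2 / shape, which exceeds the Poisson variance of mu.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(2)  # arbitrary seed, fixed for reproducibility
mu, shape = 5.0, 2.0    # assumed mean and gamma shape (dispersion) parameter
counts = []
for _ in range(20000):
    # Gamma-distributed rate with mean mu: shape `shape`, scale mu / shape.
    rate = rng.gammavariate(shape, mu / shape)
    counts.append(poisson_draw(rng, rate))

sample_mean = statistics.fmean(counts)
sample_var = statistics.variance(counts)
nb_variance = mu + mu ** 2 / shape  # theoretical Negative Binomial variance
print(f"mean={sample_mean:.2f}  variance={sample_var:.2f}  theory={nb_variance:.2f}")
```

The smaller the shape parameter, the larger the extra variance; as it grows very large the mixture approaches an ordinary Poisson.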