Why are count data characterized by heteroscedasticity? If this is a violation of the homoscedasticity assumption of standard linear models, does it mean that in the relevant models for count data heteroscedasticity is less important to detect?
Solved – Count data and heteroscedasticity
count-data, heteroscedasticity
Related Solutions
The option vce(robust) for regress, xtreg, etc. produces heteroskedasticity-consistent standard errors. Note also that in Stata, heteroskedasticity-robust standard errors in a regression with fixed effects are produced by clustering on the panel's grouping variable.
Theoretically, heteroskedasticity-robust standard errors are consistent estimates of the true standard errors in the presence of heteroskedasticity (as your sample size goes to infinity), but of course, all bets are off for too small a sample.
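For readers working in R rather than Stata, a rough analogue uses the sandwich and lmtest packages (an illustrative sketch, not part of the original answer; HC1 is the small-sample correction that corresponds to Stata's vce(robust)):

library(sandwich)
library(lmtest)

fit = lm(dist ~ speed, data = cars)
# Replace the classical standard errors with heteroskedasticity-robust (HC1) ones
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))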
I don't come from econometrics, and I have no idea about this Moulton factor. But I can answer your last question using tools from my native field: you can adjust for both heteroskedasticity and correlated errors if you formulate an appropriate mixed-effects model. For example, consider
$$Y_{ij} = X_{ij}\beta + b_j + \epsilon_{ij}$$
where $j$ indexes groups of observations, $i$ indexes observations within a group, $Y_{ij}$ is the response, $X_{ij}$ a vector of covariates, $\beta$ is the target of inference, $\epsilon_{ij}$ are independent errors, and $b_j$ is a random effect that induces correlations within groups. It seems like you want something like $\epsilon_{ij} \sim N(0, \sigma^2_j)$, with the error variance depending on the group index, and $b_j \sim N(0, \sigma^2_g)$. The trick is to actually fit this model.
One way is to rephrase the heteroskedastic errors as yet more mixed-effects terms, leaving behind homoskedastic errors. These can be dealt with using a mixed-modeling tool such as the R package nlme. Let $\sigma^2_\delta = \min_j\{\sigma^2_j\}$ and $\tau^2_j = \sigma^2_j - \sigma^2_\delta$. The following model is equivalent to the one above:
$$Y_{ij} = X_{ij}\beta + b_j + c_{ij} + \delta_{ij}$$
where $c_{ij}$ is normal with variance $\tau^2_j$ (its within-group design matrix $Z_j$ is an identity, matching how the code below encodes it), and $\delta_{ij}$ is iid noise with variance $\sigma^2_\delta$. You may notice that for any group attaining the minimum variance, $\tau^2_j = 0$, so $c_{ij}$ has variance zero; when you go to fit the model, you will probably need to leave that term out.
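To see the equivalence, note that within each group the two new error components add back exactly the original error variance:
$$\operatorname{Var}(c_{ij} + \delta_{ij}) = \tau^2_j + \sigma^2_\delta = \sigma^2_j.$$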
Here is some R code that does this for a simple two-group example (using the REML criterion for fitting).
require(ggplot2)
require(nlme)
# Split cars$speed into an extreme-speed group and a middling-speed group
group = cars$speed == 4 | cars$speed >= 24
y1 = cars$speed[group]; n1 = length(y1)
y2 = cars$speed[!group]; n2 = length(y2)
# This next line tells me to omit c_ij for group 2.
# With equal sigma_g across groups, smallest within-group variance implies smallest error variance.
# If I were fitting more than just a mean to the data, I'd have to try
# all the groups or something unpleasant like that.
var(y1) > var(y2)
y = c(y1, y2); group = c(rep(1, n1), rep(2, n2))
# To force-feed this into nlme, I encode all the random effects as a matrix
# to be multiplied by IID normal variates. Note the zeroes for c_2j.
z1 = diag(rep(1, n1))
z1 = cbind(z1, 1)
zzero = matrix(0, ncol = n1 + 1, nrow = n2)
z = rbind(z1, zzero)
z = cbind(z, c(rep(0, n1), rep(1, n2)))
colnames(z) = c(paste0("c", 1:n1), "b1", "b2")  # n1 = 7 observations in group 1
ggplot(reshape2::melt(z)) + geom_tile(aes(y = -Var1, x = Var2, fill = value))
const = c(rep(1, n1 + n2))
mydata = data.frame(y, group, z, const)
# The pdClass="pdIdent" line gives iid normal variates that get postmultiplied
# into the columns of my matrix Z specified by the formulas in the list above it.
# The + 0 gets rid of an intercept term.
mod = nlme::lme(data = mydata,
                fixed = y ~ 1,
                method = "REML",
                random = reStruct(list(~ c1 + c2 + c3 + c4 + c5 + c6 + c7 + 0 | const,
                                       ~ b1 + b2 + 0 | const),
                                  pdClass = "pdIdent"))
summary(mod)
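To inspect the fitted variance components (and to check that the noisier group picks up nonzero estimates for its extra c terms), you can print them; this is standard nlme usage, not part of the original answer:

nlme::VarCorr(mod)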
Best Answer
Q1 "why [do] count data tend to be heteroscedastic"?
If we want to model counts as random, then the Poisson distribution, whose variance equals its mean (so it is heteroscedastic by construction), provides a natural characterisation of what 'random counts' might usefully mean. Hence one way to ask why count data are heteroscedastic is to ask why count data might be Poisson distributed. For this there are various derivations, e.g. the 'Law of Rare Events' discussed in the link.
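Because a Poisson variable's variance equals its mean, samples with larger means are automatically noisier. A quick simulation in R makes this concrete (an illustration, not part of the original answer):

set.seed(1)
lambdas = c(2, 10, 50)
# The sample variances track the means: roughly 2, 10 and 50
sapply(lambdas, function(l) var(rpois(1e4, l)))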
Poisson is not the only characterisation of 'random counts' that is possible, of which more below.
Q2 "is heteroscedasticity...something that [I] should be concerned about in [a] [P]oisson model if [I'm] using [dependent] variable that is consider to be count data?"
If you are running a regression that assumes your dependent variable is Poisson distributed with a mean that depends on some covariates, e.g. a Generalised Linear Model, then you are already taking into account the heteroscedasticity due to being Poisson.
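For instance, here is a minimal sketch of such a model using R's built-in glm, with simulated data standing in for a real dependent variable and covariate:

set.seed(2)
x = runif(100)
y = rpois(100, exp(1 + 2 * x))     # counts whose mean depends on x
fit = glm(y ~ x, family = poisson) # assumes Var(y) = E(y) at each x
summary(fit)

However...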
Overdispersion
This kind of model assumes that once the covariates have determined the expected mean, the remaining variation in your data is Poisson. But if you have missed out some important variables (which most of us do, most of the time), then the true mean may still differ across values of those unseen variables, even when the variables that are in the model are the same. This is referred to as overdispersion, and it is a distinct variance-related issue you will want to think about. (Actually this is only one of several mechanisms that generate overdispersion, but it's enough for now.)
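A rough diagnostic, continuing the Poisson sketch above: if the data really are Poisson, the residual deviance should be comparable to its residual degrees of freedom, so a ratio well above 1 hints at overdispersion (a rule of thumb, not a formal test):

fit$deviance / fit$df.residual  # near 1 for genuinely Poisson data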
The solution is to model the extra variation explicitly: Negative Binomial regression models are one class of models that do that.
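A minimal sketch with MASS::glm.nb, which estimates an extra dispersion parameter on top of the Poisson mean structure (again using the simulated y and x from above):

library(MASS)
nbfit = glm.nb(y ~ x)
summary(nbfit)  # 'Theta' is the estimated dispersion parameter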