Solved – Model selection: can I compare the AIC from models of count data between linear and poisson models

aicgeneralized linear modelglmmmodel selection

I am modeling count data (with offset / exposure parameter). My modeling strategy is use of a Poisson model and a negative binomial regression model. I compare model AICs, which are about -760 for my case. I selected the model with the best AIC. Finally, I was interested how a linear model (GLM with Gaussian distribution and identity link) would perform in comparisson to the count models. The linear model obviously does not fulfil the distributional assumptions as well as the count models. However, now I find model AICs of arround -150 suggesting much better fit.

Should I use the linear model based on AIC despite the fact that basic distributional assumptions are not met? Are the AICs comparable?

I used a GLMM function for fitting: http://glmmadmb.r-forge.r-project.org/ (my data are clustered).

Best Answer

You cannot use likelihood-based statistics like AIC to compare across models with different likelihood functions - the underlying formulas are different. In linear regression, the likelihood function is the normal density function, in Poisson regression it is the Poisson function. That will account for the differences in the AIC probably more than any differences in fit.

Before you decide to even use a linear model, you need to make sure that the residuals from the model are normally distributed (you can proxy that by looking at the distribution of the outcome variable, though keep in mind it isn't the same). If they are not normally distributed, or close enough for the eye, then you can't use a normal regression model to do any hypothesis testing.

Assuming that it is approximately normal, I would take a two broad approaches to choose the model to report.

1) Predicted outcomes. Estimate the predicted outcomes of each model and compare. Does the linear model have better predictive ability? You may want to do this in a cross-validation framework, where you "train" your model on part of your data and use the rest for prediction.

2) Intuitive interpretation of coefficients. Poisson coefficients can be complicated to understand - they are not the change in number of y but rather a proportional change. Depending on your context this may be more or less useful. Sometimes it is worth sacrificing fit if your model can be more easily interpreted by the end-user - for example, some researchers are willing to avoid the complexity of logit and probit models for the easier-to-interpret coefficients in a linear probability model, even though the LPM has tons of setbacks. Think about who your audience is, what is your context, what is your research question, etc., as you make these decisions.

EDIT: I forgot to add this paper, which gives a good comparison across a range of different count models and may be helpful.

Related Solutions

Solved – Overdispersion and modeling alternatives in Poisson random effect models with offsets

There is a maximum possible number of counted answers, related to the number of questions asked. Although one can model this as a Poisson process of the counting type, another interpretation is that a Poisson process has no theoretical limit for the number of counted answers, that is, it is on $[0,\infty)$. Another distribution, i.e., a discrete one that has finite support, e.g., the beta binomial, might be more appropriate as it has a more mutable shape. However, that is just a guess, and, in practice, I would search for an answer to a more general question using brute force...

Rather than check for overdispersion, which has no guarantee of leading to a useful answer, and, although one can examine indices of dispersion to quantify dispersion, I would more usefully suggest searching for a best distribution using a discrete distribution option of a fit quality search program, e.g., Mathematica's FindDistribution routine. That type of a search does a fairly exhaustive job of guessing what known distribution(s) work(s) best not only to mitigate overdispersion, but also to more usefully model many of other data characteristics, e.g., goodness of fit as measured a dozen different ways.

To further examine my candidate distributions, I would post hoc examine residuals to check for homoscedasticity, and/or distribution type, and also consider whether the candidate distributions can be reconciled as corresponding to a physical explanation of the data. The danger of this procedure is identifying a distribution that is inconsistent with best modelling of an expanded data set. The danger of not doing a post hoc procedure is to a priori assign an arbitrarily chosen distribution without proper testing (garbage in-garbage out). The superiority of the post hoc approach is that it limits the errors of fitting, and that is also its weakness, i.e., it may understate the modelling errors through pure chance as many distributions fits are attempted. That then, is the reason for examining residuals and considering physicality. The top down or a priori approach offers no such post hoc check on reasonableness. That is, the only method of comparing the physicality of modelling with different distributions, is to post hoc compare them. Thus arises the nature of physical theory, we test a hypothetical explanation of data with many experiments before we accept them as exhausting alternative explanations.

Solved – R: hurdle models with date as non-nested random effect

This is my understanding of truncated models. It's a two step approach. First, a model is used to model the value output for all values larger than 0, this is your truncated model (looks good to me). This model is not 'checking the 0's' but leaves the 0's out of the model. This is why it assumes a truncated negative binomial distribution, truncated in that responses must be larger than 0 so it cuts of the 0 part of the data.

The second model is a binomial model used to model the probability that the response value is not zero (checking the zero's) which is complementary to the first as it models the probability that a response belongs to the part you cut of the data (the 0's) or to the part that's in the first model (the >0's). The response of this model should not be value but as.numeric(data2$value>0) which returns a 1 if value is bigger than 0 and a 0 otherwise.

From the glmmADMB package "In contrast to zero-inflated models, hurdle models treat zero-count and non-zero outcomes as two completely separate categories, rather than treating the zero-count outcomes as a mixture of structural and sampling zeros."

So the models answer two distinct but related categories

1) Can we predict if the outcome is bigger than 0? (binomial, response is nz <- as.numeric(data2$value>0))

2) If it is bigger than 0, can we predict the value? (truncated negative binomial, response is data2$value[which(data2$value>0)])

Hope this helps. Your binomial model should be

hurP1 <- glmmadmb(nz ~ product * sex * morph + (1 | Date), 
                  data = data2, family = "binomial")

P.S.: The models do not have to use identical predictor variables, the process that determines 0 vs >0 can be different from the process that determines value given the value is larger than 0.

Best Answer

Related Solutions

Solved – Overdispersion and modeling alternatives in Poisson random effect models with offsets

Solved – R: hurdle models with date as non-nested random effect

Related Question