Analysis – Why Log-Linear Analysis Ignores Poisson Regression Equidispersion Assumption

assumptions, count-data, dispersion, log-linear, poisson-regression

As far as I understand it, log-linear analysis is based on Poisson regression. This is what I understood from various online resources, like this online tutorial or this text, whose introduction says: "In this chapter we study the application of Poisson regression models to the analysis of contingency tables. This is perhaps one of the most popular applications of log-linear models […]".
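To make the Poisson connection concrete, here is a minimal Python sketch, assuming a two-way table and the independence model log(μ_ij) = λ + λ_i^X + λ_j^Y. For this model the Poisson likelihood equations force the fitted row and column totals to equal the observed ones, which yields the closed-form fit below. The table is made-up illustrative data and `independence_fit` is a hypothetical helper name.

```python
# Sketch: Poisson ML fit of the independence log-linear model
#   log(mu_ij) = lambda + lambda_i^X + lambda_j^Y
# For this model the ML fitted values have a closed form,
#   mu_ij = (row total i) * (column total j) / (grand total),
# so no iterative Poisson-regression solver is needed.


def independence_fit(table):
    """Return the fitted counts of the independence log-linear model."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Made-up 2x3 contingency table (illustrative only)
table = [[20, 30, 50],
         [30, 20, 50]]
fitted = independence_fit(table)
for row in fitted:
    print([round(x, 2) for x in row])
```

Every fitted cell is a Poisson mean; the Poisson likelihood behind this fit also implies that each cell's variance equals that mean.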

The Wikipedia article on log-linear analysis lists three assumptions:

  1. The observations are independent and random;
  2. Observed frequencies are normally distributed about expected frequencies over repeated samples […]
  3. The logarithm of the expected value of the response variable is a linear combination of the explanatory variables. […]
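For a two-way table with cell counts Y_ij, assumption 3 for the independence model can be written as below (the λ notation is the usual one, but treat it as my own choice rather than the cited sources' exact notation). The Poisson sampling part, which the list does not state, is where equidispersion comes in:

```latex
\log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y},
\qquad
Y_{ij} \sim \operatorname{Poisson}(\mu_{ij})
\;\Rightarrow\;
\operatorname{E}(Y_{ij}) = \operatorname{Var}(Y_{ij}) = \mu_{ij}.
```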

However, unless I misread it, it does not mention the assumption that the mean equals the variance (equidispersion), which is usually made when using Poisson regression:

Mean = Variance: By definition, the mean of a Poisson random variable must be equal to its variance.
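A quick simulation shows what equidispersion means in practice. This plain-Python sketch draws pure Poisson counts and gamma-mixed Poisson counts with the same mean; the mixture, μ = 4, gamma shape k = 2, sample size and seed are all illustrative assumptions. The gamma-Poisson mixture is marginally negative binomial with variance μ + μ²/k, so its sample variance should land near 12 rather than 4.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) value with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


rng = random.Random(1)
mu, k, n = 4.0, 2.0, 20000

# Equidispersed: plain Poisson counts with mean mu
poisson = [poisson_draw(mu, rng) for _ in range(n)]
# Overdispersed: Poisson counts whose rate is itself Gamma(shape=k, scale=mu/k)
mixed = [poisson_draw(rng.gammavariate(k, mu / k), rng) for _ in range(n)]

print("Poisson:       mean=%.2f var=%.2f" % mean_var(poisson))
print("gamma-Poisson: mean=%.2f var=%.2f" % mean_var(mixed))
```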

What am I missing?

Can I simply ignore the equidispersion assumption when using Poisson regression for a log-linear analysis? Isn't there a risk, for example, that coefficients detected as significant are actually non-significant, or vice versa?

Or is it implied that I should use an alternative when the equidispersion assumption is not met, like negative binomial or generalized Poisson regressions?

Thanks,

Best Answer

Quoting from Section 4.3.3 of the second edition of Agresti's "Categorical Data Analysis":

Overdispersion is common in the modeling of counts. When the model for the mean is correct but the true distribution is not Poisson, the ML estimates of model parameters are still consistent but standard errors are incorrect.
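Agresti's point can be checked by simulation. The sketch below assumes an intercept-only Poisson regression, whose ML estimate of the log-mean is log(ȳ) with model-based standard error 1/√(nȳ), and feeds it overdispersed gamma-Poisson data (all settings are illustrative assumptions). The estimates should centre on the true log-mean, while their empirical standard deviation should exceed the average Poisson SE by roughly √(1 + μ/k) ≈ 1.7 with these settings.

```python
import math
import random
import statistics


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) value with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(reps=1000, n=100, mu=4.0, k=2.0, seed=2):
    """Fit an intercept-only Poisson model to many overdispersed samples."""
    rng = random.Random(seed)
    estimates, model_ses = [], []
    for _ in range(reps):
        # Gamma-Poisson (negative binomial) counts: mean mu, variance mu + mu**2 / k
        y = [poisson_draw(rng.gammavariate(k, mu / k), rng) for _ in range(n)]
        ybar = sum(y) / n
        estimates.append(math.log(ybar))            # ML estimate of the log-mean
        model_ses.append(1 / math.sqrt(n * ybar))   # SE assuming Var = mean
    return estimates, model_ses


est, ses = simulate()
print("true log-mean:        %.3f" % math.log(4.0))
print("mean of estimates:    %.3f" % statistics.fmean(est))
print("empirical SD of est.: %.3f" % statistics.stdev(est))
print("average Poisson SE:   %.3f" % statistics.fmean(ses))
```

The point estimate is fine; it is the Poisson-based SE that is too small, which is how spurious "significant" results arise.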

He goes on to describe both negative-binomial and quasi-likelihood approaches for dealing with overdispersion. So yes, for these models it is (or should be) implied that you proceed in a way that accounts for the relationship between the fitted values and the variance.
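For the quasi-likelihood route, one common choice (and, as far as I know, what R's `quasipoisson` family reports through `summary.glm`) is to estimate a dispersion factor φ̂ = X²/(n − p) from the Pearson statistic and multiply the Poisson standard errors by √φ̂. The sketch below assumes a two-group Poisson regression log(μ) = b0 + b1·group on made-up counts; for that model the ML fitted values are the group means and the Poisson SE of b1 is √(1/Σy₀ + 1/Σy₁). `pearson_dispersion` is a hypothetical helper name.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson estimate phi_hat = X^2 / (n - p), with X^2 = sum (y - mu)^2 / mu."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / (len(y) - n_params)


# Made-up counts for two groups (illustrative only)
group0 = [1, 0, 6, 2, 1]
group1 = [9, 1, 0, 14, 6]

# Poisson regression log(mu) = b0 + b1 * group: ML fitted values are the group means
m0 = sum(group0) / len(group0)
m1 = sum(group1) / len(group1)
b1 = math.log(m1 / m0)                                      # log rate ratio
se_poisson = math.sqrt(1 / sum(group0) + 1 / sum(group1))  # model-based SE

y = group0 + group1
mu = [m0] * len(group0) + [m1] * len(group1)
phi = pearson_dispersion(y, mu, n_params=2)
se_quasi = se_poisson * math.sqrt(phi)                      # quasi-Poisson SE

print("phi_hat = %.2f" % phi)
print("z (Poisson)       = %.2f" % (b1 / se_poisson))
print("z (quasi-Poisson) = %.2f" % (b1 / se_quasi))
```

With these counts φ̂ ≈ 4.17, and the Wald z for b1 drops from about 3.01 to about 1.47, i.e. from significant to non-significant at the 5% level, which is the risk raised in the question. A negative-binomial fit is the other option Agresti mentions; it models the extra variance explicitly instead of rescaling the SEs.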

The omission of this issue from introductory explanations of count modeling is not really different from starting linear regression with the assumptions of homoscedasticity and normally distributed errors. You start simple, then build from there.