How does a Poisson distribution work when modeling continuous data, and does it result in information loss?

Tags: biostatistics, mixed model, poisson distribution, quasi-likelihood

A co-worker is analyzing some biological data for her dissertation that show some nasty heteroscedasticity (figure below). She's analyzing the data with a mixed model but is still having trouble with the residuals.

Log-transforming the response variables cleans things up and, based on feedback to this question, this seems to be an appropriate approach. Originally, however, we had thought there were issues in using transformed variables with mixed models. It turns out that we had been misinterpreting a statement in Littell & Milliken's (2006) SAS for Mixed Models that was pointing out why it is inappropriate to transform count data and then analyze it with a normal linear mixed model (full quote is below).

Another approach that improved the residuals was to use a generalized linear model with a Poisson distribution. I've read that the Poisson distribution can be used for modeling continuous data (e.g., as discussed in this post), and stats packages allow it, but I don't understand what is going on when the model is fit.

For the purpose of understanding how the underlying calculations are being made, my questions are: When you fit a Poisson distribution to continuous data, 1) do the data get rounded to the nearest integer, 2) does this result in a loss of information, and 3) when, if ever, is it appropriate to use a Poisson model for continuous data?

Littell & Milliken 2006, p. 529
"transforming the [count] data may be counterproductive. For example, a transformation can distort the distribution of the random model effects or the linearity of the model. More importantly, transforming the data still leaves open the possibility of negative predicted counts. Consequently, inference from a mixed model using transformed data is highly suspect."

[Figure: plot illustrating the heteroscedasticity described above; image not available]

Best Answer

I've fairly frequently estimated Poisson regressions for continuous positive outcomes, using the Huber/White/sandwich linearized estimator of variance. However, that's not a particularly good reason to do anything, so here are some actual references.

From the theory side, $y$ does not need to be an integer for the estimator based on the Poisson likelihood function to be consistent, provided the conditional mean is correctly specified. This is shown in Gourieroux, Monfort and Trognon (1984). This is called Poisson PMLE or QMLE, for Pseudo/Quasi Maximum Likelihood.
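As a rough sketch of why integers are not needed, write the conditional mean with a log link as $E[y_i \mid x_i] = \exp(x_i'\beta)$. The symbols $x_i$ and $\beta$ are notation assumed here for illustration and are not taken from the cited paper. Up to a term that does not involve $\beta$, the Poisson log-likelihood and its first-order conditions are:

$$
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \, x_i'\beta - \exp(x_i'\beta) \right] + \text{const},
\qquad
\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} \left( y_i - \exp(x_i'\beta) \right) x_i = 0 .
$$

In this formulation each $y_i$ enters exactly as observed, so nothing is rounded and no information is discarded. The estimating equations have expectation zero at the true $\beta$ whenever the conditional mean is correct, whatever the distribution of $y_i$. The Poisson variance assumption will generally not hold for continuous data, which is why the standard errors should come from the robust (sandwich) estimator rather than from the Poisson likelihood itself.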

There's also some encouraging simulation evidence from Santos Silva and Tenreyro (2006), where the Poisson comes in best-in-show. It also does well in a simulation with lots of zeros in the outcome. You can also easily do your own simulation to convince yourself that this works in your snowflake case.
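A do-it-yourself simulation along these lines might look like the minimal sketch below. Everything specific in it is an assumption chosen for illustration: the gamma-distributed outcome, the coefficient values, the sample size, and the hypothetical helper `poisson_pmle`, which solves the Poisson score equations by Newton's method and computes sandwich standard errors with NumPy only.

```python
import numpy as np


def poisson_pmle(X, y, n_iter=50, tol=1e-10):
    """Poisson pseudo-ML for a log-link mean, with sandwich standard errors.

    Illustrative helper (not from the original answer). Assumes the first
    column of X is a constant. y can be any non-negative real numbers;
    the values are used as they are and are never rounded.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)              # sum_i (y_i - mu_i) x_i
        info = X.T @ (X * mu[:, None])      # sum_i mu_i x_i x_i'
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Huber/White sandwich: bread * meat * bread
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n = 20_000
true_beta = np.array([0.5, 0.8])            # assumed "true" coefficients
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
mu = np.exp(X @ true_beta)

# Continuous, strictly positive outcome with E[y | x] = mu:
# gamma with shape 2 and scale mu / 2 (variance mu^2 / 2, not Poisson).
y = rng.gamma(shape=2.0, scale=mu / 2.0)

beta_hat, robust_se = poisson_pmle(X, y)
print("true coefficients:     ", true_beta)
print("estimated coefficients:", beta_hat)
print("robust standard errors:", robust_se)
```

Swapping in a data-generating process that resembles your own data (more zeros, heavier tails, a different mean function) is the point of the exercise; the estimator itself does not change.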

Finally, you can also use a GLM with a log link function and Poisson family. This yields identical results and placates the count-data-only knee-jerk reactions.

References Without Ungated Links:

Gourieroux, C., A. Monfort and A. Trognon (1984). “Pseudo Maximum Likelihood Methods: Applications to Poisson Models,” Econometrica, 52, 701-720.