Assumptions of Poisson regression and log-transformation of response

Tags: generalized linear model, linear model, lognormal distribution, poisson-regression

I am trying to understand the explanation given for the graph below. It compares Poisson regression with a similar representation of a linear model.

[Figure: the graph under discussion (image not reproduced)]

1) When fitting a linear model with a log-transformed response, we assume that $\text{Var}[\log(Y)|X]$ is constant. How can that imply that $\text{Var}(Y|X)$ is also constant?

Note: When fitting the linear model, they added 0.1 to the response to avoid the problem of zeros in the log transformation. When back-transforming, the 0.1 was subtracted again.

Best Answer

> When fitting a linear model with a log-transformed response, we assume that $\text{Var}[\log(Y)|X]$ is constant. How can that imply that $\text{Var}(Y|X)$ is also constant?

It's not true. It is not the case that $\text{Var}(\log(Y)|X=x)$ being constant implies $\text{Var}(Y|X=x)$ is constant:

[Figure: plot showing that constant conditional variance in the logs is not constant variance on the original scale]

In fact, that's only the case if the mean is constant as well.
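A short derivation makes the point concrete. It rests on an assumption that is stronger than constant variance and is not stated in the original: the whole error distribution on the log scale is taken to be the same at every $x$. Write $\log Y = \mu(x) + \varepsilon$, where $\mu(x)$ is the conditional mean of $\log Y$ and $\varepsilon$ has mean zero and a distribution that does not depend on $x$ (the symbols $\mu$, $\varepsilon$ and $\sigma$ are introduced here for illustration). Then $Y = e^{\mu(x)} e^{\varepsilon}$, so

$$\text{Var}(Y|X=x) = e^{2\mu(x)}\,\text{Var}(e^{\varepsilon}).$$

If in addition $\varepsilon \sim N(0, \sigma^2)$, so that $Y$ is conditionally lognormal, then $\text{Var}(e^{\varepsilon}) = (e^{\sigma^2} - 1)\,e^{\sigma^2}$ and

$$\text{Var}(Y|X=x) = (e^{\sigma^2} - 1)\,e^{2\mu(x) + \sigma^2}.$$

Under these assumptions the variance on the original scale is proportional to $e^{2\mu(x)}$, so it is constant in $x$ only when $\mu(x)$ is.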

The problem appears to arise because the original text omits the conditioning on $x$ from its notation and then forgets that it has done so.


Note, however, that the assumption of constant variance on the log scale is itself untrue for Poisson data. If you generate data from a Poisson regression model and take logs, the conditional variance is not constant (and this is possibly what the text was trying to explain):

[Figure: plot of y vs x from a Poisson regression, and plot of log(y+0.1) vs x]

Taking logs makes the relationship close to linear over most of the range, but the variance is clearly not constant in either of these two plots.
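The same behaviour can be checked numerically. The following is a minimal sketch rather than the code behind the plots: the coefficients `b0` and `b1`, the grid of `x` values, the sample size and the helper name `conditional_variances` are all assumptions chosen for illustration. Only the 0.1 offset comes from the question. It simulates many Poisson draws at each `x` and estimates the conditional variance on both scales.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def conditional_variances(x_values, b0=0.0, b1=1.0, offset=0.1, n=200_000):
    """Simulate Y | x ~ Poisson(exp(b0 + b1 * x)) at each x.

    Returns a list of (x, mean, Var(Y | x), Var(log(Y + offset) | x)),
    with both variances estimated from n simulated draws.
    b0, b1 and the default n are illustrative assumptions.
    """
    rows = []
    for x in x_values:
        mu = np.exp(b0 + b1 * x)          # Poisson mean under a log link
        y = rng.poisson(mu, size=n)       # n draws at this value of x
        rows.append((x, mu, y.var(), np.log(y + offset).var()))
    return rows


results = conditional_variances([0.0, 1.0, 2.0, 3.0, 4.0])
for x, mu, var_y, var_log in results:
    print(f"x={x:.0f}  mean={mu:7.2f}  Var(Y|x)={var_y:7.2f}  "
          f"Var(log(Y+0.1)|x)={var_log:.3f}")
```

With these assumed settings the variance of `y` grows with the mean, as expected for a Poisson response, while the variance of `log(y + 0.1)` shrinks as the mean grows. For large means it is roughly the reciprocal of the mean (a delta-method approximation). Neither scale has constant conditional variance.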

Incidentally, if you want to add a constant when taking logs, something a little above 0.4 generally works very well, but I usually just use 0.5; it's easier to remember and close enough.