Log-linear and GLM (Poisson) regression

Tags: generalized linear model, log-linear, poisson-regression, regression

I am afraid I am asking a stupid question… but…

I would like to study the spending (my outcome variable) of a company by department, number of staff, activity, etc. I have collected my data and when I plot the spending, it looks very skewed:

[Figure: plot of the raw spending, strongly skewed]

So, I thought it was a good candidate for log-transformation:

[Figure: plot of the log-transformed spending]

which looks more normal.

Then, I ran a linear regression:

lm(log(accountingAmount) ~ Pred1 + Pred2 + Pred3, data=df)

I was thinking of running a Poisson GLM, but my outcome is not really a count (well… I guess it could be considered a count since it is in dollars), and its variance is far from equal to its mean, so it does not meet the assumptions of a Poisson distribution.

I have read different posts (log-linear vs Poisson, Is log-linear a GLM or Poisson regression vs log-linear model), but I could not really find my answer.

Questions

So, is my first approach (using lm(log(accountingAmount) ~ Pred1 + Pred2 + Pred3, data=df)) a good approach? Basically, is it the "standard way" to do log-linear regression?

Is it correct that a Poisson GLM is not possible in this case (because the variance is very high compared to the mean)?

Best Answer

The term "log-linear" isn't uniquely defined. Even Wikipedia doesn't seem to come to internal agreement. Its entry on log-linear analysis has to do with modeling counts in contingency tables, while its log-linear model entry describes your approach to modeling. I try to avoid that terminology and just say what's being modeled. In your case, it's ordinary linear regression with a log-transformed outcome.

There's nothing wrong with log transformation of a continuous, strictly positive outcome. If the residuals from your resulting linear regression model are well behaved, it's probably the simplest way to go. A drawback is that you are modeling mean values on the log scale, which isn't how people typically think about means.
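To make the point about residuals and the log scale concrete, here is a minimal sketch in R. The data are simulated stand-ins (the coefficients, the sample size, and the use of only two predictors are my assumptions, not your data); only the variable names follow your model. It fits the log-transformed regression, looks at the usual residual diagnostics, and compares the observed mean with the mean of the naively back-transformed fitted values.

```r
# Simulated stand-in for the spending data (all values here are hypothetical)
set.seed(1)
n  <- 500
df <- data.frame(Pred1 = rnorm(n), Pred2 = runif(n))
df$accountingAmount <- exp(6 + 0.5 * df$Pred1 + 0.3 * df$Pred2 + rnorm(n, sd = 0.8))

# Ordinary linear regression with a log-transformed outcome
fit_lm <- lm(log(accountingAmount) ~ Pred1 + Pred2, data = df)

# Residual diagnostics: residuals vs fitted values and a normal Q-Q plot
# (only drawn in an interactive session)
if (interactive()) plot(fit_lm, which = 1:2)

# exp() of the fitted log values estimates a geometric mean (the median under
# log-normal errors), so it tends to sit below the arithmetic mean of the raw data
pred_naive <- exp(fitted(fit_lm))
c(mean_observed        = mean(df$accountingAmount),
  mean_backtransformed = mean(pred_naive))
```

If the two numbers printed at the end differ noticeably on your own data, that gap is the practical meaning of "modeling mean values on the log scale".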

You are correct that a Poisson GLM isn't appropriate for continuous non-count data, but other types of GLM can use a log link and might work for your data. This page suggests other GLM approaches, such as Gaussian, inverse Gaussian, or gamma families with log links, which might work better and more readily give predictions on the untransformed mean scale.
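As a rough illustration of the log-link alternative, the sketch below fits a gamma GLM and a Gaussian GLM, both with log links, using base R's glm(). The data are again simulated (the gamma shape, the coefficients, and the new_dept row are hypothetical choices of mine), so treat it as a template rather than a recommendation for your data.

```r
# Simulated positive, right-skewed outcome (all values here are hypothetical)
set.seed(2)
n  <- 500
df <- data.frame(Pred1 = rnorm(n), Pred2 = runif(n))
mu <- exp(6 + 0.5 * df$Pred1 + 0.3 * df$Pred2)
df$accountingAmount <- rgamma(n, shape = 2, rate = 2 / mu)  # mean equals mu

# Gamma GLM with a log link: models log(E[y]) rather than E[log(y)]
fit_gamma <- glm(accountingAmount ~ Pred1 + Pred2,
                 family = Gamma(link = "log"), data = df)

# Gaussian GLM with a log link; starting from the gamma coefficients
# is a convenience that helps the fit converge
fit_gauss <- glm(accountingAmount ~ Pred1 + Pred2,
                 family = gaussian(link = "log"), data = df,
                 start = coef(fit_gamma))

# Predictions on the original (dollar) mean scale, no back-transformation needed
new_dept   <- data.frame(Pred1 = 0, Pred2 = 0.5)  # hypothetical new observation
pred_gamma <- predict(fit_gamma, newdata = new_dept, type = "response")
pred_gauss <- predict(fit_gauss, newdata = new_dept, type = "response")
c(gamma = unname(pred_gamma), gaussian = unname(pred_gauss))
```

The two families share the same mean model and differ only in the assumed variance: constant for the Gaussian, proportional to the squared mean for the gamma. Residual plots for each fit are a reasonable way to choose between them.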
