OLS vs. Poisson GLM with identity link

Tags: generalized linear model, link-function, poisson distribution

My question reveals my poor understanding of Poisson regression and GLMs in general. Here's some fake data to illustrate my question:

### some fake data
x=c(1:14)
y=c(0,  1,  2,  3,  1,  4,  9, 18, 23, 31, 20, 25, 37, 45)

Some custom functions to compute pseudo-R2:

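### functions for pseudo-R2

# deviance-based pseudo-R2: 1 - residual deviance / null deviance
psuR2 <- function(null.dev, model.dev) {
  1 - (model.dev / null.dev)
}

# squared-error pseudo-R2: 1 - SSE / SST, computed from the predictions
predR2 <- function(actuals, predicted) {
  1 - sum((actuals - predicted)^2) / sum((actuals - mean(actuals))^2)
}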

Fit four models: OLS, a Gaussian GLM with identity link, a Poisson GLM with log link, and a Poisson GLM with identity link.

#### OLS MODEL
mdl.ols = lm(y ~ x)
summary(mdl.ols)
pred.ols = predict(mdl.ols)

summary(mdl.ols)$r.squared
predR2(y, pred.ols)

#### GLM MODEL, family=gaussian(link="identity")
mdl.gauss <- glm(y~x, family=gaussian(link="identity"), 
                  maxit=500)
summary(mdl.gauss)
pred.gauss = predict(mdl.gauss)

psuR2(mdl.gauss$null.deviance, mdl.gauss$deviance)
predR2(y, pred.gauss)

#### GLM MODEL, family=poisson (canonical log link)
mdl.poi_log <- glm(y~x, family=poisson(link="log"), 
                        maxit=500)
summary(mdl.poi_log)
pred.poi_log = exp(predict(mdl.poi_log))  # back-transform from the log (link) scale to the response scale

psuR2(mdl.poi_log$null.deviance, mdl.poi_log$deviance)
predR2(y, pred.poi_log)

#### GLM MODEL, family=poisson(link="identity")
# start values give 0.5 + 0.5*x > 0 at every x; the identity link needs positive means
mdl.poi_id <- glm(y~x, family=poisson(link="identity"), 
                  start=c(0.5,0.5), maxit=500)
summary(mdl.poi_id)
pred.poi_id = predict(mdl.poi_id)

psuR2(mdl.poi_id$null.deviance, mdl.poi_id$deviance)
predR2(y, pred.poi_id)
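A note on the two `predict()` calls above: for a `glm` object the default `predict()` returns the linear predictor (the link scale), which is why the log-link chunk applies `exp()` by hand. As a sketch, `predict(..., type = "response")` applies the inverse link itself; for the identity link the link scale already is the response scale, so no transform is needed there. The fits below simply repeat the question's calls.

```r
# Same fake data as in the question
x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)

mdl.poi_log <- glm(y ~ x, family = poisson(link = "log"), maxit = 500)
mdl.poi_id  <- glm(y ~ x, family = poisson(link = "identity"),
                   start = c(0.5, 0.5), maxit = 500)

# Log link: the inverse link is exp(), so the manual back-transform
# and type = "response" should agree
all.equal(exp(predict(mdl.poi_log)), predict(mdl.poi_log, type = "response"))

# Identity link: link scale and response scale coincide
all.equal(predict(mdl.poi_id), predict(mdl.poi_id, type = "response"))
```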

Finally plot the predictions:

#### Plot the Fit
plot(x, y) 
lines(x, pred.ols, lwd=2, col="green")
lines(x, pred.gauss, col="black", lty="dotted", lwd=1.5)
lines(x, pred.poi_log, col="red")
lines(x, pred.poi_id, col="blue")

legend("topleft", bty="n", title="Model:",
    legend=c("pred.ols", "pred.gauss", "pred.poi_log", 
             "pred.poi_id"),
    lty=c("solid", "dotted", "solid", "solid"),
    col=c("green", "black", "red", "blue"),
    lwd=c(2,1.5,1,1)
    )

Plot of predictions for the four different models

I have 2 questions:

  1. It appears that the coefficients and predictions coming out of OLS and the Gaussian GLM with identity link are exactly the same. Is this always true?

  2. I'm surprised that the OLS estimates and predictions are so different from those of the Poisson GLM with identity link. I thought both methods would try to estimate E(Y|X). What does the likelihood function look like when I use the identity link for Poisson?

Best Answer

  1. Yes, they're the same thing. Maximum likelihood estimation for a Gaussian with constant variance is least squares, so when you fit a Gaussian GLM with identity link, you're doing OLS.

  2. a) "I thought both methods would try to estimate E(Y|X)"

    Indeed, they do, but the way that conditional expectation is estimated as a function of the data is not the same. Even if we ignore the distribution (and hence how the data enter the likelihood) and think about the GLM just in terms of mean and variance (as if it were just a weighted regression), the variance of a Poisson equals its mean, so the relative weights on observations differ: with the identity link each observation effectively gets weight $1/\lambda_i$, whereas OLS weights all observations equally. Here that down-weights the large counts at high $x$ and up-weights the small counts at low $x$.

    b) "What does the likelihood function look like when I use the identity link for Poisson?"

    $\mathcal{L}(\beta_0,\beta_1) = \prod_i e^{-\lambda_i}\lambda_i^{y_i}/y_i!$

    $\qquad\qquad\,=\exp\left(\sum_i \left[-\lambda_i+y_i\log(\lambda_i)-\log(y_i!)\right]\right)\quad$ where $\lambda_i=\beta_0+\beta_1 x_i$

    $\qquad\qquad\,=\exp\left(\sum_i \left[-(\beta_0+\beta_1 x_i)+y_i\log(\beta_0+\beta_1 x_i)-\log(y_i!)\right]\right)$
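The three points in this answer can be checked numerically. The R script below is a sketch under stated assumptions: part 1 reuses the question's data, while parts 2a and 2b use simulated data (the seed, the sample size of 50, and the true line $5 + 2x$ are arbitrary choices, not from the question) so that every fitted mean stays well above zero, since the identity link only makes sense while $\lambda_i > 0$. For 2a, differentiating the log-likelihood gives the score equations $\sum_i (y_i-\lambda_i)/\lambda_i = 0$ and $\sum_i x_i(y_i-\lambda_i)/\lambda_i = 0$, which are the OLS normal equations with each residual weighted by $1/\lambda_i$. The loop solves them by iteratively reweighted least squares (refit with weights $1/\lambda_i$ until the coefficients settle; the tolerance and iteration cap are assumptions). The helper `pois_id_loglik` is a hypothetical name that just writes out the likelihood above.

```r
## Part 1: OLS and the Gaussian identity-link GLM coincide (question's data)
x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
mdl.ols   <- lm(y ~ x)
mdl.gauss <- glm(y ~ x, family = gaussian(link = "identity"))
all.equal(coef(mdl.ols), coef(mdl.gauss))
# Gaussian deviance is the residual sum of squares, so the
# deviance-based pseudo-R2 equals the ordinary R2
all.equal(deviance(mdl.gauss), sum(residuals(mdl.ols)^2))
all.equal(1 - deviance(mdl.gauss) / mdl.gauss$null.deviance,
          summary(mdl.ols)$r.squared)

## Part 2a: Poisson identity link = least squares with weights 1 / lambda
set.seed(1)                           # arbitrary seed (assumption)
xs <- 1:50
ys <- rpois(50, lambda = 5 + 2 * xs)  # assumed true line 5 + 2x
X    <- cbind("(Intercept)" = 1, xs = xs)
mu   <- ys + 0.1                      # starting means, as poisson()$initialize uses
beta <- c(0, 0)
for (iter in 1:100) {
  w   <- 1 / mu                       # IRLS weight (dmu/deta)^2 / V(mu) = 1 / mu here
  wls <- lm.wfit(X, ys, w)            # identity link: working response is ys itself
  beta_new <- wls$coefficients
  mu  <- wls$fitted.values
  stopifnot(all(mu > 0))              # identity link needs positive means
  if (max(abs(beta_new - beta)) < 1e-10) break
  beta <- beta_new
}
mdl.poi_id <- glm(ys ~ xs, family = poisson(link = "identity"),
                  control = glm.control(epsilon = 1e-12, maxit = 100))
rbind(irls = beta_new, glm = coef(mdl.poi_id), ols = coef(lm(ys ~ xs)))

## Part 2b: the log-likelihood written out term by term
pois_id_loglik <- function(beta, x, y) {
  lambda <- beta[1] + beta[2] * x
  if (any(lambda <= 0)) return(-Inf)              # outside the parameter space
  sum(-lambda + y * log(lambda) - lgamma(y + 1))  # lgamma(y + 1) = log(y!)
}
b_hat  <- coef(mdl.poi_id)
ll_hat <- pois_id_loglik(b_hat, xs, ys)
all.equal(ll_hat, as.numeric(logLik(mdl.poi_id)))
# Moving either coefficient away from the MLE lowers the log-likelihood
pois_id_loglik(b_hat + c(0.01, 0), xs, ys) < ll_hat
pois_id_loglik(b_hat + c(0, 0.01), xs, ys) < ll_hat
```

On the simulated data the `irls` and `glm` rows should agree to several decimal places, while the `ols` row differs because it weights every observation equally.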