Poisson Regression – Closest Approximation of Poisson GLM Using Weighted Least Squares Analysis

heteroscedasticity, poisson-regression, r, weighted-regression

I have an application where I would like to approximate a Poisson GLM with identity link, i.e. a glm of the form

fit = glm(y~x1+x2+x3+..., family=poisson(link=identity), data=data)

using a (noniterative) weighted least squares model fit instead, i.e. using

fit = lm(y~x1+x2+x3+..., weights=..., data=data)

which, in the nonnegative case, comes down to an unweighted nonnegative least squares (nnls) fit in which both y and the covariate matrix are multiplied by sqrt(weights).
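As a quick sketch with simulated data (the variable names and data here are purely illustrative), the equivalence between a weighted fit and an unweighted fit on sqrt(weights)-rescaled data looks like:

```r
# Illustrative data: any y, X and positive weights w of matching size would do
set.seed(1)
n <- 50
X <- cbind(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
y <- rpois(n, lambda = drop(X %*% c(2, 3)))
w <- 1 / (y + 1)                                   # example 1/variance-style weights

wfit  <- lm.wfit(x = X, y = y, w = w)              # weighted least squares
swfit <- lm.fit(x = X * sqrt(w), y = y * sqrt(w))  # unweighted LS on rescaled data
all.equal(coef(wfit), coef(swfit))                 # TRUE: identical coefficients
```

The rescaled form is exactly what solvers like nnls expect, since they take a plain design matrix and response with no weights argument.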

I was wondering, though, what the correct/best weights would be to take the mean/variance relationship into account?

For the Poisson distribution variance = mean, so would that imply I should use weights = 1/variance, i.e. weights = 1/(y+epsilon) with some small epsilon (e.g. 1), given that y should be a good estimator of both the mean and the variance? I would like to use weighted OLS instead of a GLM mainly for computational reasons (an lm.fit is typically 5-10x faster than a glm.fit), and also because the identity-link Poisson GLM in R doesn't seem to be implemented very robustly: it often complains about not finding valid starting values. This is probably also related to it not allowing nonnegativity constraints on the fitted coefficients, so that predictions can go negative in some cases.

When working with a single covariate, sqrt transforming both x and y to stabilize the variance and doing a regular OLS fit on the transformed data also seems to give a nearly identical fit to the Poisson GLM. This only works with a single covariate, though.
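For context, the sqrt transform stabilizes the variance because, by the delta method, Var(sqrt(Y)) ≈ 1/4 for Poisson Y regardless of its mean (for means that are not too small). A quick simulation check (illustrative only):

```r
# Delta method: for Y ~ Poisson(lambda), Var(sqrt(Y)) is approximately 1/4,
# independent of lambda, which is why sqrt(y) is roughly homoscedastic.
set.seed(1)
sapply(c(5, 20, 100), function(lam) var(sqrt(rpois(1e5, lam))))
# each value is close to 0.25
```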

Are both of these procedures generally accepted approximations of Poisson GLMs? How would they typically be referred to, and what would be a good citation for each recipe?

To enforce nonnegativity constraints on the fitted coefficients I would then also like to use nnls, or glmnet with lower.limits=0 (glmnet does not support identity-link Poisson; h2o does, but does not support nonnegativity constraints; nnlm with Kullback-Leibler divergence and the nnpois function from the addreg package are other options that do allow nonnegativity constraints, but both are much slower than glmnet or nnls). So would using 1/variance weights within a regular (nonnegative) weighted least squares framework be an acceptable way to approximate a Poisson error structure?
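For what it's worth, the weighted nonnegative fit along these lines could be sketched as follows (assuming the nnls package is installed; the data are simulated for illustration):

```r
library(nnls)  # provides nnls(A, b) for nonnegative least squares

# Simulated example with two covariates and true nonnegative coefficients (1, 2)
set.seed(1)
n <- 100
X <- cbind(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
y <- rpois(n, lambda = drop(X %*% c(1, 2)))

w   <- 1 / (y + 1)                      # approximate 1/variance weights (eps = 1)
fit <- nnls(A = X * sqrt(w), b = y * sqrt(w))  # weighted nnls via sqrt(w) rescaling
fit$x                                   # nonnegative coefficient estimates
```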

In this small example the expected slope of 1 is in fact estimated more accurately by a weighted OLS regression with 1/variance weights than by a Poisson regression, which surprised me. What would be the intuitive reason for this?

x=1:100
n=length(x)
set.seed(1)
y=rpois(n=n, lambda=x)
data=data.frame(x=x,y=y)
glmfit = glm(y~0+x, family=poisson(link=identity), data=data) # Poisson GLM
coef(glmfit) # 1.022574
olsfit = lm(y~0+x, data=data) # regular unweighted OLS fit
coef(olsfit) # 1.024758
sqrlmfit = lm(sqrt(y)~0+sqrt(x), data=data) # OLS with x & y sqrt transformed to stabilize variance
coef(sqrlmfit) # 1.009303
data$weights = 1/(y+1) # approx 1/(y+eps)=1/variance for Poisson with eps here = 1
wlmfit = lm(y ~ 0+x, weights=weights, data=data) # weighted OLS with 1/variance weights
coef(wlmfit) # 1.008069 - closest to expected simulated slope of 1!

Also relevant perhaps is this link.

Best Answer

Just figured out I was being stupid: with identity link, the iteratively reweighted least squares (IRLS) algorithm used to fit GLMs just ends up regressing y on X in each iteration (cf. the IRLS writeup at http://bwlewis.github.io/GLM/: with identity link the working response z = y and W = 1/expected variance), and as initialization for the mean the glm.fit source code uses y + eps with eps = 0.1. So what I was doing with my weighted least squares was in fact just a single iteration of the IRLS algorithm (which boils down to a Fisher scoring variant of the Newton-Raphson algorithm), which explains why it was working so well for me. It also answers my question: what I was doing was in fact the closest noniterative (single-step) weighted least squares approximation of a Poisson identity-link GLM. For nonnegativity-constrained regression the same recipe applies; all I have to do is replace lm.wfit with nnls (and iterate if I want to approach the true maximum likelihood objective more closely). I posted a worked out example here.

By the way, I cannot use the square root transform, as that would break the additivity of my predictors, which I know a priori should hold in my application.
