GLM Poisson regression: selection of regressors

Tags: feature selection, model selection, negative-binomial-distribution, poisson distribution, stepwise regression

I fitted a Poisson GLM. I then detected overdispersion, which is why I decided to fit a negative binomial model instead:

  # overdispersion check
  dispersiontest(poismodel, trafo = 0)
  dispersiontest(poismodel.t, trafo = 0)
  # there is evidence of overdispersion (c is estimated to be 18.13873) which speaks quite strongly 
  # against the assumption of equidispersion (i.e. c=0). 

  summary(ngbinomial)
  ngbinomial.t <-glm.nb(TOT.N ~ OPEN.L + sqrt(MONT.S) + sqrt(POLIC) + D.PARK + sqrt(SHRUB) +
                           sqrt(WAT.RES) + L.WAT.C + sqrt(L.P.ROAD) + D.WAT.COUR)
  summary(ngbinomial.t)
  # conclusion: the negative binomial model is a great improvement over the Poisson model (based on AIC)
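A minimal, self-contained sketch of the same workflow on simulated data may help readers reproduce the steps. The variables y and x and the simulation settings are hypothetical, not the asker's data. Instead of AER::dispersiontest, it uses the simple Pearson dispersion statistic as a rough swapped-in check, then compares the AIC of the Poisson and negative binomial fits:

```r
# Sketch on simulated data: the data-generating model and all object names
# here are hypothetical, not taken from the question.
library(MASS)  # provides glm.nb()

set.seed(1)
n <- 500
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y <- rnbinom(n, size = 1, mu = mu)  # NB counts: Var(y) = mu + mu^2
dat <- data.frame(y = y, x = x)

# 1. Fit the Poisson GLM
pois_fit <- glm(y ~ x, family = poisson, data = dat)

# 2. Pearson dispersion statistic: roughly 1 under equidispersion,
#    clearly above 1 under overdispersion
disp <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
print(disp)

# 3. Refit as a negative binomial GLM and compare AIC (lower is better)
nb_fit <- glm.nb(y ~ x, data = dat)
print(c(poisson = AIC(pois_fit), negbin = AIC(nb_fit)))
```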

I then ran a manual stepwise selection procedure using add1 and drop1, as well as the automatic selection procedure stepAIC.

Is it OK to run the stepwise selection procedure on the negative binomial model? Does the procedure need to be adjusted?

(Below is the corresponding output of my selection procedure; only the last step is included.)

BACKWARD SELECTION:

> nbmodel <- glm.nb(TOT.N ~ OPEN.L + sqrt(MONT.S) + sqrt(POLIC) + D.PARK + 
+                       sqrt(SHRUB) + L.WAT.C + sqrt(L.P.ROAD) + D.WAT.COUR)
> back <- stepAIC(nbmodel, direction = "backward")
Start:  AIC=386.94
TOT.N ~ OPEN.L + sqrt(MONT.S) + sqrt(POLIC) + D.PARK + sqrt(SHRUB) + 
    L.WAT.C + sqrt(L.P.ROAD) + D.WAT.COUR
Step:  AIC=381.54
TOT.N ~ OPEN.L + D.PARK + L.WAT.C + sqrt(L.P.ROAD)

                 Df    AIC
<none>              381.54
- sqrt(L.P.ROAD)  1 382.25
- L.WAT.C         1 383.42
- OPEN.L          1 390.82
- D.PARK          1 443.91

BACKWARD + FORWARD SELECTION:

> both <- stepAIC(nbmodel, direction = "both")
Start:  AIC=386.94
TOT.N ~ OPEN.L + sqrt(MONT.S) + sqrt(POLIC) + D.PARK + sqrt(SHRUB) + 
    L.WAT.C + sqrt(L.P.ROAD) + D.WAT.COUR
Step:  AIC=381.54
TOT.N ~ OPEN.L + D.PARK + L.WAT.C + sqrt(L.P.ROAD)

                 Df    AIC
<none>              381.54
+ sqrt(MONT.S)    1 382.25
- sqrt(L.P.ROAD)  1 382.25
+ sqrt(POLIC)     1 383.04
+ D.WAT.COUR      1 383.38
- L.WAT.C         1 383.42
+ sqrt(SHRUB)     1 383.50
- OPEN.L          1 390.82
- D.PARK          1 443.91

Or would you advise me to first run the selection procedure on the Poisson model and then fit a negative binomial model to the already reduced set of predictors?

Best Answer

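You should not use stepwise regression or any similar approach, whether it is based on p-values, AIC or anything else (and most certainly not without adjusting the final inferences for the selection), see e.g. Algorithms for automatic model selection. The standard errors, p-values and confidence intervals reported for the selected model ignore that the same data were used to choose it, so they are too optimistic, and the retained coefficients tend to be biased away from zero. Whether you are looking at a Poisson or a negative binomial model does not affect this answer at all.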
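The problem can be illustrated with a small simulation sketch (hypothetical settings; base R plus MASS). None of the candidate predictors is related to the count response, yet backward stepAIC on a glm.nb fit typically retains some of them, and the naive Wald p-values printed for the selected model take no account of the selection step:

```r
# Sketch: backward stepAIC on a negative binomial GLM where NONE of the
# candidate predictors is related to the response. Simulation settings
# (n, p, theta, number of replicates) are arbitrary, hypothetical choices.
library(MASS)  # glm.nb(), stepAIC()

set.seed(42)
n_rep <- 20   # number of simulated data sets
n <- 200      # observations per data set
p <- 10       # pure-noise candidate predictors
n_kept <- integer(n_rep)
min_p <- numeric(n_rep)
fml <- reformulate(paste0("V", seq_len(p)), response = "y")

for (r in seq_len(n_rep)) {
  dat <- as.data.frame(matrix(rnorm(n * p), n, p))  # columns V1..V10
  dat$y <- rnbinom(n, size = 2, mu = 5)             # independent of all V*
  full <- glm.nb(fml, data = dat)
  sel <- stepAIC(full, direction = "backward", trace = 0)
  kept <- attr(terms(sel), "term.labels")
  n_kept[r] <- length(kept)
  # Naive Wald p-values of the retained terms, ignoring the selection step
  if (length(kept) > 0) {
    min_p[r] <- min(summary(sel)$coefficients[kept, "Pr(>|z|)"])
  } else {
    min_p[r] <- NA
  }
}

cat("Share of data sets with >= 1 noise predictor kept:", mean(n_kept > 0), "\n")
cat("Median smallest naive p-value among kept terms:",
    median(min_p, na.rm = TRUE), "\n")
```

Swapping glm.nb for a Poisson glm() in this sketch would not change the picture, which is the point of the answer.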

Approaches like the LASSO or elastic net (with the penalty tuned by cross-validation) or Bayesian shrinkage priors (e.g. the horseshoe prior, which is easily available in rstanarm) are generally preferable.
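A rough sketch of the penalized route for a negative binomial model follows. It assumes the glmnet package in version 4.0 or later, which accepts GLM family objects such as MASS::negative.binomial(). Because glmnet needs a fixed theta, the sketch takes it from an unpenalized glm.nb fit, which is an assumption rather than the only option. Data and names are hypothetical:

```r
# Sketch: cross-validated LASSO for an overdispersed count response.
# Assumes glmnet >= 4.0 (GLM family objects supported). Data are simulated;
# the variable names and settings are hypothetical.
library(MASS)    # negative.binomial(), glm.nb()
library(glmnet)  # cv.glmnet()

set.seed(7)
n <- 300
p <- 8
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
mu <- exp(1 + 0.6 * X[, "x1"] - 0.4 * X[, "x2"])  # only x1 and x2 matter
y <- rnbinom(n, size = 2, mu = mu)

# glmnet needs a fixed theta; here it is (as an assumption) taken from an
# unpenalized negative binomial fit of the full model.
theta_hat <- glm.nb(y ~ X)$theta

# alpha = 1 is the LASSO; 0 < alpha < 1 would give the elastic net.
cvfit <- cv.glmnet(X, y, family = negative.binomial(theta = theta_hat),
                   alpha = 1, nfolds = 10)

# Coefficients at the lambda chosen by the 1-SE rule; zeros are dropped terms.
print(coef(cvfit, s = "lambda.1se"))
```

The Bayesian alternative would be along the lines of rstanarm's stan_glm.nb() with prior = hs() (the horseshoe); it is not shown here.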