Solved – Selecting the best GLM (generalized linear model)

generalized linear modelrregression

R's glm function for generalized linear models is a logistic regression when the response is dichotomous(yes/no, male/female, etc..) and the family parameter is passed the argument binomial. I'm wondering how to judge if the model we built is good eough? As we know, in OLS regression some criterion like R^2 and adjusted R^2 can tell us how much variations are explained but not for GLM. See example I performed:

    > summary(fit.full)
    Call:
    glm(formula = ynaffair ~ gender + age + yearsmarried + children + 
    +religiousness + education + occupation + rating, family = binomial(), 
    data = Affairs)

    Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
    -1.6575  -0.7459  -0.5714  -0.2552   2.5099  

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
    (Intercept)    0.71792    0.96165   0.747 0.455336    
    gendermale     0.28665    0.23973   1.196 0.231811    
    age           -0.04494    0.01831  -2.454 0.014142 *  
    yearsmarried   0.09686    0.03236   2.993 0.002758 ** 
    childrenyes    0.37088    0.29466   1.259 0.208147    
    religiousness -0.32230    0.09003  -3.580 0.000344 ***
    education      0.01795    0.05088   0.353 0.724329    
    occupation     0.03210    0.07194   0.446 0.655444    
    rating2       -0.02312    0.58177  -0.040 0.968303    
    rating3       -0.84532    0.57619  -1.467 0.142354    
    rating4       -1.13916    0.55740  -2.044 0.040981 *  
    rating5       -1.61050    0.56649  -2.843 0.004470 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 675.38  on 600  degrees of freedom
    Residual deviance: 608.22  on 589  degrees of freedom
    AIC: 632.22

After removed the insignificant variables, the reduced model look like below,although the AIC decreasd, we still do not know if this is the model with the lowest AIC we can achieved:

    > summary(fit.reduced)
    Call:
    glm(formula = ynaffair ~ age + yearsmarried + religiousness + 
        +rating, family = binomial(), data = Affairs)

    Deviance Residuals: 
    Min        1Q      Median      3Q      Max  
   -1.5117  -0.7541  -0.5722  -0.2592   2.4123  

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
    (Intercept)    1.10220    0.71849   1.534 0.125014    
    age           -0.03588    0.01740  -2.062 0.039224 *  
    yearsmarried   0.10113    0.02933   3.448 0.000565 ***
    religiousness -0.32571    0.08971  -3.631 0.000282 ***
    rating2        0.11848    0.57258   0.207 0.836068    
    rating3       -0.70168    0.56671  -1.238 0.215658    
    rating4       -0.96190    0.54230  -1.774 0.076109 .  
    rating5       -1.49502    0.55550  -2.691 0.007118 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 675.38  on 600  degrees of freedom
    Residual deviance: 613.63  on 593  degrees of freedom
    AIC: 629.63

And we perform the ANOVA, suggesting that the reduced model with
four predictors fits as well as the full model:

    > anova(fit.reduced, fit.full, test="Chisq")
    Analysis of Deviance Table

    Model 1: ynaffair ~ age + yearsmarried + religiousness + +rating
    Model 2: ynaffair ~ gender + age + yearsmarried + children + 
             +religiousness + education + occupation + rating
    Resid. Df Resid. Dev Df Deviance Pr(>Chi)
     1       593     613.63                     
     2       589     608.22  4   5.4124   0.2475

Best Answer

If you want to find the best model for your data, one way to go could be using the function dropterm()from package MASS. It automatically test all models that differ from the current model by the dropping of one single term. This is done respecting marginality, so it doesn't try models in which one main effect is dopped if the same predictor is also present in any interaction (I think there is no good reason to fit such models anyway).

You could start with a model where all terms (main effects and interactions) are present, and do a backward simplification: when you run the function dropterm you can ask the function to compare all possible reduced model with a likelihood ratio test or also to order them according to the AIC; then you can update your model removing superfluous predictors. You can repeat these step several times, until there are no more predictors that can be removed without causing a significant drop in the goodness of fit of the model (according to either the AIC or the likelihood ratio test), indicating that you have found the best GLM model for your data.