AIC or p-value: which one to choose for model selection

Tags: aic, model selection, stepwise regression

I'm brand new to R and am unsure which model to select.

  1. I ran a forward stepwise regression, at each step adding the variable that gave the lowest AIC. This produced three models, and I'm unsure which one is the "best".

    Model 1: Var1 (p=0.03)                                  AIC = 14.978
    Model 2: Var1 (p=0.09) + Var2 (p=0.199)                 AIC = 12.543
    Model 3: Var1 (p=0.04) + Var2 (p=0.04) + Var3 (p=0.06)  AIC = -17.09
    

    I'm inclined to go with Model 3 because it has the lowest AIC (I heard negative values are OK) and its p-values are still fairly low.

    I ran 8 variables as predictors of hatchling mass, and these three came out as the best predictors.

  2. In my next forward stepwise regression I chose Model 2 because, even though its AIC was slightly larger, its p-values were all smaller. Do you agree this is the best?

    Model 1: Var1 (p=0.321) + Var2 (p=0.162) + Var3 (p=0.163) + Var4 (p=0.222)  AIC = 25.63
    Model 2: Var1 (p=0.131) + Var2 (p=0.009) + Var3 (p=0.0056)                  AIC = 26.518
    Model 3: Var1 (p=0.258) + Var2 (p=0.0254)                                   AIC = 36.905
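For reference, the forward selection described above can be sketched as follows. This is a minimal sketch in Python rather than R, not a reimplementation of R's step(): it assumes a Gaussian linear model and scores each candidate with AIC = n ln(RSS/n) + 2k, where k is the number of coefficients including the intercept (as far as I know, this is the form R's extractAIC(), which step() uses, reports for lm fits). The data are invented: the names Var1 to Var8 are borrowed from the question, only Var1 and Var2 actually drive the synthetic response, and the helpers solve, rss, aic and forward_stepwise are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(coef * v for coef, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(rss_value, n, k):
    # Gaussian AIC up to an additive constant shared by all models on the same data.
    return n * math.log(rss_value / n) + 2 * k


def forward_stepwise(predictors, y):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    n = len(y)
    chosen = []
    best = aic(rss([], y), n, 1)  # intercept-only model
    path = [(list(chosen), best)]
    while True:
        candidates = [name for name in predictors if name not in chosen]
        scores = {
            name: aic(rss([predictors[c] for c in chosen + [name]], y), n, len(chosen) + 2)
            for name in candidates
        }
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best:  # no candidate improves AIC: stop
            break
        chosen.append(name)
        best = scores[name]
        path.append((list(chosen), best))
    return path


# Synthetic data (an assumption): eight candidate predictors, but only Var1
# and Var2 actually drive the hypothetical response.
rng = random.Random(0)
n_obs = 50
data = {f"Var{j}": [rng.gauss(0, 1) for _ in range(n_obs)] for j in range(1, 9)}
y = [2.0 * v1 - 1.0 * v2 + rng.gauss(0, 0.5) for v1, v2 in zip(data["Var1"], data["Var2"])]

for variables, score in forward_stepwise(data, y):
    print(" + ".join(variables) or "(intercept only)", f"AIC = {score:.2f}")
```

Because this AIC drops the additive constant, it turns negative as soon as RSS/n falls below exp(-2k/n), which happens here once Var1 and Var2 are in the model. The sign carries no information on its own; only differences in AIC between models fitted to the same observations are meaningful.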
    

Thanks!

Best Answer

AIC is a goodness-of-fit measure that favours smaller residual error in the model but penalises the inclusion of further predictors, which helps avoid overfitting. In your second set of models, model 1 (the one with the lowest AIC) may perform best when used for prediction outside your dataset. A possible explanation for why adding Var4 to model 2 results in a lower AIC but higher p-values is that Var4 is somewhat correlated with Var1, Var2 and Var3: correlated predictors inflate the standard errors of each other's coefficients, and hence their p-values. The interpretation of model 2 is thus easier.
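The collinearity explanation can be checked with a small simulation. This is a hypothetical example in Python on synthetic data: x1, x4 and y are invented, x4 is constructed to have a correlation of roughly 0.95 with x1, and AIC again uses the Gaussian form n ln(RSS/n) + 2k up to an additive constant. It fits y ~ x1 and y ~ x1 + x4 and reports AIC alongside the standard error and t-statistic of the x1 coefficient.

```python
import math
import random

# Synthetic data (an assumption, not the asker's data): x4 is built to
# correlate with x1 at roughly 0.95, and both contribute to y.
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x4 = [0.95 * a + 0.31 * rng.gauss(0, 1) for a in x1]
y = [1.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x4)]


def centred(v):
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def aic(rss, n, k):
    # Gaussian AIC up to a constant shared by all models on the same data;
    # k counts the intercept and slopes.
    return n * math.log(rss / n) + 2 * k


# Working with centred variables is equivalent to fitting an intercept.
c1, c4, cy = centred(x1), centred(x4), centred(y)
s11, s44, s14 = dot(c1, c1), dot(c4, c4), dot(c1, c4)
s1y, s4y, syy = dot(c1, cy), dot(c4, cy), dot(cy, cy)

# Model A: y ~ x1 (intercept + 1 slope, n - 2 residual df)
b1_a = s1y / s11
rss_a = syy - b1_a * s1y
se_a = math.sqrt(rss_a / (n - 2) / s11)

# Model B: y ~ x1 + x4, slopes from the 2x2 normal equations via Cramer's rule
det = s11 * s44 - s14 ** 2
b1_b = (s44 * s1y - s14 * s4y) / det
b4_b = (s11 * s4y - s14 * s1y) / det
rss_b = syy - b1_b * s1y - b4_b * s4y
# Var(b1) is sigma^2 times the (1, 1) entry of the inverse of [[s11, s14], [s14, s44]]
se_b = math.sqrt(rss_b / (n - 3) * s44 / det)

r_squared = s14 ** 2 / (s11 * s44)  # squared correlation of x1 and x4
vif = 1 / (1 - r_squared)           # variance inflation factor for x1 in model B

print(f"corr(x1, x4)^2 = {r_squared:.3f}, VIF = {vif:.1f}")
print(f"y ~ x1      : AIC = {aic(rss_a, n, 2):8.2f}, SE(x1) = {se_a:.3f}, t(x1) = {b1_a / se_a:6.1f}")
print(f"y ~ x1 + x4 : AIC = {aic(rss_b, n, 3):8.2f}, SE(x1) = {se_b:.3f}, t(x1) = {b1_b / se_b:6.1f}")
```

With these settings, adding x4 lowers AIC because x4 carries information about y that x1 lacks, while the standard error of the x1 coefficient grows by about the square root of the VIF, partly offset by the smaller residual variance. The t-statistic of x1 therefore shrinks and its p-value rises, which mirrors model 1 versus model 2 in the second set of models; the exact figures depend on the simulated data.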