AIC values and their use in stepwise model selection for a simple linear regression

Tags: aic, model selection, regression, stepwise regression

The Wikipedia article for AIC says the following (emphasis added):

As an example, suppose that there were three models in the candidate set, with AIC values 100, 102, and 110. Then the second model is exp((100−102)/2) = 0.368 times as probable as the first model to minimize the information loss, and the third model is exp((100−110)/2) = 0.007 times as probable as the first model to minimize the information loss.

In this example, we would omit the third model from further consideration. We then have three options: (1) we could decide to gather more data, in the hope that this will allow clearly distinguishing between the first two models; (2) we could simply conclude that the data is insufficient to support selecting one model from among the first two; (3) we could take a weighted average of the first two models, with weights 1 and 0.368, respectively, and then do statistical inference based on the weighted multimodel.
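The arithmetic in the quoted example can be reproduced with a minimal sketch. The function name `relative_likelihoods` is an illustrative choice, not something taken from the Wikipedia article or the video; the formula exp((AIC_min − AIC_i)/2) is the one quoted above.

```python
import math

def relative_likelihoods(aic_values):
    """Return exp((AIC_min - AIC_i) / 2) for each candidate model.

    The model with the lowest AIC gets 1.0; every other model gets a
    value below 1 saying how probable it is, relative to the best one,
    to minimize the information loss.
    """
    best = min(aic_values)
    return [math.exp((best - aic) / 2) for aic in aic_values]

# AIC values from the quoted Wikipedia example
aics = [100, 102, 110]
rel = relative_likelihoods(aics)

for aic, r in zip(aics, rel):
    print(f"AIC = {aic}: relative likelihood = {r:.3f}")
# AIC = 100: relative likelihood = 1.000
# AIC = 102: relative likelihood = 0.368
# AIC = 110: relative likelihood = 0.007
```

Note that the reference point is the smallest AIC: lower is better, and the model with AIC 110 is the one that gets dropped.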

However, a video discussing the stepwise method for model selection in R removes the smallest AIC value. It may be that I am grossly misunderstanding something about how AIC works versus how it is applied. Could anyone explain why, in the video, we would not remove the largest value, as was done in the Wikipedia example?

Best Answer

I would not use the guidance in that video. It is extremely poor advice to use automatic model selection strategies. To help understand this point, it may help you to read my answer here: Algorithms for automatic model selection

That having been said, the answer to your specific question is that you are misunderstanding what is being shown in the video. In the R output displayed in the video, the AIC listed on the far right of each row is the AIC the model would have if you dropped the variable in that row. Lower AIC values are still better, both in the Wikipedia article and in the video. In the middle of the video, the presenter walks through reading the output and shows that dropping C2004 would lead to a new model with AIC = 16.269. This is the lowest AIC among the candidate models at that step, so that model is considered the best, and the variable you should drop is therefore C2004. The presenter is not saying that you should drop that model, but that you should drop C2004 from the current model to get that model. The second model, under step, can be seen on the same screen. You can see that this model does not include the variable C2004 and has AIC=16.27. (Again, for the record, using the AIC in this way is invalid; I'm just explaining what the video is recommending.)
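To make the reading of that table concrete, here is a rough sketch of the selection rule applied at one step. Only the C2004 value (16.269) comes from the video; the other rows, including the variable name `X_other`, are made-up placeholders, and this is not how R implements the procedure internally.

```python
# Each entry is the AIC the model WOULD have after taking that action.
# Only the C2004 value is from the video; the rest are hypothetical.
candidate_aic = {
    "drop C2004": 16.269,
    "<none> (keep current model)": 17.5,  # hypothetical value
    "drop X_other": 19.0,                 # hypothetical variable and value
}

# The stepwise rule picks the action whose resulting model has the
# LOWEST AIC, so the variable named in that row is the one removed.
best_action = min(candidate_aic, key=candidate_aic.get)
print(best_action, candidate_aic[best_action])
```

The row with the smallest AIC names the variable to remove, not a model to discard, which is why the video and the Wikipedia example agree that lower AIC is better.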
