I entered my data sets in R and it returned two AIC values, one for the model with interactions and one for the model without them.
Without interactions I got an AIC of 682.4, and with interactions I got an AIC of 684. The difference is small, but I want to understand what it means.
Solved – the difference between a lower AIC and a higher AIC
aic
Related Solutions
AIC and the c-statistic are trying to answer different questions. (Some issues with the c-statistic have also been raised in recent years, but I'll come to that as an aside.)
Roughly speaking:
- AIC is telling you how well your model fits for a specific mis-classification cost.
- AUC is telling you how well your model would work, on average, across all mis-classification costs.
When you calculate the AIC, you treat your logistic model's prediction of, say, 0.9 as a prediction of 1 (i.e. more likely 1 than 0). But it need not be. You could take your logistic score and say "anything above 0.95 is a 1, everything below is a 0". Why would you do this? It would ensure that you only predict a 1 when you are really, really confident. Your false positive rate will be very low, but your false negative rate will skyrocket. In some situations this isn't a bad thing: if you are going to accuse someone of fraud, you probably want to be very sure first. And if it is very expensive to follow up the positive results, then you don't want too many of them.
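The trade-off above can be made concrete with a small sketch. This is a minimal, self-contained Python illustration (the scores and labels are made up, not from the question's data): raising the decision threshold drives the false positive rate down while the false negative rate climbs.

```python
def rates(scores, labels, threshold):
    """Return (false_positive_rate, false_negative_rate) at a given threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / negatives, fn / positives

# Hypothetical logistic scores and true labels
scores = [0.10, 0.30, 0.55, 0.60, 0.80, 0.92, 0.97, 0.99]
labels = [0,    0,    1,    0,    1,    0,    1,    1]

print(rates(scores, labels, 0.5))   # default threshold: some false positives
print(rates(scores, labels, 0.95))  # conservative: no false positives, more false negatives
```

At the default 0.5 cut the classifier flags some non-cases, while at 0.95 it misses half the true cases instead; which you prefer depends entirely on the relative costs.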
This is why it relates to costs. There is a cost when you classify a 1 as a 0 and a cost when you classify a 0 as a 1. Typically (assuming you used a default setup) the AIC for logistic regression refers to the special case when both mis-classifications are equally costly. That is, logistic regression gives you the best overall number of correct predictions, without any preference for positive or negative.
The ROC curve is used because it plots the true positive rate against the false positive rate, showing how the classifier would perform under different cost requirements. The c-statistic comes about because any ROC curve that lies strictly above another represents a dominating classifier, so it is intuitive to take the area under the curve as a measure of how good the classifier is overall.
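To see how the curve is built, here is a minimal sketch (again with made-up scores, not the question's data): each distinct score is tried as a threshold, and each threshold contributes one (FPR, TPR) point on the curve.

```python
def roc_points(scores, labels):
    """Sweep thresholds over the observed scores; return (fpr, tpr) pairs."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.10, 0.30, 0.55, 0.60, 0.80, 0.92, 0.97, 0.99]
labels = [0,    0,    1,    0,    1,    0,    1,    1]

for fpr, tpr in roc_points(scores, labels):
    print(fpr, tpr)
```

Both coordinates are non-decreasing as the threshold is lowered, which is why the points trace out a curve from (0, 0) toward (1, 1).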
So basically, if you know your costs when fitting the model, use AIC (or similar). If you are just constructing a score, but not specifying the diagnostic threshold, then AUC approaches are needed (with the following caveat about AUC itself).
So what is wrong with c-statistic/AUC/Gini?
For many years AUC was the standard approach, and it is still widely used, but there are a number of problems with it. One thing that made it particularly appealing is that it corresponds to a Wilcoxon test on the ranks of the classifications. That is, it measures the probability that the score of a randomly picked member of one class will be higher than that of a randomly picked member of the other class. The problem is that this is almost never a useful metric in itself.
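That rank interpretation can be computed directly, without drawing the curve at all. A minimal sketch (hypothetical scores, ties counted as half a win, which is the standard convention):

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting tied scores as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.10, 0.30, 0.55, 0.60, 0.80, 0.92, 0.97, 0.99]
labels = [0,    0,    1,    0,    1,    0,    1,    1]

print(auc(scores, labels))  # -> 0.8125
```

This pairwise form is exactly the Wilcoxon/Mann–Whitney statistic rescaled to [0, 1], and it equals the area under the ROC curve built from the same scores.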
The most critical problems with AUC were publicized by David Hand a few years back (see the references below). The crux of the problem is that while AUC does average over all costs, because the x-axis of the ROC curve is the false positive rate, the weight it assigns to the different cost regimes varies between classifiers. So if you calculate the AUC on two different logistic regressions, it won't be measuring "the same thing" in both cases. This means it makes little sense to compare models based on AUC.
Hand proposed an alternative calculation using a fixed cost weighting, which he called the H-measure. There is an R package called hmeasure that performs this calculation, and I believe it computes AUC for comparison.
Some references on the problems with AUC:
Hand, D.J., Anagnostopoulos, C. (2013). When is the area under the receiver operating characteristic curve an appropriate measure of classifier performance? Pattern Recognition Letters, 34, 492–495.
(I found this to be a particularly accessible and useful explanation)
Don't choose a model just because it has a better AIC, a better AICc, a better $R^2$, or any other better property if that model doesn't make any sense.
Model selection is an art. It requires a balance of statistical knowledge and substantive knowledge.
Statistics, more generally, is part of a reasoned argument for or against certain propositions. It ought to be designed to improve knowledge, in whatever field you are in and whether this involves exploration, modeling, or whatever.
Part of the point of learning a lot of statistics is to be able to answer interesting questions and make stronger arguments. It is not to let the computer do your thinking for you.
Best Answer
I recommend Burnham & Anderson's book Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. They explicitly discuss differences in AIC.
A difference of less than 2 is not a lot of evidence that the model with the lower AIC is truly a better description of the data. (Technically: that it has a lower Kullback–Leibler divergence from the true data-generating process.) In such a case, Burnham & Anderson recommend going with the simpler model.
In their parlance, AIC differences of 5–10 constitute strong evidence, and AIC differences larger than 10 very strong evidence, in favor of the model with the lower AIC.
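Applied to the question's numbers, the comparison can be sketched as follows. This is a minimal Python illustration of the standard Akaike-weight formula, $w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)$; only the two AIC values come from the question.

```python
import math

# AIC values reported in the question
aics = {"no_interactions": 682.4, "with_interactions": 684.0}

best = min(aics.values())
deltas = {m: a - best for m, a in aics.items()}          # differences from the best model

# Akaike weights: relative support for each model
raw = {m: math.exp(-d / 2) for m, d in deltas.items()}
total = sum(raw.values())
weights = {m: w / total for m, w in raw.items()}

print(deltas)   # delta of ~1.6 for the interaction model
print(weights)  # roughly 0.69 vs 0.31
```

A delta of about 1.6 is below the threshold of 2, so by Burnham & Anderson's guideline the data do not clearly favor either model, and the simpler one (without interactions) is the sensible choice.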