Solved – How to apply Akaike Information Criterion and calculate it for Linear Regression

Tags: aic, feature-selection, model-selection, regression

I am developing a linear regression model and want to understand how to compare models with different combinations of variables using Akaike's Information Criterion (AIC).

Consider the Linear Regression below:

$ChildrenScore = \alpha + \beta_1FatherEducation + \beta_2MotherEducation + \beta_3FatherIncome + \beta_4MotherIncome$

I'd like to try AIC to choose the best model.
Hope to hear some explanations.

Best Answer

A simple formula for the calculation of the AIC in the OLS framework (since you say linear regression) can be found in Gordon (2015, p. 201):

$$\text{AIC} = n\ln\Big(\frac{SSE}{n}\Big)+2k $$

where SSE is the Sum of Squared Errors ($\sum(Y_i-\hat Y_i)^2$), $n$ is the sample size, and $k$ is the number of predictors in the model plus one for the intercept. Although AIC values are not interpretable on their own, differences between the values for different models are (a number of questions on CV cover this issue, for example here). So, the model with the smallest AIC is usually selected. It is easy to see why from the formula above: all else being equal, as the SSE decreases, the AIC also decreases.
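As a minimal sketch of this formula, here is a numpy-only implementation that fits OLS, computes $n\ln(SSE/n)+2k$, and compares two candidate models. The data below are synthetic and the variable names are only meant to echo the question; they are not real data.

```python
import numpy as np

def ols_aic(y, X):
    """AIC for an OLS fit via n * ln(SSE / n) + 2k, where k counts
    the slope coefficients plus one for the intercept (Gordon 2015)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS estimates
    sse = np.sum((y - X1 @ beta) ** 2)             # sum of squared errors
    k = X1.shape[1]                                # predictors + intercept
    return n * np.log(sse / n) + 2 * k

# Hypothetical data loosely matching the question's variables:
# here only the education variables actually drive the score.
rng = np.random.default_rng(0)
n = 200
father_edu = rng.normal(12, 3, n)
mother_edu = rng.normal(12, 3, n)
father_inc = rng.normal(50, 10, n)
mother_inc = rng.normal(40, 10, n)
score = 5 + 0.8 * father_edu + 0.9 * mother_edu + rng.normal(0, 2, n)

full = np.column_stack([father_edu, mother_edu, father_inc, mother_inc])
reduced = np.column_stack([father_edu, mother_edu])

print("full model AIC:   ", ols_aic(score, full))
print("reduced model AIC:", ols_aic(score, reduced))
```

The model with the smaller AIC would be preferred; with these simulated data, the education-only model should score far better than a model using only the (irrelevant) income variables.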

In other sources, you may find a more general, maximum-likelihood formula. For example, in Applied Regression Analysis and Generalized Linear Models, Fox provides:

$$\text{AIC}_j \equiv -2\, \text{log}_eL(\hat \theta_j)+2s_j$$

where $L(\hat \theta_j)$ is the maximized likelihood under model $M_j$ ($\theta_j$ is the vector of parameters of the model, $\hat \theta_j$ is the vector of maximum-likelihood estimates of those parameters) and $s_j$ is the number of parameters (Fox 2016, pp. 673-674). In the OLS framework with normal errors, this general formula reduces to the first one up to an additive constant that is the same for every model fit to the same data, so both lead to the same model rankings. Using the first formula, it is not difficult to calculate and use AIC to compare linear regression models.
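The claimed equivalence can be checked numerically. The sketch below (numpy only, synthetic data) computes the maximized Gaussian log-likelihood of an OLS fit, $\ln L = -\frac{n}{2}\big(\ln 2\pi + \ln(SSE/n) + 1\big)$, and confirms that the two AIC formulas differ only by a constant, so AIC *differences* between models agree exactly.

```python
import numpy as np

def fit_sse(y, X):
    """OLS fit with an intercept; returns (SSE, number of coefficients)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return np.sum((y - X1 @ beta) ** 2), X1.shape[1]

def aic_sse(y, X):
    """Gordon's OLS formula: n * ln(SSE / n) + 2k."""
    sse, k = fit_sse(y, X)
    n = len(y)
    return n * np.log(sse / n) + 2 * k

def aic_ml(y, X):
    """Likelihood formula: -2 ln L(theta_hat) + 2s, where s counts the
    regression coefficients plus one for the error variance."""
    sse, k = fit_sse(y, X)
    n = len(y)
    # maximized Gaussian log-likelihood with sigma^2_hat = SSE / n
    llf = -n / 2 * (np.log(2 * np.pi) + np.log(sse / n) + 1)
    return -2 * llf + 2 * (k + 1)

# Synthetic example: x2 is irrelevant to y
rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)

m_full = np.column_stack([x1, x2])
m_red = x1.reshape(-1, 1)

d_sse = aic_sse(y, m_full) - aic_sse(y, m_red)
d_ml = aic_ml(y, m_full) - aic_ml(y, m_red)
print("AIC difference (SSE formula):", d_sse)
print("AIC difference (ML formula): ", d_ml)
```

Since the two formulas differ only by $n(\ln 2\pi + 1) + 2$ for a fixed data set, either can be used for model selection and both pick the same model.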


Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models (3rd ed.). Los Angeles: Sage Publications.

Gordon, R. A. (2015). Regression Analysis for the Social Sciences. New York and London: Routledge.

And the original article:

Akaike, H. (1998). Information Theory and an Extension of the Maximum Likelihood Principle. In E. Parzen, K. Tanabe, & G. Kitagawa (Eds.), Selected Papers of Hirotugu Akaike (pp. 199–215). New York: Springer.