How do you write the AIC and BIC of a regression model in terms of the coefficient-of-determination?

Tags: aic, bic, maximum-likelihood, r-squared, regression

This question is intended to give a general exposition of the relationships between the goodness-of-fit statistics in regression analysis, and to answer questions like this one.

Consider a nonlinear Gaussian regression model of the form:

$$y_i = f(\mathbf{x}_i, \boldsymbol{\beta}) + \varepsilon_i
\quad \quad \quad \quad \quad
\varepsilon_i \sim \text{IID N}(0, \sigma^2).$$

There are a number of ways that the goodness-of-fit statistics for this model can be written in terms of each other. In particular, for the Gaussian form it is well known that the OLS estimator is equivalent to the MLE for the model. In view of this, it should be possible to write the maximised log-likelihood in terms of the goodness-of-fit statistics, and therefore to write the AIC and BIC in terms of those statistics.

Question: How do you write the AIC and BIC of a regression model in terms of the coefficient-of-determination?

Best Answer

In Gaussian regression, the MLE for the coefficient vector is equivalent to the OLS estimator, and the MLE for the error variance has its usual form in terms of the residual sum-of-squares of the regression. That is, the MLEs are related to the goodness-of-fit statistics by:

$$SS_\text{Res} = \sum_{i=1}^n (y_i-f(\mathbf{x}_i, \hat{\boldsymbol{\beta}}_\text{MLE}))^2 \quad \quad \quad \quad \quad \frac{SS_\text{Res}}{n} = \hat{\sigma}_\text{MLE}^2.$$
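As a quick numerical check of this equivalence, here is a minimal sketch (Python with numpy/scipy on simulated data; the mean function, parameter values, and names are all hypothetical) that maximises the Gaussian log-likelihood numerically and confirms that the coefficient MLE matches the OLS estimator and the variance MLE matches $SS_\text{Res}/n$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical simulated data: y = 2 + 3x + noise, sigma = 1.5
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)
X = np.column_stack([np.ones(n), x])

# OLS estimator and its residual sum-of-squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = np.sum((y - X @ beta_ols) ** 2)

# Negative Gaussian log-likelihood, parameterised by (beta, log sigma)
def neg_loglik(theta):
    beta, sigma2 = theta[:2], np.exp(2 * theta[2])
    resid = y - X @ beta
    return 0.5 * n * (np.log(2 * np.pi) + np.log(sigma2)) + np.sum(resid**2) / (2 * sigma2)

fit = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_mle, sigma2_mle = fit.x[:2], np.exp(2 * fit.x[2])

print(beta_ols, beta_mle)       # coefficient MLE matches OLS
print(ss_res / n, sigma2_mle)   # variance MLE matches SS_Res / n
```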

Consequently, we can write the maximised log-likelihood for the model as:

$$\begin{align} \hat{\ell}_{\mathbf{y}, \mathbf{x}} &\equiv \max_{\boldsymbol{\beta}, \sigma} \ell_{\mathbf{y}, \mathbf{x}} (\boldsymbol{\beta}, \sigma^2) \\[12pt] &= \ell_{\mathbf{y}, \mathbf{x}} (\hat{\boldsymbol{\beta}}_\text{MLE}, \hat{\sigma}_\text{MLE}^2) \\[12pt] &= - \frac{n}{2} \Big[ \ln (2 \pi) + \ln (\hat{\sigma}_\text{MLE}^2) \Big] - \frac{1}{2 \hat{\sigma}_\text{MLE}^2} \sum_{i=1}^n (y_i-f(\mathbf{x}_i, \hat{\boldsymbol{\beta}}_\text{MLE}))^2 \\[8pt] &= - \frac{n}{2} \Big[ \ln (2 \pi) + \ln (\hat{\sigma}_\text{MLE}^2) \Big] - \frac{SS_\text{Res}}{2 \hat{\sigma}_\text{MLE}^2} \\[8pt] &= - \frac{n}{2} \Big[ \ln (2 \pi) + \ln (\hat{\sigma}_\text{MLE}^2) \Big] - \frac{n}{2} \\[8pt] &= - \frac{n}{2} \Big[ \ln (2 \pi) + \ln (SS_\text{Res}) - \ln(n) + 1 \Big]. \\[8pt] \end{align}$$
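To check the last line of this derivation, the following sketch (again on simulated data, purely illustrative) evaluates the log-likelihood directly at the MLE and compares it with the closed form in $SS_\text{Res}$ and $n$:

```python
import numpy as np

# Hypothetical simulated data, purely for illustration
rng = np.random.default_rng(1)
n = 150
x = rng.uniform(-2, 2, size=n)
y = 1.0 - 0.5 * x + rng.normal(scale=0.8, size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = ss_res / n

# Direct evaluation of the Gaussian log-likelihood at the MLE
loglik_direct = (-0.5 * n * (np.log(2 * np.pi) + np.log(sigma2_hat))
                 - ss_res / (2 * sigma2_hat))

# Closed form in terms of SS_Res and n (the last line of the derivation)
loglik_closed = -0.5 * n * (np.log(2 * np.pi) + np.log(ss_res) - np.log(n) + 1)

print(np.isclose(loglik_direct, loglik_closed))  # True
```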

This expression shows that the maximised log-likelihood is fully determined by the residual sum-of-squares and the number of data points. It can also be written in terms of other goodness-of-fit quantities if preferred. In particular, taking $s_Y^2 = SS_\text{Tot}/df_\text{Tot}$ to be the sample variance of the response variable, we can write the residual sum-of-squares in terms of the coefficient-of-determination as $SS_\text{Res} = (1-R^2) \cdot df_\text{Tot} \cdot s_Y^2$. Consequently, we also have the alternative form:

$$\hat{\ell}_{\mathbf{y}, \mathbf{x}} = - \frac{n}{2} \Bigg[ 1+\ln (2 \pi) + \ln \bigg( \frac{df_\text{Tot}}{n} \bigg) + \ln (1-R^2) + \ln (s_Y^2) \Bigg].$$
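As before, a small sketch on simulated data (illustrative only) confirms that this $R^2$ form agrees with the $SS_\text{Res}$ form of the maximised log-likelihood:

```python
import numpy as np

# Hypothetical simulated data, purely for illustration
rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 0.5 + 1.2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = np.sum((y - X @ beta_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

df_tot = n - 1                      # total degrees-of-freedom
s2_y = ss_tot / df_tot              # sample variance of the response
r2 = 1 - ss_res / ss_tot            # coefficient-of-determination

# Closed form in SS_Res versus closed form in R^2
loglik_ssres = -0.5 * n * (np.log(2 * np.pi) + np.log(ss_res) - np.log(n) + 1)
loglik_r2 = -0.5 * n * (1 + np.log(2 * np.pi) + np.log(df_tot / n)
                        + np.log(1 - r2) + np.log(s2_y))

print(np.isclose(loglik_ssres, loglik_r2))  # True
```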

This latter form shows that the maximised log-likelihood is fully determined by the coefficient-of-determination, the sample variance of the response variable, and the number of data points. Once you have an expression for the maximised log-likelihood, it becomes trivial to get corresponding expressions for the AIC and BIC, to wit:

$$\begin{align} \text{AIC} &= n \Bigg[ 1 + \ln (2 \pi) + \frac{2k}{n} + \ln \bigg( \frac{df_\text{Tot}}{n} \bigg) + \ln (1-R^2) + \ln (s_Y^2) \Bigg], \\[12pt] \text{BIC} &= n \Bigg[ 1+\ln (2 \pi) + \frac{k\ln(n)}{n} + \ln \bigg( \frac{df_\text{Tot}}{n} \bigg) + \ln (1-R^2) + \ln (s_Y^2) \Bigg]. \\[6pt] \end{align}$$
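These expressions follow the standard definitions $\text{AIC} = 2k - 2\hat{\ell}_{\mathbf{y}, \mathbf{x}}$ and $\text{BIC} = k \ln(n) - 2\hat{\ell}_{\mathbf{y}, \mathbf{x}}$ for the $k$ used above. A minimal sketch on simulated data (illustrative; here $k$ is taken to be the number of regression coefficients, which is an assumption, though the identity holds for any fixed $k$) confirms that both routes agree:

```python
import numpy as np

# Hypothetical simulated data, purely for illustration
rng = np.random.default_rng(3)
n = 120
x = rng.normal(size=n)
y = -1.0 + 0.7 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]                       # model terms: intercept + slope

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = np.sum((y - X @ beta_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
df_tot, s2_y, r2 = n - 1, ss_tot / (n - 1), 1 - ss_res / ss_tot

loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(ss_res) - np.log(n) + 1)

# Definitions: AIC = 2k - 2*loglik, BIC = k*ln(n) - 2*loglik
aic_std = 2 * k - 2 * loglik
bic_std = k * np.log(n) - 2 * loglik

# The same quantities written in terms of R^2
common = (1 + np.log(2 * np.pi) + np.log(df_tot / n)
          + np.log(1 - r2) + np.log(s2_y))
aic_r2 = n * (common + 2 * k / n)
bic_r2 = n * (common + k * np.log(n) / n)

print(np.isclose(aic_std, aic_r2), np.isclose(bic_std, bic_r2))  # True True
```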


Asymptotic analysis: If we hold the number of model terms $k$ constant and take $n \rightarrow \infty$, the penalty terms $2k/n$ and $k \ln(n)/n$ vanish, and $\ln(df_\text{Tot}/n) = \ln((n-1)/n) \rightarrow 0$, which gives the asymptotic equivalence:

$$\text{AIC} \sim \text{BIC} \sim n \Bigg[ 1 + \ln (2 \pi) + \ln (1-R^2) + \ln (s_Y^2) \Bigg].$$

Under broad convergence conditions we also have $1-R^2 \rightarrow \sigma^2/\sigma_Y^2$ and $s_Y^2 \rightarrow \sigma_Y^2$, so that $\ln(1-R^2) + \ln(s_Y^2) \rightarrow \ln(\sigma^2/\sigma_Y^2) + \ln(\sigma_Y^2) = \ln(\sigma^2)$, which gives the asymptotic equivalence:

$$\text{AIC} \sim \text{BIC} \sim n \Bigg[ 1 + \ln (2 \pi) + \ln (\sigma^2) \Bigg].$$
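A quick simulation (illustrative only) shows this convergence: as $n$ grows, both $\text{AIC}/n$ and $\text{BIC}/n$ approach $1 + \ln(2\pi) + \ln(\sigma^2)$:

```python
import numpy as np

# Hypothetical simulation: AIC/n and BIC/n approach 1 + ln(2*pi) + ln(sigma^2)
rng = np.random.default_rng(4)
sigma = 1.5
target = 1 + np.log(2 * np.pi) + np.log(sigma**2)

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(size=n)
    y = 2.0 + 3.0 * x + rng.normal(scale=sigma, size=n)
    X = np.column_stack([np.ones(n), x])
    k = X.shape[1]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta_hat) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(ss_res) - np.log(n) + 1)
    print(n, (2 * k - 2 * loglik) / n, (k * np.log(n) - 2 * loglik) / n, target)
```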