Model Selection – Calculate AIC for Both Linear and Non-Linear Models

aic, model-selection, predictive-models

I have data consisting of vectors $\textbf{x}$ and $\textbf{y}$. I want to predict $\textbf{y}$ from $\textbf{x}$ with a set of parameters $a_1, a_2, a_3$ to be fitted, using both a linear and a non-linear model, with a python package (the curve_fit function from scipy.optimize):

Linear model (in x):
\begin{equation}\label{equation_linear_model}
\hat{\textbf{y}}_1 = \text{F}_1(\textbf{x},a_1) = \textbf{x} \cdot a_1
\end{equation}

Non-linear model (in x):
\begin{equation}\label{equation_non_linear_model}
\hat{\textbf{y}}_2 = \text{F}_2(\textbf{x},a_2,a_3) =
a_2 \cdot \Big(\frac{1}{e^{-\textbf{x}\cdot a_3}+1}\Big)
\end{equation}

$\hat{\textbf{y}}_1$ and $\hat{\textbf{y}}_2$ are the predictions of the respective models, which should be as close to $\textbf{y}$ as possible.
I then want to compare the performance of the two models by calculating the AIC. So far I have used the following definition of AIC for linear models (applying it to both models, including the non-linear one):
$$
AIC = n \log(\hat{\sigma}^2) + 2k
$$

where
$$
\hat{\sigma}^2 = \frac{\sum \hat{\epsilon}_i^2}{n}
$$

where $n$ is the number of data points, $k$ is the number of parameters, and $\hat{\epsilon}_i$ is the prediction error for the $i$-th data point. What would be the formula to compute the two AIC values (one for the linear and one for the non-linear model), and a corresponding p-value for a significant difference between the two AIC values?
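For reference, here is a minimal sketch of what I am doing so far (the example data and starting values are only illustrative, and I count only the mean parameters in $k$; counting $\hat\sigma$ as well would add the same 2 to both AIC values):

```python
# Minimal sketch of the setup described above (example data and p0 are illustrative).
import numpy as np
from scipy.optimize import curve_fit

def F1(x, a1):                      # linear model: y_hat = x * a1
    return a1 * x

def F2(x, a2, a3):                  # non-linear (logistic-shaped) model
    return a2 / (np.exp(-x * a3) + 1)

def aic(y, y_hat, k):
    """AIC = n*log(sigma_hat^2) + 2k with sigma_hat^2 = RSS/n."""
    n = len(y)
    resid = y - y_hat
    sigma2_hat = np.sum(resid**2) / n
    return n * np.log(sigma2_hat) + 2 * k

# Illustrative data only.
rng = np.random.default_rng(0)
x = np.linspace(0.1, 10, 50)
y = 2.0 / (np.exp(-0.8 * x) + 1) + rng.normal(0, 0.1, x.size)

popt1, _ = curve_fit(F1, x, y, p0=[1.0])
popt2, _ = curve_fit(F2, x, y, p0=[1.0, 1.0])

aic1 = aic(y, F1(x, *popt1), k=1)   # linear model: 1 parameter
aic2 = aic(y, F2(x, *popt2), k=2)   # non-linear model: 2 parameters
```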

Best Answer

AIC formula

What would be the formula to compute the two AIC values (one for the linear and one for the non-linear model)?

TL;DR: Assuming that you do least-squares regression for both the linear and the non-linear model, your formula for computing AIC works for both.

Your formula is not specific to linear models; it is for models with Gaussian-distributed responses whose parameters are estimated by maximum likelihood.

Let the density of a measurement $y_i$, as a function of $x_i$ and parameters $\beta$, be:

$$f(y_i;x_i,\beta) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{1}{2} \left( \frac{y_i-g(x_i,\beta)}{\sigma} \right)^2}$$

where $g(x_i,\beta)$ can be a linear function, but also a non-linear one. The likelihood function is then

$$\mathcal{L}(\beta ; y_i, x_i) = \prod_{i=1}^n f(y_i ; x_i, \beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{1}{2} \left( \frac{y_i-g(x_i,\beta)}{\sigma} \right)^2}$$

If you plug in the maximum likelihood estimates for $\beta$ and $\sigma$ (where $\hat\beta$ is also known as the least-squares estimate):

$$\hat\beta = \underset{\beta}{\operatorname{arg\,min}} \sum_{i=1}^n\left( y_i-g(x_i,\beta)\right)^2$$

$$\hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n\left( {y_i-g(x_i,\hat\beta)}\right)^2}$$

and then take the logarithm, you get

$$\begin{aligned} \log\mathcal{L}(\hat\beta, \hat\sigma ; y, x) &= \sum_{i=1}^{n} \log\left(\frac{1}{\sqrt{2\pi \hat\sigma^2}} e^{-\frac{1}{2} \left( \frac{y_i-g(x_i,\hat\beta)}{\hat\sigma} \right)^2}\right)\\ &= -\frac{n}{2} \log(2\pi ) - n \log(\hat{\sigma}) -\frac{1}{2} \frac{\sum_{i=1}^n\left(y_i-g(x_i,\hat\beta)\right)^2}{\hat\sigma^2}\\ &= -\frac{n}{2} \log(2\pi ) - n \log(\hat{\sigma}) -\frac{n}{2} \end{aligned}$$

And since the log-likelihood can be shifted by any constant (terms that are the same for every model fitted to the same data can be dropped when comparing models), you can use

$$\log\mathcal{L}(\hat\beta, \hat\sigma ; y, x) = - n \log(\hat{\sigma}) + \text{constant}$$

Your formula for AIC follows from the general definition $AIC = 2k - 2\log\mathcal{L}$: with the log-likelihood $-n\log(\hat{\sigma})$ this gives $2k + 2n\log(\hat{\sigma}) = n\log(\hat{\sigma}^2) + 2k$.
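As a quick numerical check (a sketch only, with arbitrary data and an arbitrary fitted mean), the full Gaussian log-likelihood at the maximum likelihood estimates indeed reduces to $-\frac{n}{2}\log(2\pi) - n\log\hat\sigma - \frac{n}{2}$, so the reduced formula $n\log(\hat\sigma^2) + 2k$ differs from the exact $-2\log\mathcal{L} + 2k$ only by $n(\log(2\pi)+1)$, a constant that is the same for every model fitted to the same data:

```python
# Sketch: n*log(sigma_hat^2) + 2k differs from the exact Gaussian AIC = -2*logL + 2k
# only by the model-independent constant n*(log(2*pi) + 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 40)
y = 1.5 * x + rng.normal(0, 0.3, x.size)

y_hat = np.polyval(np.polyfit(x, y, 1), x)   # any fitted conditional mean g(x, beta_hat)
n, k = len(y), 2

resid = y - y_hat
sigma_hat = np.sqrt(np.sum(resid**2) / n)    # MLE of sigma (not the unbiased estimate)

logL = np.sum(norm.logpdf(y, loc=y_hat, scale=sigma_hat))  # full Gaussian log-likelihood
print(np.isclose(logL, -n/2*np.log(2*np.pi) - n*np.log(sigma_hat) - n/2))   # True

aic_full    = -2 * logL + 2 * k
aic_reduced = n * np.log(sigma_hat**2) + 2 * k
print(np.isclose(aic_full - aic_reduced, n * (np.log(2*np.pi) + 1)))        # True
```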


So you can apply your AIC formula to both linear and non-linear relationships $g(x_i,\beta)$ that describe the conditional mean in your model. The formula relates to the Gaussian distribution at the beginning of this post, not specifically to linear models. You can have linear models with a log-likelihood different from $-n\log \hat\sigma$ and non-linear models with log-likelihood $-n\log \hat\sigma$.

Side note: be careful, however, when you mix AIC formulas based on different distributions. The constant shift that can make likelihood functions differ does not matter for relative comparisons (see "Is the exact value of any likelihood meaningless?"), but it becomes tricky when you compare models based on different distributions.

p-values

and a corresponding p-value for a significant difference between the two AIC values?

Comparing AIC is not typically about computing a p-value and testing a hypothesis. It is about making a selection among different models. See also: Why can't we use AIC and p-value variable selection within the same model building exercise?

If you would still like to test the hypothesis that both models fit equally well, you can approach it as a likelihood ratio test, whose test statistic approaches a chi-squared distribution when the models are nested. If the models are not nested, then I do not know what sort of distribution the statistic follows, but possibly there are cases like this that can be handled as well.
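For the nested case, a sketch of such a likelihood ratio test might look as follows (this assumes Gaussian errors and least-squares fits as above; note that your $F_1$ and $F_2$ are not nested, so the chi-squared approximation would not directly apply to them):

```python
# Sketch of a likelihood ratio test for two *nested* least-squares fits,
# assuming Gaussian errors; df is the difference in the number of mean parameters.
import numpy as np
from scipy.stats import chi2

def gaussian_loglik(y, y_hat):
    """Gaussian log-likelihood evaluated at the MLE sigma_hat^2 = RSS/n."""
    n = len(y)
    sigma2_hat = np.sum((y - y_hat) ** 2) / n
    return -n / 2 * (np.log(2 * np.pi) + np.log(sigma2_hat) + 1)

def lr_test(y, y_hat_restricted, y_hat_full, df):
    """Test the restricted (smaller) model against the full (larger) model."""
    stat = 2 * (gaussian_loglik(y, y_hat_full) - gaussian_loglik(y, y_hat_restricted))
    return stat, chi2.sf(stat, df)  # test statistic and p-value

# Hypothetical usage with predictions from two nested fitted models:
# stat, p_value = lr_test(y, y_hat_small, y_hat_big, df=1)
```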
