Solved – Correct way to compare two (very) different regression models

multiple regression, piecewise linear, regression

I'm working with some piecewise linear regression models, and I'd like to compare their predictions with those produced by multiple (weighted) linear regression models. Both models describe the same physical system, but they parameterize the independent variable very differently. In particular, the averages of the independent variables differ greatly between the two parameterizations: calling $x_1$ the independent variable under the first parameterization and $x_2$ the independent variable under the second, I generally have $\mathbb{E}x_1 \gg \mathbb{E}x_2$. This in turn means that the model coefficients may be very different.

Additionally, it is sometimes the case that the piecewise linear regression has a segment with slope = 0 and intercept = 0, which would seem to cause a problem for a statistic like the coefficient of variation of the RMSE (CVRMSE).

The best way I can think of to compare these two models is to use a training set and a test set (a sketch follows below), but then I'm not sure what statistic I should compute to say, algorithmically, "this one is better". Is there a better way to discriminate between these two models a priori?
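For concreteness, here is a minimal sketch of the held-out comparison I have in mind; the model objects and arrays in the commented lines are hypothetical placeholders, not working code for either model.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error of predictions on held-out observations.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical split of the observations into training and test indices.
rng = np.random.default_rng(0)
n = 200
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]

# ...fit both models on the training rows, then compare on the test rows:
# err_piecewise = rmse(y[test], piecewise_model.predict(x2[test]))
# err_weighted  = rmse(y[test], weighted_model.predict(x1[test]))
```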

Best Answer

Since the two models are not nested (that is, the independent variables of one model are not a subset of the independent variables of the other), you cannot use a likelihood ratio test. However, you can consider the AIC (Akaike Information Criterion) or one of its variants. If you have the likelihoods of your models, call them $\mathcal{L}_1$ and $\mathcal{L}_2$, you can easily calculate the AICs of your models with

$AIC_{1} = -2 \log(\mathcal{L}_1) + 2\cdot K_1$

where $K_1$ is the number of estimable parameters in the first model (and analogously for the second model). Now, a single AIC value is not informative on its own; rather, it is only informative relative to alternative models. Therefore, when you have several models, one often calculates the difference between each model's AIC and the AIC of the model with the smallest AIC:

$\Delta_i = AIC_i - AIC_{min} $
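If both models are fitted in software that reports the log-likelihood, this computation is mechanical. Below is a minimal sketch in Python, assuming Gaussian models fitted with statsmodels on synthetic placeholder data; the second ordinary least-squares fit stands in for the piecewise model, and none of the variable names come from the original question or answer.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(50, 100, n)            # first parameterization (large mean)
x2 = rng.uniform(0, 1, n)               # second parameterization (small mean)
y = 3.0 + 0.5 * x1 + rng.normal(0, 2, n)
w = np.ones(n)                          # hypothetical regression weights

fit1 = sm.WLS(y, sm.add_constant(x1), weights=w).fit()  # weighted linear model
fit2 = sm.OLS(y, sm.add_constant(x2)).fit()             # stand-in for the piecewise model

def aic(loglik, k):
    # AIC = -2 log(L) + 2 K, where K is the number of estimable parameters
    # (here: intercept, slope, and the residual variance).
    return -2.0 * loglik + 2.0 * k

aics = np.array([aic(fit1.llf, 3), aic(fit2.llf, 3)])
deltas = aics - aics.min()              # Delta_i = AIC_i - AIC_min
print(deltas)                           # the best-supported model has Delta = 0
```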

Now, you will not have a statistical test to compare these values. This is the information-theoretic approach, which is distinct from the Neyman-Pearson hypothesis testing framework, and the two should not be mixed (Anderson 2001). However, there are some rules of thumb as to what magnitude of delta one considers "significant" (in the common meaning of the word, not as in "statistically significant"). In "Model selection and multimodel inference", Burnham and Anderson present the following table:

Delta_i     Level of empirical support of model i
0-2          Substantial
4-7          Considerably less
> 10         Essentially none

That is, if the difference between the AICs of your two models is 4-7, you can assume that one of the models is "considerably" better supported by the evidence than the other. In fact, the authors state that

It seems best not to associate the words significant or rejected with results under an information-theoretic paradigm. Questions concerning the strength of evidence for the models in the set are best addressed using the evidence ratio as well as an analysis of residuals, adjusted $R^2$ and other model diagnostics or descriptive statistics.

The variants of AIC include $AIC_c$ (or c-AIC), which is suitable for small sample sizes, and QAIC (for overdispersed count data).
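For reference, the small-sample correction adds a penalty that grows with the number of parameters $K$ relative to the sample size $n$:

$AIC_c = AIC + \frac{2K(K+1)}{n - K - 1}$

which reduces to the ordinary AIC as $n$ becomes large.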

There are alternatives, of course, that do allow you to do hypothesis testing. See for example this question.
