Model Selection – Why F-test is Not Possible for Comparing Non-Nested Models

model selection, multiple regression

I repeatedly read that one cannot use an F-test to compare two models that are not nested. Usually the setting is two nested models
$$
m: y=X\beta+Z\gamma+\epsilon \text{, and } m_r:y=X\beta+u
$$
The test-statistic usually looks something like
$$
F(y)=\frac{RSS_r-RSS}{RSS}\cdot\frac{N-K-L}{L}
$$
where $RSS_r$ and $RSS$ are the residual sums of squares of the restricted and full model, respectively, and $N$, $K$, $L$ are the number of samples, the number of variables in $X$, and the number of variables in $Z$.
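
For concreteness, here is a minimal sketch of this nested F-test in Python (not part of the original question; the data, dimensions, and coefficients are invented purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, K, L = 200, 3, 2
X = rng.normal(size=(N, K))
Z = rng.normal(size=(N, L))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=N)  # gamma = 0, so the restricted model holds

def rss(design, y):
    """Residual sum of squares of an OLS fit of y on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

rss_full = rss(np.hstack([X, Z]), y)  # full model m: regressors X and Z
rss_restr = rss(X, y)                 # restricted model m_r: regressors X only

F = ((rss_restr - rss_full) / L) / (rss_full / (N - K - L))
p_value = stats.f.sf(F, dfn=L, dfd=N - K - L)
print(F, p_value)
```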

However, I don't see why I couldn't use the residual sum of squares of any two models $m$ and $m'$ and compute
$$
F(y)=\frac{RSS_{m'}-RSS_m}{RSS_m}\cdot\frac{N-\text{variables}(m)}{\text{variables}(m)-\text{variables}(m')},
$$
assuming $\text{variables}(m)>\text{variables}(m')$ w.l.o.g.
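
To make the proposal concrete, here is a sketch (again with invented, hypothetical data) showing that the statistic can of course be computed mechanically for two non-nested designs; whether it follows an F distribution under any null is exactly what the answer below addresses:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X1 = rng.normal(size=(N, 4))  # hypothetical model m  with 4 regressors
X2 = rng.normal(size=(N, 2))  # hypothetical model m' with 2 unrelated regressors (not a subset of X1)
y = rng.normal(size=N)

def rss(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

rss_m, rss_mp = rss(X1, y), rss(X2, y)
p_m, p_mp = X1.shape[1], X2.shape[1]

# The statistic is computable, but nothing guarantees an F distribution here;
# it can even come out negative, since neither RSS need dominate the other.
F_like = ((rss_mp - rss_m) / (p_m - p_mp)) / (rss_m / (N - p_m))
print(F_like)
```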

Would be great to hear your thoughts on this! Thanks in advance

Best Answer

With nested models, it's possible to conceive of a saturated model, which provides a theoretical upper bound on the likelihood and pins down the parameter space. When you have this, you know that the large-sample behavior of twice the log-likelihood ratio is that of a $\chi^2_{p-q}$ random variable, and that is what makes formal inference possible. For normally distributed errors the $F$ statistic gives the exact distribution of the test, and the $F$ statistic multiplied by its numerator degrees of freedom converges to that $\chi^2_{p-q}$ distribution as the denominator degrees of freedom go to infinity.
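
As a small numerical check of that last point (my own illustration, not part of the original answer, using SciPy's distribution functions), the $F$ quantiles scaled by the numerator degrees of freedom approach the corresponding $\chi^2$ quantiles as the denominator degrees of freedom grow:

```python
from scipy import stats

d1 = 3  # numerator degrees of freedom (L, the number of restrictions)
for d2 in (10, 100, 10_000):  # denominator degrees of freedom (N - K - L)
    f_q = stats.f.ppf(0.95, dfn=d1, dfd=d2)
    print(f"d2 = {d2:>6}:  d1 * F 95% quantile = {d1 * f_q:.4f}")
print(f"chi^2({d1}) 95% quantile          = {stats.chi2.ppf(0.95, df=d1):.4f}")
```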

With non-nested models, it's possible to have arbitrarily high and low likelihoods, so there are no guarantees about the large-sample behavior of the statistic. This means some scenarios give you a very high false-negative probability, a very high false-positive probability, or both, with no way of calibrating the test.

You can qualitatively compare non-nested models using the AIC or BIC, but you cannot draw formal inference about their relative fit; you can only say, "Model Y had a higher/lower IC than Model X".
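
A rough sketch of what that qualitative comparison looks like in practice (assuming Gaussian OLS models and invented data; the designs `X_a` and `X_b` are hypothetical): compute each model's AIC up to an additive constant and simply report which is lower.

```python
import numpy as np

def aic_ols(design, y):
    """AIC of a Gaussian OLS fit, up to an additive constant shared by all models."""
    N, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = resid @ resid
    return N * np.log(rss / N) + 2 * k  # swap 2 * k for np.log(N) * k to get BIC

rng = np.random.default_rng(2)
N = 200
X_a = rng.normal(size=(N, 3))  # hypothetical non-nested model A
X_b = rng.normal(size=(N, 5))  # hypothetical non-nested model B
y = rng.normal(size=N)

aic_a, aic_b = aic_ols(X_a, y), aic_ols(X_b, y)
print("AIC(A) =", aic_a, " AIC(B) =", aic_b)
print("Lower-IC model:", "A" if aic_a < aic_b else "B")  # no p-value attached
```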
