I think you might be confusing an extra-sum-of-squares F-test with a likelihood ratio test, although both are used to compare two models.

A likelihood ratio statistic, denoted by $\Lambda$, is given by

$$\Lambda = \frac{L(\text{reduced model})}{L(\text{full model})}$$

Taking $-2\log\Lambda$ produces a statistic that, under the null hypothesis, asymptotically has a $\chi^2_{d.f(\text{reduced model})-d.f(\text{full model})}$ distribution, where $d.f$ denotes the residual degrees of freedom of each model.
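As a minimal sketch of this calculation in Python (assuming SciPy is available; the function name `likelihood_ratio_test` and the log-likelihood values are made up for illustration and do not come from any SAS output), the statistic and its p-value could be computed from the two maximized log-likelihoods like this:

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (-2 log Lambda, p-value) for two nested models.

    df_diff is the difference in degrees of freedom between the models,
    i.e. the number of extra parameters in the full model.
    """
    # -2 log(L_reduced / L_full) = 2 * (loglik_full - loglik_reduced)
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Upper-tail probability of the chi-squared reference distribution
    p_value = chi2.sf(stat, df_diff)
    return stat, p_value


# Hypothetical maximized log-likelihoods, for illustration only
stat, p = likelihood_ratio_test(loglik_reduced=-105.2, loglik_full=-101.7, df_diff=2)
print(f"-2 log Lambda = {stat:.3f}, p = {p:.4f}")
```

Working on the log scale avoids forming the ratio of two very small likelihoods directly.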

I have not used SAS so I cannot comment on the output, but I hope that I have been able to answer your question.

Note that $\Lambda$ is equivalent to your $L$.

Janne: For linear regression you could use either the likelihood ratio test or the extra-sum-of-squares F-test, and you should end up with essentially the same p-value (the two agree asymptotically). Despite this, they are not the same thing.

As mentioned above, the likelihood ratio test produces a statistic that has a $\chi^2_{d.f(\text{reduced model})-d.f(\text{full model})}$ distribution, whereas the extra-sum-of-squares F-test statistic, given by

$$F = \frac{(SSR_{\text{reduced model}}-SSR_{\text{full model}})/(d.f_{\text{reduced model}} - d.f_{\text{full model}})}{\hat{\sigma}^2_{\text{full model}}}$$

has an $F_{d.f(\text{reduced model})-d.f(\text{full model}),d.f(\text{full model})}$ distribution.
Here SSR is the sum of squared residuals and $\hat{\sigma}^2_{\text{full model}} = SSR_{\text{full model}}/d.f_{\text{full model}}$ is the usual estimate of the error variance from the full model.
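To make the comparison concrete, here is a small sketch in Python (assuming NumPy and SciPy; the simulated data, the seed, and the helper `ssr_and_df` are invented for illustration). It fits a reduced and a full linear model by least squares, computes the F statistic from the formula above, and also computes $-2\log\Lambda$, which for a Gaussian linear model with the variance estimated by maximum likelihood equals $n\log(SSR_{\text{reduced model}}/SSR_{\text{full model}})$:

```python
import numpy as np
from scipy.stats import chi2, f as f_dist

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # x2 has no true effect (null is true)


def ssr_and_df(X, y):
    """Sum of squared residuals and residual degrees of freedom of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[0] - X.shape[1]


X_reduced = np.column_stack([np.ones(n), x1])
X_full = np.column_stack([np.ones(n), x1, x2])

ssr_r, df_r = ssr_and_df(X_reduced, y)
ssr_f, df_f = ssr_and_df(X_full, y)

# Extra-sum-of-squares F-test
sigma2_full = ssr_f / df_f
F = ((ssr_r - ssr_f) / (df_r - df_f)) / sigma2_full
p_F = f_dist.sf(F, df_r - df_f, df_f)

# Likelihood ratio test with the chi-squared reference distribution
lr = n * np.log(ssr_r / ssr_f)
p_lr = chi2.sf(lr, df_r - df_f)

print(f"F  = {F:.4f}, p = {p_F:.4f}")
print(f"LR = {lr:.4f}, p = {p_lr:.4f}")
```

In this setting the LR statistic is a monotone function of $F$, so the two tests order the evidence identically. The p-values differ slightly in finite samples only because one uses the $F$ and the other the asymptotic $\chi^2$ reference distribution.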

Your reasoning is too pessimistic.

Given the $K$ additional features, the LR test statistic will follow an asymptotic $\chi^2$ distribution with $K$ degrees of freedom *if the null is true* (and other auxiliary assumptions, e.g., a suitable regression setting, weak dependence assumptions etc.), i.e., if the additional predictors in $B$ are just noise features that lead to "overfitting".

The figure below plots the 95%-quantiles of the $\chi^2_K$ distribution as a function of $K$, i.e., the value that the LR statistic needs to exceed to reject the null that $A$ is the "good" model.

As you can see, higher and higher values of the test statistic are needed the larger your set in $B$ that "overfits" the data. So the test suitably makes it more difficult for the (inevitable) better fit (or log-likelihood) of the larger model to be judged "sufficiently" large to reject model $A$.
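The critical values shown in the figure can be recomputed with a short Python sketch (assuming SciPy; the particular values of $K$ are chosen only for illustration):

```python
from scipy.stats import chi2

# Numbers of additional (possibly pure-noise) features in model B
Ks = [1, 2, 5, 10, 20, 50]

# 95%-quantile of the chi-squared distribution with K degrees of freedom
crits = [chi2.ppf(0.95, df=K) for K in Ks]

for K, crit in zip(Ks, crits):
    print(f"K = {K:2d}: reject model A if the LR statistic exceeds {crit:.2f}")
```

The threshold grows with $K$, which is how the test accounts for the better fit that extra features deliver mechanically.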

Of course, for any given application of the test, you might get spurious overfitting that is so "good" that you still falsely reject the null. This "type-I" error is however inherent in any statistical test, and will occur in about 5% of the cases in which the null is true if (like in the figure) we use the 95%-quantiles of the test's null distribution as our critical values.
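A small simulation can illustrate this rejection rate. The following is a sketch under stated assumptions: a Gaussian linear regression, $K = 5$ pure-noise features in $B$, and sample size, seed, and coefficients picked arbitrarily. It uses the fact that in this setting the LR statistic equals $n\log(SSR_A/SSR_B)$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
n, K, n_sim = 200, 5, 1000
crit = chi2.ppf(0.95, df=K)  # 95%-quantile of the null distribution


def ssr(X, y):
    """Sum of squared residuals of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


rejections = 0
for _ in range(n_sim):
    x = rng.normal(size=n)
    noise = rng.normal(size=(n, K))          # features unrelated to y
    y = 1.0 + 0.5 * x + rng.normal(size=n)   # model A is the true model
    X_a = np.column_stack([np.ones(n), x])
    X_b = np.column_stack([X_a, noise])
    lr = n * np.log(ssr(X_a, y) / ssr(X_b, y))
    rejections += int(lr > crit)

rate = rejections / n_sim
print(f"Empirical type-I error rate: {rate:.3f}")
```

The empirical rate should land near 5%. It will not match exactly, because the $\chi^2$ distribution is only an asymptotic approximation and the simulation itself has sampling error.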

## Best Answer

No, there is no evidence that this is a good approach. Building models based on whether some significance test has $p \leq 0.05$ or $\leq 0.1$ associated with it is problematic. For a start, it invalidates naive post-selection inference (i.e., just using the "selected" model and treating its coefficients and hypothesis test results as if the model had been prespecified). It is also not a particularly good approach for building a good predictive model. The particular p-value threshold is largely irrelevant to this question, but even the common choice of 0.05 is fairly arbitrary in the first place (and some have argued that lower thresholds would often be desirable).