Solved – Comparing models with different numbers of predictors

model comparison, multiple regression, r-squared, statistical significance

Given that the overall F-test of a multiple regression model follows an F distribution whose shape depends on the number of predictors in the model, I understand why you cannot compare the F-statistics from models with different numbers of predictors.

However, the p-value of the F-statistic is always uniformly distributed between 0 and 1 under the null hypothesis, and represents the probability of observing an F-statistic at least as extreme as the one computed, assuming the null hypothesis that all $\beta_j = 0$ is true.

Can I compare the F-statistic p-values from models with different numbers of predictors? If not, what are good alternatives? ($R^2$? Adjusted $R^2$?)

Best Answer

I would NOT recommend the $R^2$, as this measure never decreases when variables are added to the model. In other words, the $R^2$ does not account for overfitting.

Among the options you mentioned, the adjusted $R^2$ would be the best. Take a look at the formula:

$R^2_{adj} = 1 - \frac{(1-R^2)\cdot(n-1)}{n-p-1}$

Since the number of predictors $p$ appears in the denominator, adding variables that do not significantly increase the $R^2$ will penalize the $R^2_{adj}$.
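The penalty is easy to see numerically. Below is a minimal sketch (the numbers are made up for illustration): an extra predictor that barely moves $R^2$ makes the adjusted version go down.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical example: adding a 4th predictor raises R^2 only from
# 0.750 to 0.751, so the adjusted R^2 actually decreases.
print(adjusted_r2(0.750, n=100, p=3))
print(adjusted_r2(0.751, n=100, p=4))
```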

A better approach to compare your models would be to use the Akaike Information Criterion:

$AIC_i = -2\cdot log(\mathcal{L}_i) + 2\cdot p_i$

where $\mathcal{L}_i$ is the likelihood of model $i$ and $p_i$ is its number of estimated parameters. The model with the lower AIC is preferred.

You can obtain this very easily in R with the AIC function: `AIC(model1, model2)`.
