Large sample size and partial F-test for multiple regression make adding a variable always significant

f-test, multiple-regression

I am developing a two variable multiple regression model.
$$ Y = b_0 - b_1 X_1 + b_2 X_2 $$

I am using the following formula for the partial F-test, taken from the section "Testing Incremental R2" of the linked source. The resulting F-statistic is supposed to tell me whether adding the second variable is significant (more details in that link).

$$ F = \frac{(R_L^2 - R_S^2)/(k_L - k_S)}{(1 - R_L^2)/(N - k_L - 1)} $$

My first variable has a strong correlation:
regression_coeff_string: b1 = 0.664, b0 = 0.035
R2_val: 0.564

My second variable has a weak correlation:
regression_coeff_string: b1 = -25.026, b0 = 0.469,
R2_val: 0.027

Adding my second variable only marginally improves the R2 value:
regression_coeff_string: b0 = 0.0559, b1 = 0.6633, b2 = -5.2222,
R2_val: 0.565

However, I have a sample size of $N = 2949$. With $R_L^2 = 0.565$ and $R_S^2 = 0.564$,
$k_L = 2$ (the number of predictors in the full set), and
$k_S = 1$ (the number of predictors in the subset), this gives
$$ F = \frac{(0.565 - 0.564)/(2 - 1)}{(1 - 0.565)/(2949 - 2 - 1)} = 6.77 $$
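The arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library. The function name `partial_f` is made up for illustration, and the p-value relies on the large-sample chi-square approximation for a test with one numerator degree of freedom, an assumption that is reasonable here only because the denominator degrees of freedom are large.

```python
import math

def partial_f(r2_full, r2_sub, k_full, k_sub, n):
    """Partial F-statistic for the increase in R^2 from the subset to the full model."""
    numerator = (r2_full - r2_sub) / (k_full - k_sub)
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# Values taken from the question.
f_stat = partial_f(r2_full=0.565, r2_sub=0.564, k_full=2, k_sub=1, n=2949)

# Approximation (assumption): with 1 numerator df and a very large denominator df,
# F(1, d) is close to chi-square(1), whose upper tail is erfc(sqrt(x / 2)).
p_approx = math.erfc(math.sqrt(f_stat / 2.0))

print(f"F = {f_stat:.2f}")
print(f"approximate p = {p_approx:.4f}")
```

For an exact p-value or critical value, a statistics library's F distribution would be the better tool.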

With the critical value of F(1, 2946) at the 0.05 significance level being about 3.84, the result is significant. But it seems that this is only because the sample size is large. If I sort the second variable X2 in ascending order in Excel and leave the order of the Y and X1 variables unchanged, I still get a significant F score.
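The dependence on sample size can be made explicit: for fixed R2 values, the F-statistic in the formula grows linearly with N - k_L - 1. The sketch below holds the R2 values from the question fixed and varies N. The sample sizes other than 2949 are hypothetical, chosen only for illustration, and in real data the R2 values would not stay exactly constant as N changes.

```python
def partial_f(r2_full, r2_sub, k_full, k_sub, n):
    """Partial F-statistic for the R^2 increase from the subset to the full model."""
    return ((r2_full - r2_sub) / (k_full - k_sub)) / ((1.0 - r2_full) / (n - k_full - 1))

# Hypothetical sample sizes, plus the N = 2949 from the question.
sample_sizes = [100, 1000, 2949, 10000]
f_by_n = {n: partial_f(0.565, 0.564, 2, 1, n) for n in sample_sizes}

for n, f in f_by_n.items():
    print(f"N = {n:>6}: F = {f:.2f}")
```

The same 0.001 gain in R2 gives an F well below 1 at N = 100 and an F above 20 at N = 10000, so the test outcome here tracks the sample size rather than the size of the improvement.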

Question: How can I do a fair incremental R2 test for the addition of a new variable in multiple regression when the sample size becomes large?

Simply looking at the R2 of each X variable individually does not take into account that they may be cross-correlated. That is why I turned to the incremental R2 test, to see how much the overall R2 improves when a new variable is added.


The context of my example is predicting solar radiation. The first variable is a solar radiation variable from NWP (numerical weather prediction) software (i.e., high correlation). The other variables are other NWP output variables, which we are adding to try to improve our prediction.

Best Answer

The test you are doing is "fair", it's just that p-values don't answer the question you want to ask (they often don't). The way to proceed is to figure out what change in effect size is substantively meaningful and base decisions on that.

This is entirely dependent on your field and, indeed, on your question. To illustrate: if 1 in 1000 children misunderstands a question on a test, that is a very small proportion and won't affect the validity of the test much. But if 1 in 1000 airplane trips ended in a crash, that would be a very large proportion and would end aviation.

Is there any context in which a change of $R^2$ from 0.564 to 0.565 is important? I can't think of one, offhand, but I haven't had all my coffee :-). Perhaps some variation on the plane crash scenario.
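One way to put that R2 change on a more tangible scale, offered as an illustration rather than as the method this answer prescribes, is to convert it into the relative reduction in residual root-mean-square error, using the fact that the residual sum of squares is proportional to $1 - R^2$. The sketch below ignores the one-degree-of-freedom difference between the two models, and the function name is hypothetical.

```python
import math

def relative_rmse_reduction(r2_small, r2_large):
    """Fractional drop in residual RMSE when R^2 rises from r2_small to r2_large.

    Assumes both models are fitted to the same data, so the total sum of
    squares cancels and only the ratio of (1 - R^2) terms matters.
    """
    return 1.0 - math.sqrt((1.0 - r2_large) / (1.0 - r2_small))

# R^2 values from the question: one predictor versus two predictors.
reduction = relative_rmse_reduction(0.564, 0.565)
print(f"Residual RMSE shrinks by about {100 * reduction:.2f}%")
```

For the numbers in the question this is a reduction of roughly a tenth of a percent in prediction error, which is the quantity to weigh against what counts as a useful improvement in the application.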