Solved – Why is the R-squared lower in a regression that contains more variables than the sum of R-squared from two different regression functions

r-squared

Comparing three regression functions: one has three coefficients.

The other two functions have two coefficients and one coefficient, respectively. Their coefficients are the same as those in the function with three coefficients, though.

The sum of the R-squared values of the two equations with one and two coefficients is greater than the R-squared value of the single equation with three coefficients.

Is this because the equation with three coefficients has one statistically insignificant coefficient, while the other two functions contain only statistically significant coefficients?

Best Answer

Adding a regressor weakly increases unadjusted $R^2$.

Let's say you have two models where the 2nd model has an additional regressor:

  • Model 1: $y_i = a + \epsilon_i$
  • Model 2: $y_i = a + b x_i + \epsilon_i$

Observe that model 1 is the same as model 2 with the restriction $b=0$. Estimating by least squares, we have:

Sum of squared residuals (SSR) for model 1

$$ \begin{aligned} \mathit{SSR}_1 = \min_{a,\,b}\; & \sum_i \epsilon_i^2 \\ \text{subject to}\; & \; y_i = a + b x_i + \epsilon_i \\ & \; b = 0 \end{aligned} $$

Sum of squared residuals (SSR) for model 2

$$ \begin{aligned} \mathit{SSR}_2 = \min_{a,\,b}\; & \sum_i \epsilon_i^2 \\ \text{subject to}\; & \; y_i = a + b x_i + \epsilon_i \end{aligned} $$

The additional restriction $b=0$ can't make the minimum lower! Hence $\mathit{SSR}_1 \geq \mathit{SSR}_2$. Since unadjusted $R^2 = 1 - \frac{\mathit{SSR}}{\mathit{SST}}$ and the total sum of squares $\mathit{SST} = \sum_i (y_i - \bar{y})^2$ is the same for both models, we have $R^2_1 \leq R^2_2$.
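Here is a minimal numerical sketch of that nesting argument, assuming NumPy and arbitrary synthetic data; the helper `ssr_and_r2` is just an illustrative name, not part of any library:

```python
# Minimal check: the intercept-only model (b = 0) can never have a smaller SSR,
# hence never a larger unadjusted R^2, than the model that also includes x.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # arbitrary data-generating choice

X1 = np.ones((n, 1))                     # Model 1: intercept only (restriction b = 0)
X2 = np.column_stack([np.ones(n), x])    # Model 2: intercept plus regressor x

def ssr_and_r2(X, y):
    """Least-squares fit; return the sum of squared residuals and unadjusted R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return ssr, 1 - ssr / sst

ssr1, r2_1 = ssr_and_r2(X1, y)
ssr2, r2_2 = ssr_and_r2(X2, y)

print(ssr1 >= ssr2)   # True: the restricted minimum can't be lower
print(r2_1 <= r2_2)   # True: unadjusted R^2 weakly increases
```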

You can massively generalize this argument. Fitting a more flexible functional form (one that nests the simpler one as a special case) cannot increase the sum of squared residuals and hence cannot decrease unadjusted $R^2$.
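As a sketch of that generalization, one could compare a linear and a cubic polynomial fit on synthetic data (again assuming NumPy; the cubic nests the linear fit by setting its quadratic and cubic coefficients to zero):

```python
# The cubic fit nests the linear fit, so its in-sample SSR can't be larger.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=80)
y = np.sin(x) + rng.normal(scale=0.3, size=80)   # arbitrary nonlinear example

def poly_ssr(degree):
    coefs = np.polyfit(x, y, degree)
    fitted = np.polyval(coefs, x)
    return np.sum((y - fitted) ** 2)

print(poly_ssr(1) >= poly_ssr(3))   # True: the richer model fits at least as well in-sample
```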

In the context of linear regression, if you add a regressor then unadjusted $R^2$ goes up (except for edge cases, such as a collinear regressor, where it stays the same). This is part of the reason why it's common to use adjusted $R^2$, which applies a penalty for adding regressors.
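For illustration, a small sketch of how that penalty can bite when a pure-noise regressor is added (synthetic data, NumPy; `r2_and_adj` is a made-up helper using the usual formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p}$, with $p$ the number of estimated coefficients including the intercept):

```python
# Adding a noise regressor never lowers unadjusted R^2,
# but adjusted R^2 can fall because of the degrees-of-freedom penalty.
import numpy as np

rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=n)
junk = rng.normal(size=n)               # regressor unrelated to y
y = 1.0 + 0.5 * x + rng.normal(size=n)

def r2_and_adj(X, y):
    """Unadjusted and adjusted R^2 from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ssr / sst
    p = X.shape[1]                      # coefficients, intercept included
    adj = 1 - (1 - r2) * (n - 1) / (n - p)
    return r2, adj

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([np.ones(n), x, junk])

r2_s, adj_s = r2_and_adj(X_small, y)
r2_b, adj_b = r2_and_adj(X_big, y)

print(r2_b >= r2_s)     # unadjusted R^2 never falls when a regressor is added
print(adj_s, adj_b)     # adjusted R^2 can fall if the new regressor adds little
```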