How is the adjusted coefficient of determination ($R^2$) linked to the F-value of a test against zero when adding a new variable?

regression

Someone claims that the adjusted $R^2$ can increase when an extra variable is added.
I wonder why, since it is called *adjusted* (in contrast to the ordinary $R^2$).

The only condition it has to satisfy (for the adjusted $R^2$ to increase) is that the F-value of the test of the null hypothesis that the new variable's coefficient is zero is greater than 1. (By the way, is there a simple way to calculate this F-value?)

Can someone give me a hint about how the adjusted $R^2$ and the F-statistic of that test are linked?

And anyway, who would include a new variable in a multiple OLS regression model if its beta was tested to be 0? In any case, the adjusted $R^2$ always changes.

Best Answer

The assertion in the question is true. We usually show the inverse situation, i.e. the case of dropping one variable. In a linear multiple regression model $y = X\beta + u$ with $n$ observations and $k$ regressors (including the constant term), if the t-ratio $t$ of a variable is less than 1 in absolute value, then dropping this one variable will increase the adjusted $R^2$, denoted $\bar R^2$. When dropping one variable, the corresponding F-statistic (reflecting just one linear restriction) is equal to $t^2$ (see this post). So both $|t|$ and $F$ must be smaller than unity for $\bar R^2$ to increase. This result can be proven as follows. $\bar R^2$ is defined as

$$ 1- \bar R^2 = \frac {n-1}{n-k} (1-R^2) \qquad [1]$$
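As a quick worked instance of $[1]$, here is a minimal Python sketch; the function name `adjusted_r2` is our own illustrative choice, and $k$ counts all regressors including the constant, as in the text. It also shows the "adjustment" at work: a small rise in $R^2$ from adding a regressor can still lower $\bar R^2$.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 from equation [1]; k counts all regressors incl. the constant."""
    if n <= k:
        raise ValueError("need more observations than regressors (n > k)")
    return 1.0 - (n - 1) / (n - k) * (1.0 - r2)

# Hand-checkable case: R^2 = 0.5, n = 21, k = 5
# 1 - (20 / 16) * 0.5 = 1 - 0.625 = 0.375
base = adjusted_r2(0.5, 21, 5)
print(base)  # 0.375

# Adding a sixth regressor that nudges R^2 up to 0.51 still lowers adjusted R^2,
# because the factor (n-1)/(n-k) grows from 20/16 to 20/15.
print(adjusted_r2(0.51, 21, 6) < base)  # True
```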

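Denote $S_{yy} = \sum_{i=1}^{n}(y_i-\bar y)^2$, let $\hat u_i$ be the OLS residuals and $\hat \sigma^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat u_i^2$ the OLS estimate of the error variance. Since $R^2 = 1 - \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}$, we can write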

$$ (1- \bar R^2) = \frac {n-1}{n-k} \left(1-1 + \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}\right) = \frac {n-1}{S_{yy}} \frac{\sum_{i=1}^{n}\hat u_i^2}{n-k}$$

$$\Rightarrow (1- \bar R^2)\frac {S_{yy}}{n-1} = \hat \sigma^2 \qquad [2]$$

Dropping a regressor leaves $S_{yy}$ and $n$ unaffected. So, as a matter of mathematical necessity, the term $(1- \bar R^2)$ on the LHS of $[2]$ moves in the same direction as the RHS: as the OLS estimate of the error variance of the regression decreases, so does $(1-\bar R^2)$. Hence $\bar R^2$ increases as $\hat \sigma^2$ decreases.
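Identity $[2]$ can be checked numerically. The sketch below is only an illustration under our own assumptions: simulated data (sample size, coefficients and seed are arbitrary) and plain NumPy least squares rather than any particular regression package.

```python
import numpy as np

def ols_residuals(X, y):
    """OLS residuals for design matrix X (X should include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # k = 4 incl. constant
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)
k = X.shape[1]

resid = ols_residuals(X, y)
s_yy = np.sum((y - y.mean()) ** 2)
r2 = 1 - resid @ resid / s_yy
adj_r2 = 1 - (n - 1) / (n - k) * (1 - r2)   # equation [1]
sigma2_hat = resid @ resid / (n - k)        # OLS estimate of the error variance

lhs = (1 - adj_r2) * s_yy / (n - 1)         # left-hand side of [2]
print(lhs, sigma2_hat)                      # equal up to floating-point rounding
```

Because $S_{yy}$ and $n$ do not depend on which regressors are included, ranking models by $\bar R^2$ is the same as ranking them by $\hat\sigma^2$ in reverse.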

Now consider dropping one regressor, and index the quantities related to this restricted regression with $r$. Denote by $RSS$ the residual sum of squares, so that $RSS = (n-k)\hat \sigma^2$ and $RSS_r = (n-k+1)\hat \sigma_r^2$.

The F-statistic for testing whether the restricted regression with $k-1$ regressors is "better" than the regression with $k$ regressors (i.e. the null hypothesis that the dropped coefficient is zero) is

$$F(1,n-k)= \frac{RSS_r -RSS}{RSS/(n-k)} = \frac {(n-k+1)\hat \sigma_r^2 - (n-k)\hat \sigma^2}{\hat \sigma^2}$$

$$=(n-k+1)\frac {\hat \sigma_r^2}{\hat \sigma^2} - (n-k) \;\Rightarrow\; \frac {\hat \sigma_r^2}{\hat \sigma^2} = \frac {F+(n-k)}{1+(n-k)} \qquad [3]$$

From $[3]$, the ratio $\hat \sigma_r^2/\hat \sigma^2$ is below 1 exactly when $F<1$; combining this with $[2]$ gives

$$F<1 \Rightarrow \hat \sigma_r^2 <\hat \sigma^2 \Rightarrow \bar R_r^2 > \bar R^2$$

And since $F(1,n-k)=t^2$, the condition $F<1$ is equivalent to $|t|<1$.
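The whole chain ($F = t^2$, equation $[3]$, and "$\bar R^2$ rises on dropping iff $|t|<1$") can be verified on any data set. This is a sketch under stated assumptions: simulated data with arbitrary coefficients and seed, plain NumPy least squares, and hypothetical helper names `fit` and `adj_r2`. For each non-constant regressor it computes the t-ratio, the single-restriction F-statistic, the variance ratio, and the change in $\bar R^2$ from dropping that regressor.

```python
import numpy as np

def fit(X, y):
    """OLS fit: returns (coefficients, RSS, sigma^2-hat = RSS / (n - k))."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    return beta, rss, rss / (n - k)

def adj_r2(rss, y, k):
    """Adjusted R^2 via equation [1], with R^2 = 1 - RSS / S_yy."""
    n = len(y)
    s_yy = np.sum((y - y.mean()) ** 2)
    return 1 - (n - 1) / (n - k) * (rss / s_yy)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 3))])
y = X @ np.array([2.0, 1.0, 0.2, 0.0]) + rng.normal(size=40)
n, k = X.shape

beta, rss, s2 = fit(X, y)
cov = s2 * np.linalg.inv(X.T @ X)               # estimated covariance of beta-hat

results = []
for j in range(1, k):                           # try dropping each non-constant regressor
    t = beta[j] / np.sqrt(cov[j, j])
    _, rss_r, s2_r = fit(np.delete(X, j, axis=1), y)
    F = (rss_r - rss) / (rss / (n - k))         # F(1, n-k): one linear restriction
    ratio_formula = (F + (n - k)) / (1 + (n - k))   # right-hand side of [3]
    gain = adj_r2(rss_r, y, k - 1) - adj_r2(rss, y, k)
    results.append((j, t, F, s2_r / s2, ratio_formula, gain))
    print(f"x{j}: t={t:+.3f}  F={F:.3f}  t^2={t * t:.3f}  "
          f"change in adj-R2 if dropped={gain:+.4f}")
```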

Beware that the above results hold only when dropping just one regressor. Suppose we run the initial regression with $k$ regressors and observe that two of them have t-ratios smaller than unity in absolute value. This does not necessarily imply that dropping both simultaneously will yield a higher $\bar R^2$.
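Repeating the algebra behind $[3]$ with $q$ restrictions gives $\hat\sigma_r^2/\hat\sigma^2 = \frac{qF + (n-k)}{q + (n-k)}$, so for a simultaneous drop the relevant criterion is the *joint* F-statistic being below 1, not each individual t-ratio. The sketch below is illustrative only: the data-generating process (two nearly collinear regressors `x2`, `x3`), the seeds, and the helper `drop_check` are our own assumptions. It prints the individual t-ratios next to the joint F so the two criteria can be compared; with strong collinearity the individual t-ratios can be small even when the joint F is large.

```python
import numpy as np

def rss_and_beta(X, y):
    """OLS fit returning (RSS, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r, beta

def adj_r2(rss, y, k):
    """Adjusted R^2 via equation [1]."""
    n = len(y)
    s_yy = np.sum((y - y.mean()) ** 2)
    return 1 - (n - 1) / (n - k) * rss / s_yy

def drop_check(seed, n=60, drop=(2, 3)):
    """Hypothetical helper: t-ratios of the dropped columns, joint F, change in adj-R2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = x2 + 0.05 * rng.normal(size=n)           # x3 is nearly collinear with x2
    X = np.column_stack([np.ones(n), x1, x2, x3])
    y = 1.0 + 0.5 * x1 + 0.3 * (x2 + x3) + rng.normal(size=n)
    n, k = X.shape
    rss, beta = rss_and_beta(X, y)
    se = np.sqrt(rss / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
    cols = list(drop)
    rss_r, _ = rss_and_beta(np.delete(X, cols, axis=1), y)
    q = len(cols)
    F_joint = ((rss_r - rss) / q) / (rss / (n - k))   # F(q, n-k)
    gain = adj_r2(rss_r, y, k - q) - adj_r2(rss, y, k)
    return (beta / se)[cols], F_joint, gain

for seed in range(5):
    t_pair, F_joint, gain = drop_check(seed)
    print(f"seed {seed}: t(x2)={t_pair[0]:+.2f}  t(x3)={t_pair[1]:+.2f}  "
          f"joint F={F_joint:.2f}  change in adj-R2 if both dropped={gain:+.4f}")
```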

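Now think in reverse: start from the "restricted" model and add one variable. By the same argument, $\bar R^2$ increases exactly when the F-statistic for the added variable exceeds 1 (equivalently, when $|t|>1$), which is the condition stated in the question.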
