Solved – How is adjusted coefficient of determination ($R^2$) linked to the F-values of a test against zero when adding a new variable

regression

Someone claims that the adjusted $R^2$ will increase with the addition of an extra variable.
I wonder why, as it is called adjusted (in contrast to the normal $R^2$).

The only condition it has to satisfy (to increase the adjusted $R^2$) is that the F-value (by the way, is there a simple way to calculate it?) for the null hypothesis that the new variable's coefficient is zero is greater than 1.

Can someone give me a hint where the links between the adjusted $R^2$ and the F-Stat of that test are?

And anyway, who would want to include a new variable in a multiple OLS regression model if its beta was tested to be 0? Therefore adjusted $R^2$ always changes.

Best Answer

The assertion in the question is true. We usually show the inverse situation, i.e. the case of dropping one variable. In a linear multiple regression model $y_i = x_i'\beta +u_i,\; i=1,...,n$, with $k$ regressors (including the constant term), if the t-ratio $t$ of a variable is less than 1 in absolute value, then dropping this one variable will increase the adjusted R-squared, $\bar R^2$. When dropping one variable, the corresponding F-statistic (reflecting just one linear restriction) is equal to $t^2$ (see this post). So $F$ and $|t|$ should both be smaller than unity for $\bar R^2$ to increase. This result can be proven as follows: $\bar R^2$ is defined as

$$ 1- \bar R^2 = \frac {n-1}{n-k} (1-R^2) \qquad [1]$$

Denoting $S_{yy} = \sum_{i=1}^{n}(y_i-\bar y)^2$, and since $R^2 = 1 - \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}$, we can write

$$ (1- \bar R^2) = \frac {n-1}{n-k} \left(1-1 + \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}\right) = \frac {n-1}{S_{yy}} \frac{\sum_{i=1}^{n}\hat u_i^2}{n-k}$$

$$\Rightarrow (1- \bar R^2)\frac {S_{yy}}{n-1} = \hat \sigma^2 \qquad [2]$$

By dropping a regressor, $S_{yy}$ and $n$ remain unaffected. So as a matter of mathematical necessity, the term $(1- \bar R^2)$ on the LHS of $[2]$ moves in the same direction as its RHS: as the OLS estimated variance of the regression decreases, so does $(1-\bar R^2)$, and hence $\bar R^2$ increases as $\hat \sigma^2$ decreases.
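Identity $[2]$ can be checked numerically. The following is a minimal sketch on synthetic data, assuming NumPy is available; the helper name `ols_summary` and the simulated coefficients are illustrative choices, not part of the original answer.

```python
import numpy as np

def ols_summary(X, y):
    """Fit OLS by least squares; X must already contain the constant column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)                      # residual sum of squares
    syy = float(np.sum((y - y.mean()) ** 2))        # S_yy
    r2 = 1.0 - rss / syy
    adj_r2 = 1.0 - (n - 1) / (n - k) * (1.0 - r2)   # definition [1]
    sigma2 = rss / (n - k)                          # estimated error variance
    return {"n": n, "k": k, "syy": syy, "r2": r2, "adj_r2": adj_r2, "sigma2": sigma2}

# Synthetic data (assumption: any full-rank design would do)
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

fit = ols_summary(X, y)
lhs = (1.0 - fit["adj_r2"]) * fit["syy"] / (fit["n"] - 1)
print(lhs, fit["sigma2"])  # the two numbers should coincide up to rounding
```

Since $S_{yy}/(n-1)$ does not depend on the set of regressors, comparing $\bar R^2$ across models fitted to the same $y$ is the same as comparing $\hat \sigma^2$.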

Consider now dropping one regressor, and index the various quantities related to this restricted regression with $r$. Denote by $RSS$ the residual sum of squares, so that $RSS = (n-k)\hat \sigma^2$ and $RSS_r = (n-k+1)\hat \sigma_r^2$.

The F-statistic to test whether the restricted regression with $k-1$ regressors is "better" than the regression with $k$ regressors is

$$F(1,n-k)= \frac{RSS_r -RSS}{RSS/(n-k)} = \frac {(n-k+1)\hat \sigma_r^2 - (n-k)\hat \sigma^2}{\hat \sigma^2}$$

$$=(n-k+1)\frac {\hat \sigma_r^2}{\hat \sigma^2} - (n-k) \Rightarrow \frac {\hat \sigma_r^2}{\hat \sigma^2} = \frac {F+(n-k)}{1+(n-k)} \qquad [3]$$

From $[3]$ it follows that

$$F<1 \Rightarrow \hat \sigma_r^2 <\hat \sigma^2 \Rightarrow \bar R_r^2 > \bar R^2$$

And $F(1,n-k)=t^2 <1 \Rightarrow |t|<1$
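The equivalence between $F<1$ and a higher $\bar R^2$ for the restricted model can be illustrated by simulation. This is a sketch under stated assumptions: the data are synthetic, the helper names `rss_of` and `compare_drop_last` are hypothetical, and the last regressor is given a small true coefficient only so that $F$ lands on both sides of 1 across trials.

```python
import numpy as np

def rss_of(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def compare_drop_last(X, y):
    """Return (F, adj_r2_full, adj_r2_restricted) for dropping the last column of X."""
    n, k = X.shape
    rss = rss_of(X, y)               # full model, k regressors
    rss_r = rss_of(X[:, :-1], y)     # restricted model, k-1 regressors
    syy = float(np.sum((y - y.mean()) ** 2))
    f_stat = (rss_r - rss) / (rss / (n - k))
    adj_full = 1.0 - (n - 1) / (n - k) * rss / syy
    adj_restr = 1.0 - (n - 1) / (n - k + 1) * rss_r / syy
    return f_stat, adj_full, adj_restr

rng = np.random.default_rng(42)
n, trials, agree = 40, 200, 0
for _ in range(trials):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    y = X @ np.array([1.0, 0.5, -0.5, 0.1]) + rng.normal(size=n)
    f_stat, adj_full, adj_restr = compare_drop_last(X, y)
    # The claim: F < 1 exactly when the restricted model has the higher adjusted R^2
    agree += int((f_stat < 1.0) == (adj_restr > adj_full))

print(agree, "of", trials, "trials agree with the F < 1 rule")
```

Barring floating-point ties at $F=1$, the two conditions should agree in every trial, because the relation is an algebraic identity and not a statistical tendency.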

Beware that the above results hold only when considering dropping just one regressor. Assume that we run the initial regression with $k$ regressors and we observe that two of them have t-ratios smaller than unity in absolute value. This does not necessarily imply that if we drop both simultaneously, we will end up with a higher $\bar R^2$.
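The reason can be seen by repeating the algebra for dropping $q$ regressors at once (here $q$ is notation introduced for this sketch, with $\hat \sigma_r^2 = RSS_r/(n-k+q)$):

$$F(q,n-k)= \frac{(RSS_r -RSS)/q}{RSS/(n-k)} = \frac {(n-k+q)\hat \sigma_r^2 - (n-k)\hat \sigma^2}{q\,\hat \sigma^2} \Rightarrow \frac {\hat \sigma_r^2}{\hat \sigma^2} = \frac {qF+(n-k)}{q+(n-k)}$$

So $\bar R^2$ rises after dropping the $q$ regressors exactly when the joint F-statistic is below 1. Two individual t-ratios below unity in absolute value do not guarantee a joint $F$ below 1, for instance when the two regressors are strongly correlated with each other.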

Now think in reverse - start from the "restricted" model and add one variable.

Related Solutions

Solved – How to perform RMSE analysis in SPSS

Compute your random sample definition, e.g.,

compute part = rv.uniform(0,1) <= .5.

Run the regression. Include this subcommand

/SELECT part EQ 1

and this

/SAVE PRED RESID

You can do this by specifying a selection variable in the Regression dialog box and by using the Save subdialog.

Now select the other part of the data, e.g.,

compute holdout = 1 - part.

Run Descriptives on RES_1.
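For readers without SPSS, the same split-fit-evaluate idea can be sketched in Python with NumPy. This is an illustrative analogue on synthetic data, not a translation of the SPSS output; the variable names and the simulated model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)      # synthetic data (assumption)

# Random half split, mirroring: compute part = rv.uniform(0,1) <= .5.
part = rng.uniform(0, 1, size=n) <= 0.5

# Fit OLS on the selected cases only (the /SELECT part EQ 1 step)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[part], y[part], rcond=None)

# Residuals for all cases, then RMSE separately on each part
resid = y - X @ beta
rmse_train = float(np.sqrt(np.mean(resid[part] ** 2)))
rmse_holdout = float(np.sqrt(np.mean(resid[~part] ** 2)))
print(rmse_train, rmse_holdout)
```

The holdout RMSE is the root of the mean squared residual over the cases that were not used to estimate the coefficients.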

Solved – How to compare coefficients of a negative binomial regression for determining relative importance

First you'd have to figure out what change in one variable is "equal" to what change in another. The usual standardization uses the standard deviation, but that may or may not be ideal. It may not be possible to figure this out, particularly if the IVs are related to each other, in which case a change in one would go with a change in another.

Once you've figured that out, you can get the predicted values from various combinations of the IVs, varying each by the amount you thought was "equal" in the first step.

Another thing to do is to graph the predicted results as the independent variables change in value.
