Expected value of $R^2$, the coefficient of determination, under the null hypothesis

expected value, goodness of fit, r-squared, regression

I am curious about the statement made at the bottom of the first page in this text regarding the $R^2_\mathrm{adjusted}$ adjustment:

$$R^2_\mathrm{adjusted} =1-(1-R^2)\left({\frac{n-1}{n-m-1}}\right).$$

The text states:

"The logic of the adjustment is the following: in ordinary multiple regression, a random predictor explains on average a proportion $1/(n-1)$ of the response’s variation, so that $m$ random predictors explain together, on average, $m/(n-1)$ of the response’s variation; in other words, the expected value of $R^2$ is $\mathbb{E}(R^2) = m/(n-1)$. Applying the [$R^2_\mathrm{adjusted}$] formula to that value, where all predictors are random, gives $R^2_\mathrm{adjusted} = 0$."

This seems to be a very simple and interpretable motivation for $R^2_\mathrm{adjusted}$. However, I have not been able to show that $\mathbb{E}(R^2)=1/(n-1)$ for a single random (i.e. uncorrelated) predictor.

Could someone point me in the right direction here?

Best Answer

The statement is accurate. See this post for the derivation of the distribution of $R^2$ under the hypothesis that all regressors (bar the constant term) are uncorrelated with the dependent variable ("random predictors").

Under that hypothesis, $R^2$ follows a Beta distribution, where $m$ is the number of predictors, not counting the constant term, and $n$ is the sample size:

$$R^2 \sim \mathrm{Beta}\left(\frac{m}{2}, \frac{n-m-1}{2}\right)$$

and so

$$\mathbb{E}(R^2) = \frac{m/2}{(m/2)+[(n-m-1)/2]} = \frac{m}{n-1}.$$
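The result can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the linked derivation: it assumes a Gaussian response and Gaussian predictors drawn independently of each other, fits ordinary least squares with an intercept, and compares the average $R^2$ with $m/(n-1)$. The function name `simulate_r2` and the values $n = 20$, $m = 3$ are illustrative choices.

```python
import numpy as np


def simulate_r2(n, m, reps=10_000, seed=0):
    """Return `reps` values of R^2 from OLS fits of pure-noise data.

    Assumption (illustrative): y and the m predictors are independent
    standard normal draws, so the null hypothesis holds by construction.
    """
    rng = np.random.default_rng(seed)
    r2 = np.empty(reps)
    for i in range(reps):
        # Design matrix: a constant column plus m random predictors.
        X = np.column_stack([np.ones(n), rng.standard_normal((n, m))])
        y = rng.standard_normal(n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        # R^2 = 1 - SS_residual / SS_total
        r2[i] = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2


n, m = 20, 3
r2 = simulate_r2(n, m)
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

print("mean R^2         :", r2.mean())
print("m / (n - 1)      :", m / (n - 1))
print("mean adjusted R^2:", r2_adj.mean())
```

With these settings the simulated mean of $R^2$ should land close to $3/19$, and the mean of the adjusted $R^2$ close to zero, up to Monte Carlo error.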

This appears to be a clever way to "justify" the logic behind $R^2_\mathrm{adjusted}$: if all regressors are indeed uncorrelated with the dependent variable, then $R^2_\mathrm{adjusted}$ is zero "on average".
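To spell out the "on average zero" step: $R^2_\mathrm{adjusted}$ is a linear function of $R^2$, so its expectation is obtained by substituting $\mathbb{E}(R^2) = m/(n-1)$ into the adjustment formula (assuming $n > m + 1$):

$$\mathbb{E}\left(R^2_\mathrm{adjusted}\right) = 1-\left(1-\frac{m}{n-1}\right)\frac{n-1}{n-m-1} = 1-\frac{n-m-1}{n-1}\cdot\frac{n-1}{n-m-1} = 0.$$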
