Solved – When is $R^2$ the same as Pearson’s $r$ squared

Tags: correlation, pearson-r, r-squared, regression

I am a bit confused about the relationship between Pearson's $r$ and the Coefficient of Determination $R^2$.

$$
r = \frac{\sum\limits_{i = 1}^n (x_i - \overline x)(y_i - \overline y)}{\sqrt{\sum\limits_{i = 1}^n (x_i - \overline x)^2}\sqrt{\sum\limits_{i = 1}^n (y_i - \overline y)^2}}
$$

$$
R^2 = 1 - \frac{\sum\limits_{i = 1}^n (x_i - y_i)^2}{\sum\limits_{i = 1}^n (x_i - \overline x)^2}
$$

Let $y$ be the prediction and $x$ be the actual values, and $\overline y$ and $\overline x$ their means.

Most explanations I have read say that $R^2$ can be derived by squaring Pearson's $r$, hence the name. However, using the formulas given above, squaring $r$ does not give $R^2$; at least not for the data I have tried.

Are the formulas wrong, or what is happening?

Wikipedia has a vague statement on this: "When an intercept is included, then $r^2$ is simply the square of the sample correlation coefficient (i.e., $r$)."
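For concreteness, here is a small numeric sketch of the mismatch (made-up data, using NumPy), following the question's convention of $x$ as the actual values and $y$ as predictions that did *not* come from an OLS fit:

```python
import numpy as np

# Made-up example: x = actual values, y = predictions that are
# NOT the fitted values of an OLS regression on x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.1, 2.6, 4.4, 4.8])

# Pearson's r between x and y
r = np.corrcoef(x, y)[0, 1]

# R^2 as defined in the question: 1 - SS_res / SS_tot
R2 = 1 - np.sum((x - y) ** 2) / np.sum((x - np.mean(x)) ** 2)

print(r ** 2, R2)  # close, but not equal
```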

Best Answer

Your confusion lies in the fact that this only works if $y_i = \hat{x}_i$, where $\hat{x}_i$ are the OLS fitted values from a linear regression with response $x$ (and an intercept) — for example, regressing $x$ on $y$. Try running that regression and using the fitted values $\hat{x}$ in place of your raw predictions $y$, and the two quantities will match.

The short version is that for this to work, the predictions $y$ need to satisfy several criteria, e.g. $\sum_i (x_i - y_i) = 0$ and $\sum_i y_i(x_i - y_i) = 0$ (the residuals sum to zero and are orthogonal to the fitted values), which hold exactly when $y_i = \hat{x}_i$ from an OLS fit with an intercept.
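A sketch of the fix, again with $x$ as the actuals and $y$ as the predictions, assuming a hypothetical predictor `z` and using NumPy's `polyfit` for the OLS line: once $y$ really is the OLS fitted value for $x$, the two quantities coincide.

```python
import numpy as np

# Hypothetical setup: x = actual values, z = some predictor.
rng = np.random.default_rng(0)
z = rng.normal(size=50)
x = 2.0 + 1.5 * z + rng.normal(size=50)  # actual values

# OLS fit (with intercept) of x on z; the fitted values play the role of y
slope, intercept = np.polyfit(z, x, 1)
y = intercept + slope * z

# R^2 from the question's formula
R2 = 1 - np.sum((x - y) ** 2) / np.sum((x - np.mean(x)) ** 2)

# Pearson's r between actuals and OLS fitted values, squared
r2 = np.corrcoef(x, y)[0, 1] ** 2

print(np.isclose(R2, r2))  # True
```

The agreement holds for any OLS fit that includes an intercept, because only then do the residuals sum to zero and stay orthogonal to the fitted values.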
