Pearson’s correlation and common variance

statistics

Does a coefficient of determination $r^2$ exist for the Pearson correlation coefficient $r$ if we are only interested in the relationship between two variables and not in regression?

If $r^2$ is only defined in the context of regression, do we first have to define a regressor and then check that the residuals have equal variance and are normally distributed?

My question was motivated by material from a statistics course that discusses $r$ and $r^2$ without referencing regression. Translation of the source:

The coefficient of determination can be calculated from the correlation by squaring:
Coefficient of determination = $r^2$

For the example this results in:
$r^2 = .628^2 = .394$
If this value is multiplied by 100, the result is a percentage. This indicates what proportion of the variance in both variables is determined by common variance sources. For the present example, the share of the common variance is 39.4%.

It is unclear to me how the common variance can be derived without a regression line.

Best Answer

Consider two random variables $Y$ and $X$. Assume their correlation is $\rho$. Of course $\rho^2$ is always defined. However, consider a regression

$$Y=a+bX+\epsilon,$$

where $\epsilon$ is white noise. You can equally define the regression the other way round, with $X$ as the dependent variable.

Assume you estimate the parameters $(\hat{a},\hat{b})$ using OLS. Then you obtain the projection

$$\hat{Y}=\hat{a}+\hat{b}X.$$

This gives the linear relationship that best describes the dependency of $Y$ on $X$, in the sense of the lowest mean squared distance. You can essentially always run this regression, and it is well defined as long as the relevant moments exist. Note that this does not require normality of residuals or the Gauss-Markov assumptions, which only imply that the regression has some additional "nice" properties. Of course, this regression might not be the best way to model the dependency between $X$ and $Y$, and OLS might not be the best way to estimate it, but in that case the correlation coefficient is not an optimal measure of the dependency either.

The coefficient of determination is

$$R^2\equiv\frac{Var(\hat{Y})}{Var(Y)}.$$

Now it turns out that $R^2=\rho^2$.
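
To see why, here is a short sketch of the argument. It assumes that $Var$, $Cov$ and $\rho$ are all computed from the same moments (sample moments for the OLS estimates, or population moments for the projection) and that $Var(X)>0$ and $Var(Y)>0$. The OLS slope is the covariance divided by the variance of the regressor, so

$$\hat{b}=\frac{Cov(X,Y)}{Var(X)}\quad\Rightarrow\quad Var(\hat{Y})=\hat{b}^2\,Var(X)=\frac{Cov(X,Y)^2}{Var(X)},$$

and therefore

$$R^2=\frac{Var(\hat{Y})}{Var(Y)}=\frac{Cov(X,Y)^2}{Var(X)\,Var(Y)}=\rho^2.$$

The final expression is symmetric in $X$ and $Y$, so regressing $X$ on $Y$ instead gives the same value.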

$R^2$ is defined in the context of a regression/projection. This tells you how much of the variance of $Y$ the projection explains. Applying the concept in some other context would be either misusing or extending the original meaning.
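
The identity $R^2=\rho^2$ can also be checked numerically. The following is a minimal sketch, not part of the original answer: the simulated data, the seed, the skewed noise distribution and the helper name `r_squared` are all illustrative assumptions. The noise is deliberately non-normal to illustrate that the identity does not depend on normally distributed residuals.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 1_000
x = rng.normal(size=n)
# Deliberately skewed (non-normal) noise, centred at zero
noise = rng.exponential(scale=1.0, size=n) - 1.0
y = 1.0 + 2.0 * x + noise  # hypothetical data-generating process


def r_squared(dep, reg):
    """Var(fitted) / Var(dep) for an OLS fit of dep on reg with intercept."""
    # np.polyfit returns coefficients from highest degree to lowest
    slope, intercept = np.polyfit(reg, dep, deg=1)
    fitted = intercept + slope * reg
    return np.var(fitted) / np.var(dep)


r = np.corrcoef(x, y)[0, 1]   # Pearson correlation
r2_y_on_x = r_squared(y, x)   # regress Y on X
r2_x_on_y = r_squared(x, y)   # regress X on Y

print(r**2, r2_y_on_x, r2_x_on_y)
```

Up to floating-point error, the three printed values should coincide, whichever variable is treated as the dependent one.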
