Relationship between different types of correlation coefficients


Let,

$r_{1(2.34\dots p)}$ = Correlation between $x_1$ and $x_{2.34\dots p}$, where $x_{2.34\dots p}$ denotes the residuals of $x_2$ after regressing it on $x_3, x_4, \dots, x_p$.

$r_{1.23\dots p}$ = Multiple correlation coefficient from regressing $x_1$ on $x_2, x_3, x_4, \dots, x_p$.

Prove that –

$r_{1.23\dots p}^2 = r_{1p}^2 + r_{1(p-1.p)}^2 + \dots + r_{1(2.34\dots p)}^2$

I first tried writing the squared correlation coefficients as squared covariance divided by the product of the variances. The variance of $x_1$ cancels from both sides. I then tried substituting linear combinations of the $x_i$'s for the residuals/fitted values inside the covariances, but to no avail. How can this equality be proved?

Best Answer

$r_{1.23\dots p}^2 = r_{1p}^2 + r_{1(p-1.p)}^2 + \dots + r_{1(2.34\dots p)}^2$

$r_{12.34\dots p}$ = Partial correlation between $x_1$ and $x_2$, removing the effects of $x_3, x_4, \dots, x_p$.

$x_{1.34\dots p}$ = Residuals of $x_1$ after regressing it on $x_3, x_4, \dots, x_p$.

$s_{11.34\dots p}$ = Variance of the residuals of $x_1$ after regressing on $x_3, x_4, \dots, x_p$ (and similarly $s_{22.34\dots p}$ for $x_2$).

$r_{12.34\dots p}^2 = \left(\dfrac{\operatorname{Cov}(x_{1.34\dots p},\, x_{2.34\dots p})}{\sqrt{s_{11.34\dots p}}\,\sqrt{s_{22.34\dots p}}}\right)^2$

$r_{12.34\dots p}^2 = \left(\dfrac{\operatorname{Cov}(x_{1},\, x_{2.34\dots p})}{\sqrt{s_{11.34\dots p}}\,\sqrt{s_{22.34\dots p}}}\right)^2$

This holds because, by the normal equations, residuals are orthogonal to their regressors: $x_{2.34\dots p}$ is orthogonal to $x_3, x_4, \dots, x_p$, so replacing $x_{1.34\dots p}$ by $x_1$ (which differs from it only by a linear combination of $x_3, \dots, x_p$) leaves the covariance unchanged. Multiplying and dividing by $s_{11}$,

$r_{12.34\dots p}^2 = r_{1(2.34\dots p)}^2 \times \dfrac{s_{11}}{s_{11.34\dots p}}$
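This partial-vs-part correlation relation can be checked numerically. A minimal sketch on synthetic data; the seed, sample size, mixing matrix, and helper names (`resid`, `corr`) are my choices, not part of the original argument:

```python
import numpy as np

# Check: r_{12.34...p}^2 = r_{1(2.34...p)}^2 * s11 / s_{11.34...p}
rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated columns
X -= X.mean(axis=0)  # center, so covariances reduce to dot products

def resid(y, Z):
    """Residuals of y after least-squares regression on the columns of Z."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    return y - Z @ b

def corr(u, v):
    """Sample correlation of two centered vectors."""
    return (u @ v) / np.sqrt((u @ u) * (v @ v))

x1, x2, rest = X[:, 0], X[:, 1], X[:, 2:]
e1 = resid(x1, rest)  # x_{1.34...p}
e2 = resid(x2, rest)  # x_{2.34...p}

lhs = corr(e1, e2) ** 2                          # partial correlation squared
rhs = corr(x1, e2) ** 2 * (x1 @ x1) / (e1 @ e1)  # part corr^2 * s11 / s_{11.34...p}
print(abs(lhs - rhs))  # ~0 up to floating-point error
```

Note that because every vector is centered, the common factor $n$ in the variances cancels, so sums of squares can stand in for variances throughout.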


Now,

$x_{1.23\dots p}$ = Residuals of $x_1$ after regressing it on $x_2, x_3, \dots, x_p$.

We look at $\sum_{i} (x_{1.23\dots p})_i^2 = \sum_{i} (x_1)_i \,(x_{1.23\dots p})_i$, which holds because the residuals are orthogonal to the fitted values.

$= \sum_{i} (x_{1.34\dots p})_i \,(x_{1.23\dots p})_i$, since $x_1$ and $x_{1.34\dots p}$ differ only by a linear combination of $x_3, \dots, x_p$, which is orthogonal to the residuals $x_{1.23\dots p}$.

Writing $(x_{1.23\dots p})_i = (x_1)_i - \sum_{j=2}^{p} b_j \,(x_j)_i$, where the $b_j$ are the coefficients from regressing $x_1$ on $x_2, \dots, x_p$,

$= \sum_{i} (x_{1.34\dots p})_i \,(x_1)_i - \sum_{j=2}^{p} b_j \sum_{i} (x_{1.34\dots p})_i \,(x_j)_i$

$= \sum_{i} (x_{1.34\dots p})_i \,(x_1)_i - b_2 \sum_{i} (x_{1.34\dots p})_i \,(x_2)_i$, because $x_{1.34\dots p}$ is orthogonal to $x_3, \dots, x_p$, so only the $j = 2$ term survives.

We know $b_2 = b_{12.34\dots p}$: the coefficient of $x_2$ when $x_1$ is regressed on $x_2, x_3, \dots, x_p$ is the same as the partial regression coefficient of the residuals of $x_1$ on the residuals of $x_2$, both taken after removing the effects of $x_3, x_4, \dots, x_p$.

$= \sum_{i} (x_{1.34\dots p})_i \,(x_1)_i - b_{12.34\dots p} \sum_{i} (x_{1.34\dots p})_i \,(x_2)_i$

$= \sum_{i} (x_{1.34\dots p})_i \,(x_1)_i - b_{12.34\dots p} \sum_{i} (x_{1.34\dots p})_i \,(x_{2.34\dots p})_i$, where replacing $x_2$ by its residuals changes nothing, again because $x_{1.34\dots p}$ is orthogonal to $x_3, \dots, x_p$.

$= \sum_{i} (x_{1.34\dots p})_i \left( (x_1)_i - b_{12.34\dots p} \,(x_{2.34\dots p})_i \right)$

$= \sum_{i} (x_{1.34\dots p})_i^2 - b_{12.34\dots p} \sum_{i} (x_1)_i \,(x_{2.34\dots p})_i$

So,

$s_{11.23\dots p} = s_{11.34\dots p} - b_{12.34\dots p} \operatorname{Cov}(x_1, x_{2.34\dots p})$, dividing through by $n$ so the sums of squares become (co)variances.
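The variance decomposition just derived can also be verified numerically. A minimal sketch, in sum-of-squares form (the common factor $n$ is immaterial); the seed, dimensions, and `resid` helper are my choices:

```python
import numpy as np

# Check: SS(x_{1.23...p}) = SS(x_{1.34...p}) - b_{12.34...p} * sum_i x1_i (x_{2.34...p})_i
rng = np.random.default_rng(1)
n, p = 400, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated columns
X -= X.mean(axis=0)  # center

def resid(y, Z):
    """Residuals of y after least-squares regression on the columns of Z."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    return y - Z @ b

x1, x2, rest = X[:, 0], X[:, 1], X[:, 2:]
e1 = resid(x1, rest)          # x_{1.34...p}
e2 = resid(x2, rest)          # x_{2.34...p}
e_full = resid(x1, X[:, 1:])  # x_{1.23...p}

b12 = (e1 @ e2) / (e2 @ e2)   # b_{12.34...p}: slope of e1 on e2
lhs = e_full @ e_full
rhs = e1 @ e1 - b12 * (x1 @ e2)
print(abs(lhs - rhs))  # ~0 up to floating-point error
```

Computing $b_{12.34\dots p}$ as the slope of the residuals $e_1$ on $e_2$ uses the same equivalence of full-regression and partial coefficients that the derivation invokes for $b_2$.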


Using

$b_{12.34\dots p} = r_{12.34\dots p} \,\dfrac{\sqrt{s_{11.34\dots p}}}{\sqrt{s_{22.34\dots p}}}$

$1 - r_{12.34\dots p}^2 = \dfrac{s_{11.23\dots p}}{s_{11.34\dots p}}$

and

$1 - r_{1.23\dots p}^2 = \dfrac{s_{11.23\dots p}}{s_{11}}$

in the two equations derived above: substituting gives $s_{11.23\dots p} = s_{11.34\dots p} - r_{1(2.34\dots p)}^2 \, s_{11}$, and dividing through by $s_{11}$ yields $1 - r_{1.23\dots p}^2 = \left(1 - r_{1.34\dots p}^2\right) - r_{1(2.34\dots p)}^2$, i.e. $r_{1.23\dots p}^2 = r_{1.34\dots p}^2 + r_{1(2.34\dots p)}^2$. Applying the same identity recursively to $r_{1.34\dots p}^2$, then to $r_{1.45\dots p}^2$, and so on down to $r_{1p}^2$, gives the desired result.
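As an end-to-end sanity check, the full decomposition can be verified on synthetic data: the multiple $R^2$ of $x_1$ on $x_2, \dots, x_p$ should equal the sum of squared part correlations obtained by peeling off one regressor at a time. The seed, dimensions, and helper names below are my choices:

```python
import numpy as np

# Check: r_{1.23...p}^2 = r_{1p}^2 + r_{1(p-1.p)}^2 + ... + r_{1(2.34...p)}^2
rng = np.random.default_rng(2)
n, p = 600, 5
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated columns
X -= X.mean(axis=0)  # center

def resid(y, Z):
    """Residuals of y after least-squares regression on the columns of Z."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    return y - Z @ b

def corr(u, v):
    """Sample correlation of two centered vectors."""
    return (u @ v) / np.sqrt((u @ u) * (v @ v))

x1 = X[:, 0]
e_all = resid(x1, X[:, 1:])
R2 = 1 - (e_all @ e_all) / (x1 @ x1)  # multiple correlation r_{1.23...p}^2

total = 0.0
for k in range(p - 1, 0, -1):      # columns for x_p, x_{p-1}, ..., x_2
    trailing = X[:, k + 1:]        # the variables after x_k
    ek = resid(X[:, k], trailing) if trailing.shape[1] else X[:, k]
    total += corr(x1, ek) ** 2     # part correlation r_{1(k.(k+1)...p)}^2

print(abs(R2 - total))  # ~0 up to floating-point error
```

The loop mirrors the recursion in the proof: each pass removes the effect of the later variables from the current regressor before correlating it with $x_1$.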