Multiple Correlation Coefficient with three or more independent variables

correlation, multiple regression, r-squared, regression

The formula for the multiple correlation coefficient of a dependent variable ($y$) on two independent variables ($x_1$ and $x_2$) is:

$$R=\sqrt{\frac{r^2_{yx_1}+r^2_{yx_2}-2r_{yx_1}r_{yx_2}r_{x_1x_2}}{1-r^2_{x_1x_2}}}$$

What is the formula for three ($x_1$, $x_2$, $x_3$) or four ($x_1$, $x_2$, $x_3$, $x_4$) independent variables? I would like to know for my regression analysis.

Best Answer

Suppose we have a linear regression model $$y=y_{12\ldots p-1}+\varepsilon_{12\ldots p-1}\,,$$

where $y_{12\ldots p-1}=\beta_0+\beta_1 x_1+\beta_2x_2+\cdots+\beta_{p-1}x_{p-1}$ is the part of $y$ explained by $(x_1,x_2,\ldots,x_{p-1})$ and $\varepsilon_{12\ldots p-1}$ is the unexplained part. Parameters $(\beta_0,\beta_1,\ldots,\beta_{p-1})$ are estimated by the method of least squares to obtain the fitted model $\hat y=\hat y_{12\ldots p-1}$.

By definition, the (sample) multiple correlation coefficient of $y$ on $x_1,x_2,\ldots,x_{p-1}$ is $$r=r_{0\cdot 12\ldots p-1}=\operatorname{corr}(y,\hat y)$$

A related quantity is the coefficient of determination, given by

$$r^2=\frac{\operatorname{var}(\hat y)}{\operatorname{var}(y)}=1-\frac{\operatorname{var}\left(\varepsilon_{12\ldots p-1}\right)}{\operatorname{var}(y)}$$
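The definition and the two expressions for $r^2$ can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy and a model fitted with an intercept; the coefficients, sample size, and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))  # three predictors x1, x2, x3 (simulated)
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Least-squares fit with an explicit intercept column.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta_hat
resid = y - y_hat

# Multiple correlation coefficient: corr(y, y_hat).
r = np.corrcoef(y, y_hat)[0, 1]

# The two variance-ratio expressions for the coefficient of determination.
r2_explained = np.var(y_hat) / np.var(y)
r2_residual = 1.0 - np.var(resid) / np.var(y)

print(r**2, r2_explained, r2_residual)
```

With an intercept in the model, the three printed values should agree up to floating-point error.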

To obtain a computational formula for $r$, consider the correlation matrix $R=(r_{ij})_{0\le i,j\le p-1}$ of $(y,x_1,\ldots,x_{p-1})$, where index $0$ refers to $y$: that is, $r_{0j}=\operatorname{corr}(y,x_j)$ and $r_{ij}=\operatorname{corr}(x_i,x_j)$ for $i,j\ge 1$, with $r_{ii}=1$ for every $i$.

So the matrix looks like

$$R=\begin{bmatrix}1& r_{01}& r_{02}& \cdots& r_{0\overline{p-1}} \\ r_{01}& 1& r_{12}& \cdots & r_{1\overline{p-1}} \\ r_{02}& r_{12}& 1& \cdots& r_{2\overline{p-1}} \\ \vdots& \vdots & \vdots& \ddots& \vdots \\ r_{0\overline{p-1}}& r_{1\overline{p-1}}& r_{2\overline{p-1}}& \cdots& 1 \end{bmatrix}$$

Let $R_{ij}$ be the cofactor of the $(i,j)$th element of $R$.

Then it can be shown that

$$\color{green}{\boxed{r=\sqrt{1-\frac{\det R}{R_{00}}}}}$$

(Nothing changes if there is no intercept in the model.)
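For three or four predictors, the determinant formula is easiest to evaluate numerically. Below is a sketch, assuming NumPy; the helper name `multiple_r_from_corr` and the simulated data are hypothetical, introduced only for illustration. The cofactor used is that of the leading diagonal entry (the $y$ row and $y$ column), which is simply the determinant of the predictors' own correlation matrix.

```python
import numpy as np

def multiple_r_from_corr(R):
    """Multiple correlation of variable 0 on the remaining variables,
    computed from the full correlation matrix R (variable 0 is y)."""
    R = np.asarray(R, dtype=float)
    det_R = np.linalg.det(R)
    # Cofactor of the leading entry: delete row 0 and column 0 (sign is +1).
    cofactor = np.linalg.det(R[1:, 1:])
    return np.sqrt(1.0 - det_R / cofactor)

# Simulated example with four correlated predictors (made-up coefficients).
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 4))
X[:, 1] += 0.6 * X[:, 0]  # induce correlation between x1 and x2
y = 2.0 + X @ np.array([1.0, -0.5, 0.3, 0.8]) + rng.normal(size=n)

# Correlation matrix of (y, x1, x2, x3, x4); column 0 is y.
R = np.corrcoef(np.column_stack([y, X]), rowvar=False)
r_det = multiple_r_from_corr(R)

# Cross-check against the definition corr(y, y_hat).
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
r_fit = np.corrcoef(y, A @ beta_hat)[0, 1]

print(r_det, r_fit)
```

The same function works for any number of predictors, as long as the predictors' correlation matrix is nonsingular.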

The above gives an expression in terms of the simple correlation coefficients $r_{ij}$. The formula in the original post can be derived as a particular case when $p=3$:

$$r=\sqrt{\frac{r^2_{01}+r^2_{02}-2r_{01}r_{02}r_{12}}{1-r^2_{12}}}$$
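The $p=3$ case can be verified by expanding the determinants directly. In this short check, $C$ is a symbol introduced here only for convenience; it denotes the cofactor of the top-left entry of $R$, i.e. the determinant of the $2\times 2$ correlation matrix of $(x_1,x_2)$:

$$\det R=1+2r_{01}r_{02}r_{12}-r_{01}^2-r_{02}^2-r_{12}^2,\qquad C=1-r_{12}^2$$

$$1-\frac{\det R}{C}=\frac{(1-r_{12}^2)-(1+2r_{01}r_{02}r_{12}-r_{01}^2-r_{02}^2-r_{12}^2)}{1-r_{12}^2}=\frac{r_{01}^2+r_{02}^2-2r_{01}r_{02}r_{12}}{1-r_{12}^2}$$

For three or more predictors the same expansion applies, but the closed form grows quickly, so evaluating the determinants numerically is usually more practical.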

If $(r_{ij})^{-1}=(r^{ij})$, then yet another formula is $$\boxed{r=\sqrt{1-\frac1{r^{00}}}}$$

In terms of the dispersion matrix $(s_{ij})_{0\le i,j\le p-1}$ of $(y,x_1,\ldots,x_{p-1})$ and $(s_{ij})^{-1}=(s^{ij})$, we have

$$\boxed{r=\sqrt{1-\frac1{s_{00}s^{00}}}}$$
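The two inverse-based formulas can be compared on the same data. This is a minimal sketch, assuming NumPy and simulated data with $y$ placed in column 0; the variable names and coefficients are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 3))  # three predictors (simulated)
y = X @ np.array([0.7, 0.2, -0.4]) + rng.normal(size=n)
data = np.column_stack([y, X])  # column 0 is y

R = np.corrcoef(data, rowvar=False)  # correlation matrix (r_ij)
S = np.cov(data, rowvar=False)       # dispersion (covariance) matrix (s_ij)

# r = sqrt(1 - 1 / r^{00}), using the leading entry of the inverse of R.
r_from_corr = np.sqrt(1.0 - 1.0 / np.linalg.inv(R)[0, 0])

# r = sqrt(1 - 1 / (s_00 * s^{00})), using S and the leading entry of its inverse.
r_from_cov = np.sqrt(1.0 - 1.0 / (S[0, 0] * np.linalg.inv(S)[0, 0]))

print(r_from_corr, r_from_cov)
```

Both values should coincide up to rounding, since $r^{00}=s_{00}s^{00}$ when $R$ is obtained from $S$ by rescaling each variable by its standard deviation.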

For details and other formulae, the following references are helpful:
