Multiple Regression vs Partial Correlation Coefficient – Understanding Their Relationships

Tags: multiple-regression, partial-correlation, regression-coefficients

I don't even know if this question makes sense, but what is the difference between multiple regression and partial correlation (apart from the obvious differences between correlation and regression in general, which are not what I am aiming at)?

I want to figure out the following:
I have two independent variables ($x_1$, $x_2$) and one dependent variable ($y$). Individually, neither independent variable is correlated with the dependent variable. But for a given $x_1$, $y$ decreases when $x_2$ decreases. So do I analyze that by means of multiple regression or partial correlation?

Edit, to hopefully improve my question:
I am trying to understand the difference between multiple regression and partial correlation. When $y$ decreases for a given $x_1$ as $x_2$ decreases, is that due to the combined effect of $x_1$ and $x_2$ on $y$ (multiple regression), or is it due to removing the effect of $x_1$ (partial correlation)?

Best Answer

The multiple linear regression coefficient and the partial correlation are directly linked and have the same significance (p-value). Partial $r$ is just another way of standardizing the coefficient, alongside the beta coefficient (standardized regression coefficient)$^1$. So, if the dependent variable is $y$ and the independent variables are $x_1$ and $x_2$, then

$$\text{Beta:} \quad \beta_{x_1} = \frac{r_{yx_1} - r_{yx_2}r_{x_1x_2} }{1-r_{x_1x_2}^2}$$

$$\text{Partial r:} \quad r_{yx_1.x_2} = \frac{r_{yx_1} - r_{yx_2}r_{x_1x_2} }{\sqrt{ (1-r_{yx_2}^2)(1-r_{x_1x_2}^2) }}$$
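As a purely numerical illustration of these two formulas, here is a minimal sketch in Python/NumPy; the three pairwise correlation values are made up for the example.

```python
# Minimal sketch of the beta and partial-r formulas above.
import numpy as np

r_yx1, r_yx2, r_x1x2 = 0.50, 0.30, 0.40   # hypothetical pairwise correlations

num = r_yx1 - r_yx2 * r_x1x2              # the shared numerator

beta_x1      = num / (1 - r_x1x2**2)                              # standardized coefficient
partial_r_x1 = num / np.sqrt((1 - r_yx2**2) * (1 - r_x1x2**2))    # partial correlation

print(beta_x1, partial_r_x1)
```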

You see that the numerators are the same, which tells us that both formulas measure the same unique effect of $x_1$. I will try to explain in what ways the two formulas are structurally identical and in what ways they are not.

Suppose that you have z-standardized (mean 0, variance 1) all three variables. The numerator is then equal to the covariance between two kinds of residuals: (a) the residuals left after predicting $y$ by $x_2$ [both variables standardized] and (b) the residuals left after predicting $x_1$ by $x_2$ [both variables standardized]. Moreover, the variance of residuals (a) is $1-r_{yx_2}^2$, and the variance of residuals (b) is $1-r_{x_1x_2}^2$.

The formula for the partial correlation then clearly appears as the formula for a plain Pearson $r$, computed in this instance between residuals (a) and residuals (b): Pearson $r$, we know, is a covariance divided by a denominator that is the geometric mean of two different variances.

The standardized coefficient beta is structurally like Pearson $r$, except that the denominator is the geometric mean of a variance with itself. The variance of residuals (a) is not counted; it is replaced by a second counting of the variance of residuals (b). Beta is thus the covariance of the two residuals relative to the variance of one of them (specifically, the one pertaining to the predictor of interest, $x_1$), while the partial correlation, as already noted, is that same covariance relative to their hybrid variance. Both types of coefficient are ways to standardize the effect of $x_1$ in the milieu of the other predictors.
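To make the residual construction concrete, here is a minimal sketch in Python/NumPy; the data-generating step and the coefficients in it are arbitrary, chosen only to check the identities numerically.

```python
# Sketch of the residual interpretation of beta and partial r (simulated data).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y  = 0.4 * x1 - 0.3 * x2 + rng.normal(size=n)

z = lambda v: (v - v.mean()) / v.std()          # z-standardize (mean 0, variance 1)
zy, zx1, zx2 = z(y), z(x1), z(x2)

r_yx2, r_x1x2 = np.mean(zy * zx2), np.mean(zx1 * zx2)   # correlations = slopes here

e_a = zy  - r_yx2  * zx2          # residuals (a): y predicted by x2
e_b = zx1 - r_x1x2 * zx2          # residuals (b): x1 predicted by x2

cov_ab = np.mean(e_a * e_b)       # equals the common numerator r_yx1 - r_yx2*r_x1x2
var_a, var_b = np.mean(e_a**2), np.mean(e_b**2)   # 1 - r_yx2^2 and 1 - r_x1x2^2

beta_x1      = cov_ab / var_b                     # standardized regression coefficient
partial_r_x1 = cov_ab / np.sqrt(var_a * var_b)    # partial correlation r_{yx1.x2}
print(beta_x1, partial_r_x1)
```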

Some numerical consequences of the difference. If the R-square of the multiple regression of $y$ on $x_1$ and $x_2$ happens to be 1, then both partial correlations of the predictors with the dependent variable will also be 1 in absolute value (but the betas will generally not be 1). Indeed, as said before, $r_{yx_1.x_2}$ is the correlation between the residuals of $y \leftarrow x_2$ and the residuals of $x_1 \leftarrow x_2$. If what is not $x_2$ within $y$ is exactly what is not $x_2$ within $x_1$, then there is nothing within $y$ that is neither $x_1$ nor $x_2$: complete fit. Whatever the amount of the portion of $y$ left unexplained by $x_2$ (the $1-r_{yx_2}^2$), if it is captured relatively highly by the independent portion of $x_1$ (the $1-r_{x_1x_2}^2$), $r_{yx_1.x_2}$ will be high. $\beta_{x_1}$, on the other hand, will be high only if the captured unexplained portion of $y$ is itself a substantial portion of $y$.
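A quick sketch of the $R^2=1$ case, with $y$ built as an exact linear combination of $x_1$ and $x_2$ (the weights are arbitrary): the partial correlation comes out at 1 in absolute value while the beta does not.

```python
# When y is an exact function of x1 and x2, partial r is +-1 but beta is not.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y  = 0.2 * x1 + 0.9 * x2                      # exact fit: no error term

def residuals(v, on):
    """Residuals of v after least-squares regression on `on` (with intercept)."""
    X = np.column_stack([np.ones_like(on), on])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef

partial_r_x1 = np.corrcoef(residuals(y, x2), residuals(x1, x2))[0, 1]   # -> 1.0

# standardized beta for x1, via the two-predictor formula above
r = np.corrcoef(np.column_stack([y, x1, x2]), rowvar=False)
beta_x1 = (r[0, 1] - r[0, 2] * r[1, 2]) / (1 - r[1, 2] ** 2)

print(partial_r_x1, beta_x1)   # partial r ~ 1, beta generally != 1
```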


From the above formulas one obtains (extending from the 2-predictor regression to a regression with an arbitrary number of predictors $x_1, x_2, x_3, \ldots$) the conversion formula between beta and the corresponding partial $r$:

$$r_{yx_1.X} = \beta_{x_1} \sqrt{ \frac {\text{var} (e_{x_1 \leftarrow X})} {\text{var} (e_{y \leftarrow X})}},$$

where $X$ stands for the collection of all predictors except the current one ($x_1$); $e_{y \leftarrow X}$ are the residuals from regressing $y$ on $X$, and $e_{x_1 \leftarrow X}$ are the residuals from regressing $x_1$ on $X$; the variables in both of these regressions enter standardized.
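A sketch of this conversion formula with three predictors, on simulated data; the helper `resid` and the data-generating coefficients are my own, for illustration only.

```python
# Check: r_{yx1.X} = beta_x1 * sqrt(var(e_{x1<-X}) / var(e_{y<-X})), X = {x2, x3}.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y  = 0.4 * x1 + 0.2 * x2 + 0.3 * x3 + rng.normal(size=n)

z = lambda v: (v - v.mean()) / v.std()          # standardize, as in the formula
zy, zx1, zx2, zx3 = map(z, (y, x1, x2, x3))

def resid(v, *preds):
    """Residuals of v regressed on the given (already centered) predictors."""
    X = np.column_stack(preds)
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef

# beta_x1: coefficient of zx1 in the regression of zy on all standardized predictors
beta_x1 = np.linalg.lstsq(np.column_stack([zx1, zx2, zx3]), zy, rcond=None)[0][0]

e_x1_X = resid(zx1, zx2, zx3)                   # e_{x1 <- X}: x1 residualized on x2, x3
e_y_X  = resid(zy,  zx2, zx3)                   # e_{y  <- X}: y  residualized on x2, x3

partial_direct    = np.corrcoef(e_y_X, e_x1_X)[0, 1]                   # r_{yx1.X} by definition
partial_converted = beta_x1 * np.sqrt(np.var(e_x1_X) / np.var(e_y_X))  # conversion formula

print(partial_direct, partial_converted)        # the two should agree closely
```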

Note: if we need to compute the partial correlations of $y$ with every predictor $x$, we usually won't use this formula, which requires two additional regressions per predictor. Rather, sweep operations (often used in stepwise and all-subsets regression algorithms) will be performed, or the anti-image correlation matrix will be computed.
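One standard shortcut along those lines (closely related to the anti-image route) reads all partial correlations of $y$ with the predictors off the inverse of the full correlation matrix; a sketch, again on simulated data:

```python
# All partial correlations of y with each predictor, from the inverse correlation matrix.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y  = 0.4 * x1 + 0.2 * x2 + 0.3 * x3 + rng.normal(size=n)

R = np.corrcoef(np.column_stack([y, x1, x2, x3]), rowvar=False)
P = np.linalg.inv(R)                      # inverse (precision) of the correlation matrix

# partial correlation of y (index 0) with predictor j, controlling for the rest
partials_with_y = [-P[0, j] / np.sqrt(P[0, 0] * P[j, j]) for j in range(1, 4)]
print(partials_with_y)                    # r_{yx1.x2x3}, r_{yx2.x1x3}, r_{yx3.x1x2}
```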


$^1$ $\beta_{x_1} = b_{x_1} \frac {\sigma_{x_1}}{\sigma_y}$ is the relation between the raw $b$ and the standardized $\beta$ coefficients in regression with intercept.
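A quick numerical check of this relation, assuming an intercept model on simulated data:

```python
# beta_x1 from standardized regression equals b_x1 * sd(x1) / sd(y).
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x1 = rng.normal(loc=10, scale=3, size=n)
x2 = 0.5 * x1 + rng.normal(scale=2, size=n)
y  = 1.5 * x1 - 0.8 * x2 + rng.normal(scale=4, size=n)

# raw coefficients from the intercept model
b = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]

# standardized coefficients from the same model on z-scored variables
z = lambda v: (v - v.mean()) / v.std()
beta = np.linalg.lstsq(np.column_stack([z(x1), z(x2)]), z(y), rcond=None)[0]

print(beta[0], b[1] * x1.std() / y.std())       # the two values should match
```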


Addendum. Geometry of regression $\beta$ and partial $r$.

On the picture below, a linear regression with two correlated predictors, $X_1$ and $X_2$, is shown. The three variables, including the dependent $Y$, are drawn as vectors (arrows). This way of display differs from the usual scatterplot (aka variable-space display) and is called a subject-space display. (You may encounter similar drawings in a number of other threads on this site.)

[Figure: $Y$, $X_1$ and $X_2$ drawn as centered vectors in subject space; $Y'$ is the orthogonal projection of $Y$ onto the plane spanned by the predictors, and $e$ is the error vector.]

The pictures are drawn after all three variables were centered, so that (1) every vector's length = the standard deviation of the respective variable, and (2) the angle (its cosine) between every two vectors = the correlation between the respective variables.

$Y'$ is the regression prediction (the orthogonal projection of $Y$ onto "plane X", the plane spanned by the regressors); $e$ is the error term; $\cos \angle{Y Y'}=|Y'|/|Y|$ is the multiple correlation coefficient.
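These subject-space facts are easy to verify numerically: with centered data vectors, the cosine of the angle between two vectors equals the correlation (vector length is proportional to the standard deviation, up to a common $\sqrt{n}$ factor depending on convention), and the cosine of the angle between $Y$ and its projection $Y'$ equals the multiple correlation. A sketch on simulated data:

```python
# Cosines between centered data vectors reproduce correlations and multiple R.
import numpy as np

rng = np.random.default_rng(5)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y  = 0.4 * x1 - 0.3 * x2 + rng.normal(size=n)

c = lambda v: v - v.mean()                      # center each variable
Y, X1, X2 = c(y), c(x1), c(x2)

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos(X1, X2), np.corrcoef(x1, x2)[0, 1])   # cosine of the angle = correlation

X = np.column_stack([X1, X2])                             # "plane X"
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]          # Y' = projection of Y onto plane X
print(cos(Y, Y_hat), np.corrcoef(y, Y_hat)[0, 1])         # both equal the multiple R
```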

The skew coordinates of $Y'$ on the predictors $X_1$ and $X_2$ relate to their multiple regression coefficients. These lengths from the origin are the scaled $b$'s or $\beta$'s. For example, the magnitude of the skew coordinate onto $X_1$ equals $\beta_1\sigma_Y= b_1\sigma_{X_1}$; so, if $Y$ is standardized ($|Y|=1$), that coordinate equals $\beta_1$.

But how can one get an impression of the corresponding partial correlation $r_{yx_1.x_2}$? To partial $X_2$ out of the other two variables, one has to project them on the plane orthogonal to $X_2$. Below, on the left, this plane perpendicular to $X_2$ has been drawn. It is shown at the bottom, and not at the level of the origin, simply in order not to clutter the picture. Let's inspect what goes on in that space. Put your eye at the bottom (of the left picture) and glance up, with the $X_2$ vector starting right at your eye.

[Figure: left, the plane orthogonal to $X_2$ added to the previous picture; right, the view within that plane, where $\alpha$ is the angle between the projections of $Y$ and $X_1$.]

All the vectors are now projections. $X_2$ is a point, since the plane was constructed perpendicular to it. We look from a direction in which "plane X" is a horizontal line to us. Therefore, of the four vectors, only (the projection of) $Y$ departs from that line.

From this perspective, $r_{yx_1.x_2}$ is $\cos \alpha$, where $\alpha$ is the angle between the projections of $Y$ and $X_1$ onto the plane orthogonal to $X_2$. So it is very simple to grasp visually.

Note that $r_{yx_1.x_2}=r_{yy'.x_2}$, as both $Y'$ and $X_1$ belong to "plane X".

We can trace the projections on the right picture back to the left one. The $Y$ of the right picture is the $Y^\perp$ of the left, i.e. the residuals of regressing $Y$ on $X_2$. Likewise, the $X_1$ of the right picture is the $X_1^\perp$ of the left, i.e. the residuals of regressing $X_1$ on $X_2$. The correlation between these two residual vectors is $r_{yx_1.x_2}$, as we know.
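Here is a small sketch of that projection picture, on simulated data: projecting the centered vectors onto the (hyper)plane orthogonal to $X_2$ and taking cosines reproduces $r_{yx_1.x_2}$, and also shows $r_{yy'.x_2}$ equal to it (with the same sign here, since $\beta_1>0$ in these data).

```python
# cos(alpha) between the projections of Y and X1 onto the plane perpendicular
# to X2 equals the partial correlation r_{yx1.x2}, and also r_{yy'.x2}.
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y  = 0.4 * x1 - 0.3 * x2 + rng.normal(size=n)

c = lambda v: v - v.mean()
Y, X1, X2 = c(y), c(x1), c(x2)

proj_out_X2 = lambda v: v - (v @ X2) / (X2 @ X2) * X2       # projection onto the plane perpendicular to X2
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

X = np.column_stack([X1, X2])
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]            # Y' = projection onto "plane X"

cos_alpha = cos(proj_out_X2(Y), proj_out_X2(X1))            # cos(alpha) = r_{yx1.x2}
cos_yyp   = cos(proj_out_X2(Y), proj_out_X2(Y_hat))         # r_{yy'.x2}
print(cos_alpha, cos_yyp)                                   # equal (same sign here, beta_1 > 0)
```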
