For understanding this, I always prefer the Cholesky decomposition of the correlation matrix.
Assume the correlation matrix $R$ of the three variables $X, Y, Z$ is
$$ R = \left[ \begin{array}{rrr}
1.00 & -0.29 & -0.45 \\
-0.29 & 1.00 & 0.93 \\
-0.45 & 0.93 & 1.00
\end{array} \right]
$$
Then the Cholesky factor $L$ is
$$ L = \left[ \begin{array}{rrr}
1.00 & 0.00 & 0.00 \\
-0.29 & 0.96 & 0.00 \\
-0.45 & 0.83 & 0.32
\end{array} \right],
$$
where the rows correspond to $X$, $Y$, and $Z$, in that order.
The matrix $L$ gives, in a sense, the coordinates of the three variables in a Euclidean space, when the variables are seen as vectors from the origin: the x-axis is identified with the variable/vector $X$, and so on.
Then the correlation of $X$ and $Y$ is $\newcommand{\corr}{\operatorname{corr}} \corr(X,Y) = x_1 y_1 + x_2 y_2 + x_3 y_3$, and we see immediately that $\corr(X,Y) = -0.29$ because of the zeros and the unit factor. We also see immediately that $\corr(X,Z) = -0.45$, again because of the zeros and the unit factor. The correlation between $Y$ and $Z$, however, is $\corr(Y,Z) = (-0.29) \cdot (-0.45) + 0.96 \cdot 0.83$.

The partial correlation (after $X$ is removed) is the part for which no variance in the $X$-variable is present, so $\corr(Y,Z)_{\cdot X} \propto 0.96 \cdot 0.83$. (Strictly, this product is the residual covariance; dividing by the lengths of the residual vectors, $0.96$ and $\sqrt{0.83^2 + 0.32^2} \approx 0.89$, gives the partial correlation $\approx 0.94$.)

Now imagine the value $0.83$ were $-0.83$ instead. Then the partial correlation would be negative, and the correlation between $Y$ and $Z$ would be $0.29 \cdot 0.45 - 0.96 \cdot 0.83$.
What we see is that the partial correlation is partly independent of the overall correlations (though within certain bounds).
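To check these numbers, here is a minimal R sketch using base R's chol(); the rescaling in the last step is my addition, turning the residual product into the proper partial correlation:

```r
# Correlation matrix R of X, Y, Z from above
R <- matrix(c( 1.00, -0.29, -0.45,
              -0.29,  1.00,  0.93,
              -0.45,  0.93,  1.00), nrow = 3, byrow = TRUE)

# chol() returns the upper-triangular factor, so transpose it
# to get the lower-triangular L shown above
L <- t(chol(R))
round(L, 2)

# Residual covariance of Y and Z after removing X (the 0.96 * 0.83 term),
# divided by the lengths of the residual vectors = partial correlation
(L[2, 2] * L[3, 2]) / (L[2, 2] * sqrt(L[3, 2]^2 + L[3, 3]^2))  # about 0.94
```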
It seems to me that the only unanswered part of your question is the part cited below:
> Also, is there any robust version of partial correlation (like Kendall's $\tau$/Spearman's rank correlation to Pearson's correlation)?
Just as you can have a partial Pearson correlation coefficient, you can have a partial Spearman and a partial Kendall correlation coefficient. See the R code below using the package ppcor, which computes partial correlations.
```r
# Partial correlation with the ppcor package
library(ppcor)

set.seed(2021)
N <- 1000
X <- rnorm(N)
Y <- rnorm(N)
Z <- rnorm(N)

# Partial Pearson correlation of X and Y, controlling for Z
pcor.test(X, Y, Z, method = 'pearson')
```
This gives an estimate of $-0.01175714$. If you rank the variables first, that is equivalent to the Spearman correlation.
```r
# Ranking first turns Pearson into Spearman
pcor.test(rank(X), rank(Y), rank(Z), method = 'pearson')
```
This way you get a partial Spearman correlation of $0.008965395$. But you don't have to rank by hand; you can simply change the method parameter of the function to 'spearman'.
```r
# Same result directly via the method argument
pcor.test(X, Y, Z, method = 'spearman')
```
And here we go, $0.008965395$ again. If you want the partial Kendall correlation, just change the method parameter again (note the lowercase spelling).
```r
# Method names are lowercase in pcor.test
pcor.test(X, Y, Z, method = 'kendall')
```
This time we get a partial Kendall correlation of $0.006344739$.
If by robust you mean, among other things, not depending on the distribution of the random variables, and most importantly serving as a measure of independence, I recommend reading about mutual information.
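For instance, here is a minimal sketch of that idea; the infotheo package, the toy data, and the default binning are my choices for illustration:

```r
# Mutual information as a distribution-free dependence measure
library(infotheo)

set.seed(2021)
x <- rnorm(1000)
y <- x^2 + rnorm(1000, sd = 0.1)  # strong but nonmonotonic dependence

cor(x, y)                        # Pearson correlation is near zero
cor(x, y, method = 'spearman')   # rank correlation also misses it

# Empirical mutual information on discretized data is clearly positive
mutinformation(discretize(x), discretize(y))
```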
Best Answer
Note that the correlation conditional on $Z$ is a quantity that depends on $Z$, whereas the partial correlation is a single number.
Furthermore, partial correlation is defined based on the residuals from linear regression. Thus, if the actual relationship is nonlinear, the partial correlation may take a different value than the conditional correlation, even if the correlation conditional on $Z$ is a constant independent of $Z$. On the other hand, if $X, Y, Z$ are multivariate Gaussian, the partial correlation equals the conditional correlation.
For an example where a constant conditional correlation $\neq$ the partial correlation: $$Z\sim U(-1,1),\quad X=Z^2+e,\quad Y=Z^2-e,\quad e\sim N(0,1),\ e\perp Z.$$ No matter which value $Z$ takes, the conditional correlation will be $-1$. However, the linear regressions of $X$ and $Y$ on $Z$ are constants (the slopes are zero, since $\operatorname{Cov}(Z^2,Z)=0$ by symmetry), and thus the residuals are just the centered values of $X$ and $Y$ themselves. Hence the partial correlation equals the correlation between $X$ and $Y$, which does not equal $-1$: clearly the variables are not perfectly correlated if $Z$ is not known. In fact, $\operatorname{corr}(X,Y)=\frac{\operatorname{Var}(Z^2)-1}{\operatorname{Var}(Z^2)+1}=\frac{4/45-1}{4/45+1}=-\frac{41}{49}\approx -0.84$.
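A quick simulation of this construction (sample size, seed, and slice width are my choices) reproduces both values:

```r
set.seed(2021)
n <- 1e5
Z <- runif(n, -1, 1)
e <- rnorm(n)
X <- Z^2 + e
Y <- Z^2 - e

# Conditional correlation: within a thin slice of Z, X and Y move
# in exact opposition through e
idx <- abs(Z - 0.5) < 0.01
cor(X[idx], Y[idx])  # close to -1

# Partial correlation: correlate the residuals of X and Y after
# linearly regressing each on Z (the fitted slopes are near zero)
cor(resid(lm(X ~ Z)), resid(lm(Y ~ Z)))  # about -0.84 = -41/49
```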
Apparently, Baba and Sibuya (2005) show the equivalence of partial correlation and conditional correlation for some other distributions besides the multivariate Gaussian, but I have not read it.
The answer to your question 2 seems to be in the Wikipedia article on partial correlation, in the second equation under "Using recursive formula".
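If I recall that section correctly, for a single controlling variable $Z$ the formula reads
$$\rho_{XY\cdot Z} = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}{\sqrt{1-\rho_{XZ}^2}\,\sqrt{1-\rho_{YZ}^2}},$$
which, applied to the correlation matrix at the top of this page, gives $\bigl(0.93 - (-0.29)(-0.45)\bigr)/\sqrt{(1-0.29^2)(1-0.45^2)} \approx 0.94$, matching the Cholesky computation above.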