Solved – analyze or model a conditional correlation

correlation, data visualization, interaction, multiple regression, regression

In my research I'm looking at the correlation between self-harm and aggression (both continuous). Now, I also have some variables (e.g. depressive symptoms; also continuous) which I do believe strengthen the relationship between aggression and self-harm. For instance, I believe that self-harm and aggression are more strongly related in people who have more depressive symptoms. How do I test for this?

I thought about treating depressive symptoms as a moderator, but as far as I know moderators are only appropriate if you look at causal relationships (which I don't, because I look at correlation). Partial correlation also does not seem appropriate, because I want to predict, not control.

As a solution I thought about calculating the correlation coefficients (of aggression and self-harm) for each patient. Then do a multiple regression analysis with depressive symptoms etc. as predictors and the correlation coefficient as outcome variable. But would this be a valid method?

Best Answer

Can I analyze or model a conditional correlation?

This can be done using multivariate regression, which is a form of regression analysis with more than one response variable. (Not to be confused with multiple regression, which has a single response variable but multiple explanatory variables.) A multivariate regression model gives you an equation that predicts all of the response variables jointly for given values of the explanatory variables. In this particular case, you could construct a regression model with self-harm and aggression as your two response variables and depressive symptoms as the explanatory variable. In R you would use code something like this:

#Construct a multivariate linear regression for self-harm and aggression
#The object DATA is a data frame containing the variables
#(the column is named self_harm because self-harm would parse as a subtraction in R)
MODEL <- lm(cbind(self_harm, aggression) ~ depressive_symptoms, data = DATA)

#Extract estimated coefficients and variance matrix of estimates
coef(MODEL)
vcov(MODEL)

This fits a separate set of coefficients for each of the two response variables. The estimated correlation between the two responses when the explanatory variable is held fixed can then be obtained from the residuals: it is the correlation between the residuals of self-harm and the residuals of aggression.
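As a rough illustration of that last step, here is a minimal sketch in R. The data are simulated stand-ins, not real measurements, and the variable names, sample size, and effect sizes are assumptions made only so the example runs on its own.

```r
# Simulated stand-in data (illustrative assumption, not real measurements)
set.seed(1)
n <- 500
depressive_symptoms <- rnorm(n)
shared <- rnorm(n)  # common component that makes the two responses correlate
DATA <- data.frame(
  depressive_symptoms = depressive_symptoms,
  self_harm  = 0.5 * depressive_symptoms + shared + rnorm(n),
  aggression = 0.3 * depressive_symptoms + shared + rnorm(n)
)

# Multivariate regression: two responses, one explanatory variable
MODEL <- lm(cbind(self_harm, aggression) ~ depressive_symptoms, data = DATA)

# resid() returns an n x 2 matrix, one column per response;
# the correlation of those columns is the estimated conditional correlation
RES_COR <- cor(resid(MODEL))
RES_COR["self_harm", "aggression"]
```

Note that this yields a single correlation for the whole sample. On its own it does not show whether the correlation changes as depressive symptoms increase.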

Related Solutions

Solved – “Correlation” terminology in time series analysis

In order to avoid the spurious correlation problem, you should regress two stationary time series against one another. This can (potentially) provide a causal story. It is non-stationary series that lead to spurious correlation. See the reasoning given in my answer to this question. (As a footnote, you may not need stationary series if they are integrated series, but I'd point you to any of the applied time series books to learn more about that.)

Solved – Partial correlation and multiple regression controlling for categorical variables

It seems to me that the only unanswered part of your question is the part cited below:

Also, is there any robust version of partial correlation (like Kendall's τ / Spearman's rank correlation to Pearson's correlation)?

The same way you can have partial Pearson correlation coefficient, you can have partial Spearman correlation coefficient and also Kendall. See some R code below with the package ppcor that helps you with partial correlation.

library(ppcor)

set.seed(2021)
N <- 1000
X <- rnorm(N)
Y <- rnorm(N)
Z <- rnorm(N)

pcor.test(X, Y, Z, method='pearson')

You will be given an estimate of $-0.01175714$. If you rank the variables first, the same call gives the partial Spearman correlation.

pcor.test(rank(X), rank(Y), rank(Z), method='pearson')

And this way you get a partial Spearman correlation of $0.008965395$. But you don't have to rank the variables yourself; you can just change the method parameter of the function to spearman.

pcor.test(X, Y, Z, method='spearman')

And here we go, $0.008965395$ again. If you want the partial Kendall correlation, just change the method parameter again.

pcor.test(X, Y, Z, method='kendall')

This time, we got a partial Kendall correlation of $0.006344739$.
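To make the Pearson and Spearman cases above more concrete, here is a small base-R sketch of what a first-order partial correlation computes. The helper names partial_cor_resid and partial_cor_formula are hypothetical, introduced only for this illustration; they are not part of ppcor. The sketch assumes a single control variable Z.

```r
set.seed(2021)
N <- 1000
X <- rnorm(N)
Y <- rnorm(N)
Z <- rnorm(N)

# Hypothetical helper: correlate what is left of x and y
# after a linear regression on z has been removed from each
partial_cor_resid <- function(x, y, z) {
  cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
}

# Hypothetical helper: the textbook first-order partial correlation formula
partial_cor_formula <- function(x, y, z) {
  r_xy <- cor(x, y)
  r_xz <- cor(x, z)
  r_yz <- cor(y, z)
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Partial Pearson correlation, computed two ways
p_pearson <- partial_cor_resid(X, Y, Z)
p_pearson_check <- partial_cor_formula(X, Y, Z)

# Partial Spearman correlation: the same computation on the ranks
p_spearman <- partial_cor_resid(rank(X), rank(Y), rank(Z))
```

The two Pearson computations agree up to floating-point error, and the values can be compared with the estimates that pcor.test returns for method='pearson' and method='spearman'.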

If by robust you mean a measure that does not depend on the distribution of the random variables and, most importantly, that captures independence rather than only linear or monotonic association, I recommend reading about mutual information.
