Correlation – Calculating p-value for Weighted Pearson Correlation Coefficient

correlation, p-value, pearson-r, statistical significance

I'm computing a weighted correlation coefficient, using the method described here.

I'd like to compute a p-value for the resulting r coefficient. How can I do this correctly, given that my r was computed using weights? Naturally, the standard formula for p-value of r (e.g., here) does not take weights into account, and I'm not sure how to properly account for weights when computing the p-value.

Best Answer

The $P$-value reported for a correlation depends on the sample correlation, the sample size, and a bundle of assumptions not always checked (independence being, in my experience, least checked of all). But there is a difference between a crude $t$-based $P$-value based on a null hypothesis of zero correlation and a more general $P$-value based on Fisher's $z$ transformation.
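For reference, the two unweighted calculations contrasted above can be written out as follows. These are the standard textbook forms for a sample of $n$ independent pairs with sample correlation $r$, assuming bivariate normality; nothing here accounts for weights.

$$t = r\sqrt{\frac{n-2}{1-r^2}}, \qquad t \sim t_{n-2} \ \text{under } H_0\colon \rho = 0$$

$$z = \operatorname{artanh}(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r}, \qquad \frac{z - \operatorname{artanh}(\rho_0)}{1/\sqrt{n-3}} \approx N(0,1) \ \text{under } H_0\colon \rho = \rho_0$$

The second form is the more general one because $\rho_0$ need not be zero, and it also yields confidence intervals for $\rho$ by back-transforming with $\tanh$.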

I don't think there is an answer to this that is independent of what the weights are. If weighting means that you are combining data from different subsamples, then the weights have implications for the sample size that should be used; moreover, a correlation based on weighted combinations would not necessarily have the same sampling distribution as a correlation based on raw data.

At the same time, it is difficult to get agitated about this. If correlations have a point it is that they measure strength of relationship; if you are seriously in doubt that they are significantly different from zero, then it is arguable that you just have inadequately small samples and being precise about that problem is secondary.

It's likely that this misreads your problem, in which case you may have to give much more detail.

If getting really reliable $P$-values for weighted correlations is important to you, it is possible that you need to get a handle on it through simulation, including simulation of the weighting process if that is variable too.
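As a rough illustration of the simulation idea, here is a minimal sketch in base R. Everything in it is an assumption made for the example, not something specified in the question: the helper names `weighted_cor` and `sim_p_value` are hypothetical, the weighted correlation is the usual weighted-covariance form, the weights are treated as fixed and known, and the null hypothesis is that `x` and `y` are independent standard normal draws.

```r
# Weighted Pearson correlation from weighted means, variances and covariance.
# (Hypothetical helper; assumes non-negative weights that are not all zero.)
weighted_cor <- function(x, y, w) {
  w <- w / sum(w)                      # normalise weights to sum to 1
  mx <- sum(w * x)                     # weighted mean of x
  my <- sum(w * y)                     # weighted mean of y
  sxy <- sum(w * (x - mx) * (y - my))  # weighted covariance
  sxx <- sum(w * (x - mx)^2)           # weighted variance of x
  syy <- sum(w * (y - my)^2)           # weighted variance of y
  sxy / sqrt(sxx * syy)
}

# Monte Carlo two-sided P-value for H0: x and y are independent.
# ASSUMPTIONS: weights are held fixed; null data are independent normals.
sim_p_value <- function(x, y, w, n_sim = 2000) {
  r_obs <- weighted_cor(x, y, w)
  n <- length(x)
  r_null <- replicate(n_sim, weighted_cor(rnorm(n), rnorm(n), w))
  # The +1 terms count the observed value as one of the simulated draws,
  # so the estimated P-value can never be exactly zero.
  p <- (1 + sum(abs(r_null) >= abs(r_obs))) / (n_sim + 1)
  list(r = r_obs, p_value = p)
}

# Made-up data and weights, purely for illustration.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
w <- runif(n, 0.2, 2)
res <- sim_p_value(x, y, w)
print(res)
```

If the weights themselves are random (for example, estimated from the data), they would need to be redrawn inside the `replicate()` call as well, and the normal null model should be swapped for whatever better describes the actual data.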

Related Solutions

Solved – Pearson correlation coefficient on multiple parameters

The average correlation is not unreasonable. You're looking for an overall measure of similarity, so you might just put all of the measurements into one big basket and take the overall correlation; you might want to standardize the two sets of ratings first and then take the overall correlation. You might also consider looking at RMS difference rather than correlation. You may find some discussion of this in books on cluster analysis.
If you have reasonably large sample sizes, just comparing the correlations would be fine. If some of the sample sizes are quite small, it'd be best to do something Bayesian (shrinking the estimates towards the overall average), along the lines described here.

Solved – Issues on computing Pearson correlation coefficient for two vectors

Hi, this should not be a problem since the mean is explicitly subtracted. Here's a small example (all code in R):

library(mnormt)
# Draw 100 samples from a bivariate normal with unit variances and correlation 0.5
df <- rmnorm(n = 100, mean = rep(0, 2), varcov = matrix(c(1, 0.5, 0.5, 1), nrow = 2))

#We compute the correlation
cor(df)
        [,1]      [,2]
 [1,] 1.0000000 0.5605498
 [2,] 0.5605498 1.0000000

#We scale the first variable by 10000
df[,1] <- df[,1]*10000

#The correlation stays the same
cor(df)
         [,1]      [,2]
 [1,] 1.0000000 0.5605498
 [2,] 0.5605498 1.0000000

Hope this helps.

Edit: Follow-up to the comments (thanks to whuber): I understood the question as being about the magnitude of the whole vector. From the discussion, I gather that some read it as being about outliers. In that case my solution is, of course, not helpful.
