Pearson correlation coefficient on multiple parameters

correlation

First of all: I'm mathematically challenged.

Question 1: I use the Pearson correlation coefficient to correlate people. As a real-life example: I correlate people by comparing the number of points they've given to movies. This works very well. Now I would like to correlate them on not one variable but several, e.g. also on how much they like the individual actors. How should I do that? Just take the average of all n correlations?

Question 2: If persons $A$ and $B$ correlate at 0.9, that is better than $A$ and $C$, who correlate at 0.8. But what if I had, e.g., 25 movies when comparing $A$ and $B$ and 100 when comparing $A$ and $C$? Should I then still consider $A$ and $B$ the better match? Or is there some extra weighting formula for this?

Best Answer

  1. The average correlation is not unreasonable. You're looking for an overall measure of similarity, so you could also put all of the measurements into one big basket and take the overall correlation; in that case you might want to standardize the two sets of ratings first. You might also consider looking at the root-mean-square (RMS) difference rather than the correlation. You may find some discussion of this in books on cluster analysis.

  2. If you have reasonably large sample sizes, just comparing the correlations would be fine. If some of the sample sizes are quite small, it'd be best to do something Bayesian (shrinking the estimates towards the overall average), along the lines described here.
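
The three options in point 1 (averaging the per-variable correlations, pooling the standardized ratings, and the RMS difference) can be sketched roughly as follows. This is only an illustration, assuming NumPy is available; the rating lists and the function names `average_correlation`, `pooled_correlation` and `rms_difference` are made up for the example, and standardizing each person's ratings within each rating set is just one reading of "standardize first".

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equally long rating vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def zscore(v):
    """Standardize to mean 0 and standard deviation 1.
    Assumes the ratings are not all identical (std would be 0)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def average_correlation(pairs):
    """Plain average of the per-variable correlations."""
    return float(np.mean([pearson(x, y) for x, y in pairs]))

def pooled_correlation(pairs):
    """Standardize each person's ratings within each set,
    then put everything in one basket and correlate once."""
    xs = np.concatenate([zscore(x) for x, _ in pairs])
    ys = np.concatenate([zscore(y) for _, y in pairs])
    return pearson(xs, ys)

def rms_difference(x, y):
    """Root-mean-square difference; smaller means more similar."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Hypothetical data: persons A and B rated the same 5 movies and 4 actors.
movies_a, movies_b = [8, 6, 9, 4, 7], [7, 5, 9, 3, 8]
actors_a, actors_b = [3, 5, 2, 4], [4, 5, 1, 4]
pairs = [(movies_a, movies_b), (actors_a, actors_b)]

print("average of correlations:", average_correlation(pairs))
print("pooled correlation:     ", pooled_correlation(pairs))
print("RMS difference (movies):", rms_difference(movies_a, movies_b))
```

With this kind of per-set standardization, the pooled correlation works out to the average of the per-set correlations weighted by the number of items in each set, so the two approaches differ mainly in how much say a large set gets.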

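For point 2, here is a rough sketch of how sample size can enter the comparison. It is not the method the answer links to. `fisher_interval` uses the standard Fisher z-transformation (standard error 1/sqrt(n - 3)), which is only approximate for rating data. `shrink` is a deliberately simple stand-in for the Bayesian idea: it pulls each correlation towards an overall average, with a hypothetical strength parameter `k` and a hypothetical overall average of 0.5 that you would have to choose or estimate from your own data.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation r based on n items,
    via the Fisher z-transformation. Requires n > 3 and |r| < 1."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def shrink(r, n, overall_r, k=20):
    """Pull r towards overall_r; the fewer items n, the stronger the pull.
    k is an assumed tuning constant, not something given in the answer."""
    w = n / (n + k)
    return w * r + (1 - w) * overall_r

# The numbers from the question: A-B on 25 movies, A-C on 100 movies.
for label, r, n in [("A-B", 0.9, 25), ("A-C", 0.8, 100)]:
    lo, hi = fisher_interval(r, n)
    print(f"{label}: r={r}, n={n}, interval=({lo:.3f}, {hi:.3f}), "
          f"shrunk={shrink(r, n, overall_r=0.5):.3f}")
```

For these numbers the two intervals overlap, so the data do not clearly separate 0.9 on 25 movies from 0.8 on 100. Whether shrinking changes the ranking depends entirely on the assumed `overall_r` and `k`.
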