Questions:
- I have a large correlation matrix. Instead of clustering individual correlations, I want to cluster variables based on their correlations to each other, i.e. if variable A and variable B have similar correlations to variables C to Z, then A and B should be part of the same cluster. A good real-life example of this is different asset classes: intra-asset-class correlations are higher than inter-asset-class correlations.
- I am also considering clustering variables in terms of the strength of the relationship between them, e.g. when the correlation between variables A and B is close to 0, they act more or less independently. If some underlying conditions suddenly change and a strong correlation arises (positive or negative), we can think of these two variables as belonging to the same cluster. So instead of looking for positive correlation, one would look for relationship versus no relationship. I guess an analogy could be a cluster of positively and negatively charged particles. If the charge falls to 0, the particle drifts away from the cluster. However, both positive and negative charges attract particles to the relevant clusters.
I apologise if some of this isn't very clear. Please let me know, I will clarify specific details.
Best Answer
Here's a simple example in R using the bfi dataset: bfi is a dataset of 25 personality test items organised around 5 factors. A hierarchical cluster analysis using the Euclidean distance between variables based on the absolute correlation between variables can be obtained like so:
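The code for this step did not survive in this copy, so here is a minimal sketch of the approach just described, not necessarily the exact code originally posted. It assumes the bfi data are loaded from the psychTools package (older releases of psych bundled them directly) and that the first 25 columns are the personality items; everything else is base R.

```r
# Assumption: bfi ships with the psychTools package
# (older versions of psych included it directly).
data(bfi, package = "psychTools")

# Assumption: columns 1-25 are the personality items.
# Drop respondents with any missing item so cor() sees complete rows.
items <- na.omit(bfi[, 1:25])

# Absolute correlations: a strong negative and a strong positive
# relationship both count as "related".
abs_cor <- abs(cor(items))

# Each row of abs_cor is one variable's correlation profile;
# dist() computes Euclidean distances between those profiles.
d <- dist(abs_cor)

# Agglomerative clustering (hclust uses complete linkage by default).
hc <- hclust(d)
plot(hc, main = "bfi items clustered on absolute correlations")

# Optionally cut the tree into five groups and list the members.
groups <- cutree(hc, k = 5)
split(names(groups), groups)
```

Because the distance is taken between whole rows of the absolute correlation matrix, two items end up close when they relate to all the other items in a similar way, which is the "similar correlations to variables C to Z" idea from the question.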
The dendrogram shows how items generally cluster with other items according to theorised groupings (e.g., N (Neuroticism) items group together). It also shows how some items within clusters are more similar (e.g., C5 and C1 might be more similar than C5 with C3). It also suggests that the N cluster is less similar to other clusters.
Alternatively you could do a standard factor analysis like so:
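Again the code is missing from this copy, so the following is only a sketch of one way to run such an analysis with base R's factanal. The choice of five factors follows the description of bfi above; the promax rotation and the 0.3 display cutoff are my assumptions, as is loading the data from psychTools.

```r
# Assumption: bfi ships with the psychTools package.
data(bfi, package = "psychTools")

# Assumption: columns 1-25 are the personality items.
items <- na.omit(bfi[, 1:25])

# Maximum-likelihood factor analysis with five factors.
# Promax is an oblique rotation, so the factors are allowed to correlate.
fit <- factanal(items, factors = 5, rotation = "promax")

# Show the loadings, hiding small ones and sorting items by factor
# so the item groupings are easier to read.
print(fit$loadings, cutoff = 0.3, sort = TRUE)
```

Items that load mainly on the same factor play the role of a cluster here, and the sign of a loading shows whether an item runs with or against the rest of its group.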