Solved – How to identify the variables that contribute most to creating the partition between two clusters

boxplot, hierarchical clustering

This question follows on from one I asked here: https://stackoverflow.com/questions/39127208/how-do-i-correctly-plot-the-clusters-produced-from-a-cluster-analysis-in-matlab/39142286?noredirect=1#comment65632215_39142286

I have carried out hierarchical clustering in MATLAB to create five clusters from my data (200 observations, 10 variables).
What I want to know is how to statistically identify the variables that differ most between clusters, i.e. the variables that contribute most to creating the partition between two clusters. So far I have created boxplots of each variable across all the clusters in one graph, which lets me judge visually whether a particular variable varies a lot across clusters. Is there a better way to do this?

Best Answer

Here is an idea: treat your two clusters as two classes. Fit a binary model (for example, logistic regression) with one cluster as the reference class. The variables with the most significant coefficients are the ones influencing the partition. You can compare statistics such as the p-value and the chi-square statistic to determine which variables are relatively more significant.
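A minimal sketch of this idea, written in plain Python rather than MATLAB so it has no dependencies. It assumes "binary model" means a logistic regression of "cluster B vs. reference cluster A" on the variables, and that the chi-square in question is the per-coefficient Wald chi-square (1 degree of freedom). The function names `rank_variables` and `fit_logistic` are hypothetical, and `labels` is assumed to be a list of cluster IDs, one per observation, as produced by the hierarchical clustering.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a, b):
    # Solve the square system a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_info(rows, y, beta):
    # Gradient of the log-likelihood and Fisher information X^T W X at beta.
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(rows, y):
        mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    # Newton-Raphson maximum-likelihood fit with an intercept in column 0.
    # Returns (coefficients, standard errors). If the two groups are perfectly
    # separable, the MLE does not exist and this will diverge or fail.
    rows = [[1.0] + [float(v) for v in x] for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = _score_and_info(rows, y, beta)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_info(rows, y, beta)
    p = len(beta)
    # Var(beta_j) is the j-th diagonal entry of the inverse information matrix.
    se = [math.sqrt(_solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
          for j in range(p)]
    return beta, se


def rank_variables(data, labels, cluster_a, cluster_b, names):
    # Keep only the two clusters; cluster_a is the reference class (y = 0).
    pairs = [(row, lab) for row, lab in zip(data, labels) if lab in (cluster_a, cluster_b)]
    X = [row for row, _ in pairs]
    y = [1.0 if lab == cluster_b else 0.0 for _, lab in pairs]
    beta, se = fit_logistic(X, y)
    table = []
    for name, b, s in zip(names, beta[1:], se[1:]):  # skip the intercept
        chi2 = (b / s) ** 2                          # Wald chi-square, 1 df
        p_value = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square(1)
        table.append((name, b, chi2, p_value))
    # Most significant (largest chi-square) first.
    return sorted(table, key=lambda t: t[2], reverse=True)
```

For example, `rank_variables(data, labels, 1, 2, names)` would return `(name, coefficient, chi-square, p-value)` tuples for clusters 1 and 2, ordered from most to least significant. Two caveats apply to this sketch: because the clusters were built from these same variables, the two groups may be perfectly separable (in which case logistic regression has no finite solution), and the p-values are optimistic for the same reason, so they are better read as a ranking than as a formal test.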