Solved – Classifying clusters using discriminant analysis

clustering, discriminant analysis, r

Suppose I have data on 100 individuals for 5 variables, say Var1, Var2, …, Var5. I ran a cluster analysis using these 5 variables on these 100 rows and got 3 clusters. Now I want to differentiate these 3 clusters based on the 5 variables. That is, I want to know which of the 5 variables contributes most to which cluster, so that I can interpret the clusters meaningfully. I don't want to use PCA or other factor analysis here.

I've heard that I can do this using discriminant analysis. Can anybody suggest a method for doing it?

Best Answer

A good idea might be to run some ANOVAs and MANOVAs comparing the clusters on whatever variables you're using. The variables that generated the clusters should generally yield significant differences, since the clustering itself separates the groups on them, so read those results as descriptive rather than as a formal test. If the 5 variables you're looking at were not the ones used to generate the cluster solution, it's more interesting to run them.

A one-way ANOVA gives you an F statistic for each variable, which is a reasonably good indicator of how different the groups (clusters, in this case) are on that variable. If you only want to compare two clusters at a time, a simple t-test does the same job.
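To make the F statistic idea concrete, here is a minimal sketch in plain Python (standard library only; in R, the built-in `aov` function covers the same ground). The toy rows, the variable names and the helper names `one_way_f` and `f_by_variable` are all made up for illustration. Larger F values suggest the variable separates the clusters more strongly.

```python
from statistics import mean

# Hypothetical toy data: (cluster label, Var1, Var2)
rows = [
    (1, 1.0, 5.0), (1, 2.0, 6.0), (1, 3.0, 7.0),
    (2, 4.0, 6.0), (2, 5.0, 5.0), (2, 6.0, 7.0),
    (3, 7.0, 7.0), (3, 8.0, 5.0), (3, 9.0, 6.0),
]


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)                      # number of clusters
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the cluster means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own cluster mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def f_by_variable(rows, names):
    """Compute the F statistic of each variable across the clusters."""
    labels = sorted({row[0] for row in rows})
    result = {}
    for j, name in enumerate(names, start=1):
        groups = [[row[j] for row in rows if row[0] == lab] for lab in labels]
        result[name] = one_way_f(groups)
    return result


f_stats = f_by_variable(rows, ["Var1", "Var2"])
# List variables from most to least discriminating
for name, f in sorted(f_stats.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F = {f:.2f}")
```

On this toy data the cluster means differ on Var1 but not on Var2, so Var1 comes out on top. The sketch stops at the F statistic; a p-value would additionally need the F distribution, which a statistics package provides.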

If your 5 new variables are categorical, it might be as easy as a chi-square test, but you might also give multiple correspondence analysis a try. Multiple correspondence analysis yields a biplot in which the distance between categories indicates how often they occur together, so if cluster 1 sits very close to three categories, you can conclude that those three categories are characteristic of cluster 1.
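For the categorical case, a rough sketch of the chi-square statistic on a cluster-by-category table could look like this, again in plain Python (R's built-in `chisq.test` does this and also reports a p-value). The labels, the categories and the helper name `chi_square` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: one cluster label and one category per individual
clusters = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
category = ["a", "a", "a", "b", "b", "b", "b", "a", "a", "a", "b", "b"]


def chi_square(labels, categories):
    """Pearson chi-square statistic and degrees of freedom for two label lists."""
    n = len(labels)
    observed = Counter(zip(labels, categories))   # cell counts
    row_totals = Counter(labels)                  # cluster sizes
    col_totals = Counter(categories)              # category frequencies
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Count expected in this cell if cluster and category were independent
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square(clusters, category)
print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

A large statistic relative to its degrees of freedom suggests that the category mix differs between clusters. With cell counts as small as in this toy table the usual chi-square approximation is unreliable, so treat it purely as an illustration of the mechanics.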

Or, you know, just describe the univariate statistics of each of your clusters.
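
Describing each cluster can be as simple as a table of means and standard deviations per variable. A minimal sketch in plain Python, assuming rows of the form (cluster label, Var1, Var2); the data and the helper name `describe_clusters` are invented for illustration, and in R something like `aggregate` would give a similar table.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy data: (cluster label, Var1, Var2)
rows = [
    (1, 1.0, 5.0), (1, 2.0, 6.0), (1, 3.0, 7.0),
    (2, 4.0, 6.0), (2, 5.0, 5.0), (2, 6.0, 7.0),
    (3, 7.0, 7.0), (3, 8.0, 5.0), (3, 9.0, 6.0),
]


def describe_clusters(rows, names):
    """Return {cluster: {variable: (mean, standard deviation)}}."""
    by_cluster = defaultdict(list)
    for label, *values in rows:
        by_cluster[label].append(values)
    summary = {}
    for label, members in sorted(by_cluster.items()):
        columns = list(zip(*members))   # one tuple of values per variable
        summary[label] = {
            name: (mean(col), stdev(col)) for name, col in zip(names, columns)
        }
    return summary


summary = describe_clusters(rows, ["Var1", "Var2"])
for label, stats in summary.items():
    line = ", ".join(f"{v}: mean={m:.2f}, sd={s:.2f}" for v, (m, s) in stats.items())
    print(f"cluster {label}: {line}")
```

Reading across the rows of such a table shows which variables have cluster means that sit far apart relative to their spread, which is often enough to name the clusters.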