Suggestions for clustering ordinal, non-normal data (unsupervised)

Tags: clustering, multivariate analysis, r

I have data from about 250 self-injurers and would like to cluster-analyze their reported motivations for self-injury (and then explore cluster membership vis-à-vis various psychological measures). I have scores on 9 motivation scales, which are Likert-type but non-normal: $\log_{10}$ transformations bring skewness below 1 for perhaps 5 of the scales, but 3 remain above 2, because many participants endorse only one or two motivation types and give 0s on the other scales.
I understand that some algorithms are more robust to violations of normality, and I have been reading about possible routines in R, including DBSCAN, MClust, and EMMIX, but I am struggling to settle on the best approach (density-based vs. similarity-based methods, etc.). I also understand that DBSCAN struggles as the number of dimensions increases: are 9 variables a concern for this approach? I can live with losing some cases as 'noise', and even with non-exclusive cluster membership (though I am less excited about that outcome). Any thoughts or references are much appreciated!

Best Answer

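In its basic form, DBSCAN needs only a dissimilarity matrix, plus its two parameters (a neighborhood radius, usually called eps, and a minimum number of points, minPts). It makes no normality assumption about the data.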
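To make the "only a dissimilarity matrix" point concrete, here is a minimal from-scratch sketch of DBSCAN that never looks at the raw scores, only at pairwise dissimilarities. It is written in plain Python (no packages) purely for illustration; the function name `dbscan_precomputed` and the toy data are hypothetical, and in practice you would use an established implementation (the R package dbscan, for example, accepts a `dist` object). It assumes the common convention that minPts counts the point itself.

```python
from typing import List, Optional, Sequence

NOISE = -1


def dbscan_precomputed(dist: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Hypothetical DBSCAN sketch working from a square dissimilarity matrix.

    Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
    A point is a core point if at least min_pts points (itself included)
    lie within dissimilarity eps of it.
    """
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Only the dissimilarity matrix is consulted, never the raw data.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue  # already assigned; do not expand from it
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also core: keep growing the cluster
    return [int(label) for label in labels]


# Toy example: 1-D points with absolute difference as the dissimilarity.
points = [0, 1, 2, 10, 11, 12, 50]
dist = [[abs(a - b) for b in points] for a in points]
labels = dbscan_precomputed(dist, eps=1.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at 50 ends up as noise (-1), which matches the asker's willingness to lose some cases as 'noise'.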

At 9 dimensions, even Euclidean distance should still work reasonably, and there are many other dissimilarity functions to choose from.
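As an illustration of swapping dissimilarity functions, the sketch below builds the full dissimilarity matrix for 9-dimensional score vectors under three standard metrics (Euclidean, Manhattan, Chebyshev). The three participants' scores are made up, and the helper names are hypothetical; whether Manhattan or another metric suits ordinal, zero-heavy scales better than Euclidean is a judgment call that this example does not settle.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.dist(a, b)  # straight-line (L2) distance, Python 3.8+


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))  # sum of per-scale differences (L1)


def chebyshev(a: Vector, b: Vector) -> float:
    return max(abs(x - y) for x, y in zip(a, b))  # largest single-scale difference


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
}


def dissimilarity_matrix(rows: Sequence[Vector], metric: str) -> List[List[float]]:
    """Square, symmetric matrix of pairwise dissimilarities between rows."""
    f = METRICS[metric]
    return [[f(a, b) for b in rows] for a in rows]


# Hypothetical scores of three participants on 9 motivation scales (0 = not endorsed).
scores = [
    [0, 0, 4, 0, 0, 0, 0, 5, 0],
    [0, 0, 3, 0, 0, 0, 0, 5, 0],
    [4, 4, 0, 0, 3, 0, 0, 0, 0],
]
for name in METRICS:
    D = dissimilarity_matrix(scores, name)
    print(name, [round(d, 2) for d in D[0]])  # first participant vs. everyone
```

Whichever metric is chosen, the resulting matrix is exactly what a DBSCAN implementation that accepts precomputed dissimilarities consumes.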

So why don't you just give DBSCAN a try?

The first step is to choose a dissimilarity measure appropriate for your problem; then decide on the threshold (DBSCAN's eps) below which you consider two cases "similar".
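One common heuristic for picking that threshold is the sorted k-nearest-neighbor distance plot: compute each case's dissimilarity to its k-th nearest neighbor, sort those values, and look for the "knee" where they start rising sharply; k is usually tied to DBSCAN's minPts. The sketch below computes those values from a dissimilarity matrix. The helper names are hypothetical, and `crude_knee` (value just before the largest jump) is a deliberately crude stand-in for eyeballing the plot, not a standard method.

```python
from typing import List, Sequence


def k_distances(dist: Sequence[Sequence[float]], k: int) -> List[float]:
    """Sorted distances from each point to its k-th nearest neighbor (self excluded)."""
    n = len(dist)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and n - 1")
    out = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        out.append(others[k - 1])
    return sorted(out)


def crude_knee(sorted_kd: Sequence[float]) -> float:
    """Hypothetical crude knee: the k-distance just before the largest jump."""
    if len(sorted_kd) < 2:
        raise ValueError("need at least two k-distances")
    gaps = [sorted_kd[i + 1] - sorted_kd[i] for i in range(len(sorted_kd) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return sorted_kd[i]


# Toy example: 1-D points with absolute difference as the dissimilarity.
points = [0, 1, 2, 10, 11, 12, 50]
dist = [[abs(a - b) for b in points] for a in points]
kd = k_distances(dist, k=2)
eps = crude_knee(kd)
print(kd, eps)
```

With real data it is worth plotting the sorted values and judging the knee by eye; the automatic pick here only illustrates what is being looked for.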