Difference between Ward hierarchical clustering and K-Means for classification

classification, hierarchical clustering, k-means, ward

I have a dataset of socio-demographic features of a population, expressed as percentages of the total population of each municipality (e.g. 12% freelancers, 5% unemployed, etc.). Each observation is a municipality of the city. My goal is to classify each municipality politically as (roughly) left or right. I compared K-Means and hierarchical clustering with the Ward method and found that the latter performs better: it misclassifies only 2% of the points, while K-Means misclassifies 6%.

My question is: from a theoretical point of view, how do I interpret this result? Why should one method perform better than the other in such a situation?
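To make the comparison concrete, here is a minimal sketch in plain Python. It runs on synthetic two-group data, not on the municipality data (which is not shown here). The helper names (`kmeans`, `ward`, `sse`, `error_rate`) and all data parameters are hypothetical choices for illustration. The Ward routine is a naive textbook version and is not tuned for large datasets. It reports the within-cluster sum of squares (SSE) alongside the error rate, since both methods work on that same quantity.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def sse(X, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for lab in set(labels):
        members = [x for x, l in zip(X, labels) if l == lab]
        c = centroid(members)
        total += sum(sq_dist(x, c) for x in members)
    return total

def kmeans(X, k, seed=0, max_iter=100):
    """Lloyd's algorithm, started from k randomly chosen data points."""
    centers = random.Random(seed).sample(X, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in X]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for j in range(k):
            members = [x for x, l in zip(X, labels) if l == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = centroid(members)
    return labels

def ward(X, k):
    """Naive Ward agglomeration: repeatedly merge the two clusters
    whose merge increases the SSE the least."""
    clusters = [([i], list(x)) for i, x in enumerate(X)]  # (indices, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                cost = na * nb / (na + nb) * sq_dist(ca, cb)  # SSE increase
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        merged = [(na * p + nb * q) / (na + nb) for p, q in zip(ca, cb)]
        clusters[a] = (ia + ib, merged)
        del clusters[b]
    labels = [0] * len(X)
    for lab, (idx, _) in enumerate(clusters):
        for i in idx:
            labels[i] = lab
    return labels

def error_rate(truth, labels):
    """Misclassification rate for two groups, up to swapping cluster names."""
    wrong = sum(t != l for t, l in zip(truth, labels)) / len(truth)
    return min(wrong, 1 - wrong)

# Synthetic stand-in data: two overlapping groups of 2-D points.
rng = random.Random(42)
truth = [0] * 40 + [1] * 40
X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in truth]

km_labels = kmeans(X, 2)
ward_labels = ward(X, 2)
print("k-means: error", error_rate(truth, km_labels), "SSE", sse(X, km_labels))
print("Ward:    error", error_rate(truth, ward_labels), "SSE", sse(X, ward_labels))
```

Which method comes out ahead depends on the data and, for K-means, on the random start, so a gap like 2% versus 6% on a single dataset is worth re-checking over several seeds before reading much into it.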

Best Answer

It should be noted that K-means is an inadequate method that was developed for adding machines and is now outdated. I justified this in detail in my arXiv article "Reclassification formula that provides to surpass K-means method" (2012). Ward's clustering combined with an advanced form of K-means (the so-called K-meanless method) provides an accurate, optimal clustering with the lowest possible approximation error for each number of clusters. This holds at least for the digital images I work with. M. Kharinov
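For reference, assuming the "approximation error" above means the usual total squared error (an assumption, since the answer does not define it), the objective and Ward's merge cost can be written as below. The symbols $C_k$, $\mu_k$, $n_a$ and $n_b$ are notation introduced here, not taken from the answer.

```latex
% Total squared error of a partition into K clusters C_1, ..., C_K
E = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{n_k} \sum_{x_i \in C_k} x_i

% Increase in E when clusters a and b (sizes n_a, n_b) are merged
\Delta E(a, b) = \frac{n_a n_b}{n_a + n_b} \, \lVert \mu_a - \mu_b \rVert^2
```

Plain K-means tries to lower $E$ by reassigning points among a fixed number of clusters. Plain Ward starts from single points and at each step merges the pair with the smallest $\Delta E$. Both are heuristics for the same objective, and neither on its own is guaranteed to reach the global minimum.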
