Clustering into ordered clusters

Tags: clustering, k-means, ordinal-data, r

In a research study I have a list of countries and the following data about each of them:

  • GDP
  • Population
  • Oil exports
  • Oil imports
  • Percentage of electricity produced with renewable energies
  • Urbanization
  • Percentage of GDP put into research in renewable energies

Now I would like to cluster these countries into three groups. In the end, the groups should correspond to:

  • Countries with high ecological standards
  • Countries with medium ecological standards
  • Countries with low ecological standards

These three categories are ordered. I would also like to run the model with five categories.

Which clustering algorithm would be most appropriate? Is k-means a good choice in this case? What dangers arise when I use k-means? If you have some R code that solves a similar problem, I would also be grateful.

Best Answer

This is, I think, not a problem for cluster analysis at all. Cluster analysis is unsupervised learning, and you want some form of supervision: you already know what the groups should mean and in which order they should come.

What you seem to want is factor analysis (FA) rather than cluster analysis, though maybe not FA either. If you already know what "ecological standards" means, you could construct that variable yourself. If not, a factor analysis of your existing variables might give you a factor that you can interpret as ecological standards.
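A minimal Python sketch of both routes, assuming the data have been collected into one record (a dict of variable name to value) per country. `composite_index` builds a hand-made score: each variable is standardized and multiplied by a weight whose size and sign you choose (for example, you might decide that oil exports should lower the score); the weights are hypothetical and encode your own definition of ecological standards. A proper one-factor model is more than a short standard-library sketch can do well, so `first_principal_component` substitutes principal component analysis, a related but different technique: it approximates the leading eigenvector of the correlation matrix by power iteration. All function names are illustrative, not from any library; in R, `factanal()` and `prcomp()` are the usual tools for the real thing.

```python
import math
import statistics


def standardize(column):
    """Convert a list of numbers to z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def composite_index(rows, weights):
    """Hand-made index: weighted sum of standardized variables, one score per row.

    rows: list of dicts, one per country, mapping variable name to value.
    weights: dict mapping variable name to a hypothetical weight; the sign says
             whether a higher value should raise or lower the score.
    """
    z = {name: standardize([row[name] for row in rows]) for name in weights}
    return [sum(weights[name] * z[name][i] for name in weights)
            for i in range(len(rows))]


def first_principal_component(rows, names, iterations=200):
    """PCA stand-in for a one-factor model: returns (loadings, scores).

    The overall sign of the component is arbitrary; flip it if higher scores
    turn out to mean lower ecological standards.
    """
    z = [standardize([row[name] for row in rows]) for name in names]
    p, m = len(names), len(rows)
    # Correlation matrix of the standardized variables.
    corr = [[sum(z[a][i] * z[b][i] for i in range(m)) / (m - 1)
             for b in range(p)] for a in range(p)]
    # Power iteration: repeated multiplication by corr approaches the leading eigenvector.
    v = [1.0 + 0.1 * k for k in range(p)]  # uneven start avoids some degenerate cases
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to zero; try another start vector")
        v = [x / norm for x in w]
    scores = [sum(v[a] * z[a][i] for a in range(p)) for i in range(m)]
    return v, scores
```

Whichever score you use, check that the weights or loadings make substantive sense before calling the result "ecological standards".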

That factor might break up into three groups, but it might not. I am not sure why you want it to break up into groups (and exactly three of them). For almost all purposes, I think it would be better treated as a continuous variable.
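If three (or five) ordered groups are truly needed, one simple option is to compute a continuous score first and cut it afterwards, so that the order comes from the score rather than from arbitrary cluster labels. A minimal Python sketch, assuming the score is already available as a list of numbers; `ordered_groups` is a hypothetical helper, the example scores are made up, and cutting at quantiles is only one choice among several.

```python
import statistics


def ordered_groups(scores, k):
    """Cut a continuous score into k ordered groups (1 = lowest) at empirical quantiles.

    statistics.quantiles returns k - 1 cut points; a score equal to a cut point
    falls into the lower group.
    """
    cuts = statistics.quantiles(scores, n=k)
    return [1 + sum(score > cut for cut in cuts) for score in scores]


# Illustrative scores only (not real country data).
example_scores = [-1.8, -0.9, -0.4, 0.1, 0.3, 0.8, 1.2, 1.9, 2.5]
three_groups = ordered_groups(example_scores, 3)
five_groups = ordered_groups(example_scores, 5)
```

Quantile cuts force roughly equal group sizes whether or not the data actually separate at those points, which is one reason to keep the score continuous when you can. In R, `cut()` combined with `quantile()` does the same job.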

But if you already know which country belongs in which group, then you have a classification task, which calls for supervised methods instead.
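If the group memberships are already known, one simple supervised approach is a nearest-centroid classifier: compute the mean feature vector of each known group and assign a new country to the group with the closest mean. This is a minimal Python sketch of a swapped-in technique, not the answer's prescription; the function names and the toy data are hypothetical, and the features are assumed to be standardized beforehand so that a variable such as GDP does not dominate the distances.

```python
import math


def fit_centroids(X, y):
    """Return a dict mapping each class label to the mean of its feature vectors."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign a feature vector to the label whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up, already standardized features for four labelled countries.
X = [[-1.0, -0.5], [-0.8, -1.1], [0.9, 1.2], [1.1, 0.7]]
y = ["low", "low", "high", "high"]
centroids = fit_centroids(X, y)
```

Nearest-centroid ignores the fact that the groups are ordered; a model that uses the order, such as ordinal logistic regression (`polr()` in the R package MASS), may be a better fit for high/medium/low labels.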