Solved – Cluster analysis, what to do with different scales

clustering, spss

I want to identify different groups of respondents out of up to five variables of the European Values Study 2008.

At first I used four questions for the cluster analysis, all on a scale from 1 to 10. However, after running a TwoStep cluster analysis, the silhouette measure was close to zero, indicating a poor solution.

Therefore, I'd like to try some other variables. I was thinking about:

  • how often discuss politics with friends [scale from 1 (frequently) to 3 (never)]
  • describe your state of health these days [scale from 1 (very good) to 5 (very poor)]
  • most people try to take advantage of you or try to be fair [scale from 1 (most people would try to take advantage of me) to 5 (most people would try to be fair)]
  • Why are there people in this country who live in need? Which one reason do you consider to be most important? (four options: because they are unlucky; because of laziness and lack of willpower; because of injustice in our society; it’s an inevitable part of modern progress)

My knowledge of cluster analysis is based on Marija J. Norušis's PASW Statistics 18 Statistical Procedures Companion and whatever I can find on the internet. Unfortunately, I haven't been able to find answers to these questions:

  1. Should I recode the answers so that they are all on the same scale?
  2. Is there anything I should do with the 4-option question?

Thank you

Best Answer

TwoStep clustering in SPSS is not limited to numeric variables: it also accepts categorical ones, so the four-option question can be entered as a categorical variable. Rescaling is usually necessary for k-means; here it is optional, though it can make interpreting (and graphing) the solution easier. Unfortunately, the TwoStep procedure offers no way to test the one-cluster solution to determine whether clustering is even warranted. Given the poor silhouette measure, there probably isn't much heterogeneity to detect among the variables you used, whether or not you rescale them.
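If you do want everything on a common scale (for example, to try k-means or just to make plots comparable), here is a minimal Python sketch of one way to do it: min-max rescaling for the ordinal items and dummy coding for the four-option question. This is only an illustration, not what SPSS does internally; the respondent rows, the short category labels, and the function names `rescale`, `one_hot`, and `encode` are all made up for the example.

```python
def rescale(value, low, high):
    """Min-max rescale an answer from the range [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def one_hot(value, categories):
    """Dummy-code a nominal answer: one 0/1 indicator per category."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical short labels for the four-option "why do people live in need" item.
REASONS = ["unlucky", "laziness", "injustice", "progress"]

# Hypothetical respondents: (politics 1-3, health 1-5, fairness 1-5, reason).
respondents = [
    (1, 2, 5, "injustice"),
    (3, 5, 1, "laziness"),
    (2, 3, 3, "unlucky"),
]


def encode(row):
    """Put the three ordinal items on [0, 1] and append the dummy-coded reason."""
    politics, health, fairness, reason = row
    scaled = [
        rescale(politics, 1, 3),
        rescale(health, 1, 5),
        rescale(fairness, 1, 5),
    ]
    return scaled + one_hot(reason, REASONS)


encoded = [encode(r) for r in respondents]
for row in encoded:
    print(row)
```

Note that min-max rescaling treats the ordinal answers as equally spaced, which is an assumption you may or may not be comfortable with for survey items.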

If you are interested in using the clusters for predictive purposes, another option to pursue here could be a regression tree using the clustering variables as predictors.
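To give a feel for what a regression tree does, here is a small Python sketch of its core step: searching one predictor for the threshold that minimises the summed squared error of the two resulting groups. It is a depth-one tree (a "stump") only, not a full tree-growing procedure, and the data, the outcome variable, and the function names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split x <= threshold with lowest SSE."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: state of health (1-5) as predictor, some 1-10 outcome item.
health = [1, 1, 2, 2, 3, 4, 4, 5]
outcome = [9, 8, 8, 9, 5, 4, 5, 3]

threshold, error = best_split(health, outcome)
print(f"split at health <= {threshold}, total SSE = {error}")
```

A full tree repeats this search over every predictor and then recursively within each resulting group, which is what a dedicated tree procedure would handle for you.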