Solved – Weighting variables in TwoStep cluster analysis

clustering, spss

I'm using SPSS to perform TwoStep cluster analyses. SPSS shows predictor importance of each variable used in an analysis. Oftentimes, a binary variable like gender (sorry, I'm just keeping it simple!) will be the most important variable to the formation of the clusters, even if you don't want it to be.

Is there a way to weight variables, so that maybe I can downplay, but not eliminate, gender's role in the analysis?

Thank you for the help!

Best Answer

One thing to keep in mind before turning to weights is that gender can act as a "swamping" variable in a two-step cluster analysis. Differences between genders are often large, so they can overpower weaker but still substantively interesting heterogeneity in your data.

Instead of down-weighting gender, you could consider a finite mixture regression model. Finite mixture models are a form of model-based cluster analysis (clusters are usually assumed to be multivariate Gaussian), and a finite mixture regression model essentially combines a cluster analysis with a regression. In your case, you could use gender as a predictor, fit the model, and detect clusters while accounting for the predictive power of gender (as well as other variables of interest). More information can be found in the flexmix R package documentation.
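To make the idea concrete, here is a minimal sketch of a finite mixture of Gaussian linear regressions fitted by EM, written in plain Python. It is not flexmix and does not reproduce its API. The function names (`solve`, `fit_mixture_regression`), the synthetic data, and the choice of two components are all assumptions made for illustration. Gender enters as a predictor column, so its effect is absorbed by the regression and the components are free to pick up the remaining heterogeneity (here, a difference in slopes).

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_mixture_regression(X, y, n_components=2, n_iter=50, seed=0):
    """EM for a mixture of linear regressions (hypothetical helper).

    X is a list of rows that already includes an intercept column.
    Returns mixing weights, coefficients, residual SDs and posteriors.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Start from random soft assignments; each row sums to 1.
    resp = []
    for _ in range(n):
        u = [rng.random() + 1e-3 for _ in range(n_components)]
        resp.append([v / sum(u) for v in u])
    for _ in range(n_iter):
        # M-step: weighted least squares within each component.
        pi, beta, sigma = [], [], []
        for k in range(n_components):
            w = [r[k] for r in resp]
            wsum = sum(w) + 1e-12
            # Normal equations with a tiny ridge for numerical stability.
            A = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n))
                  + (1e-8 if a == b else 0.0) for b in range(p)]
                 for a in range(p)]
            c = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
            bk = solve(A, c)
            sq = [(y[i] - sum(X[i][a] * bk[a] for a in range(p))) ** 2
                  for i in range(n)]
            sk = math.sqrt(sum(w[i] * sq[i] for i in range(n)) / wsum) + 1e-6
            pi.append(wsum / n)
            beta.append(bk)
            sigma.append(sk)
        # E-step: posterior probability of each component for each case.
        for i in range(n):
            logd = []
            for k in range(n_components):
                res = y[i] - sum(X[i][a] * beta[k][a] for a in range(p))
                logd.append(math.log(pi[k]) - math.log(sigma[k])
                            - 0.5 * (res / sigma[k]) ** 2)
            m = max(logd)
            e = [math.exp(v - m) for v in logd]
            resp[i] = [v / sum(e) for v in e]
    return pi, beta, sigma, resp

# Synthetic data (made up): gender shifts the mean, while a hidden
# grouping changes the slope on x.
data_rng = random.Random(1)
X, y = [], []
for i in range(200):
    gender = data_rng.randint(0, 1)
    x = data_rng.uniform(0, 10)
    slope = 2.0 if i % 2 == 0 else -1.0
    X.append([1.0, float(gender), x])
    y.append(1.0 + 3.0 * gender + slope * x + data_rng.gauss(0, 0.5))

pi, beta, sigma, resp = fit_mixture_regression(X, y)
# Hard cluster labels: the component with the highest posterior.
clusters = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

Each row of `beta` holds one component's intercept, gender coefficient and slope, and `clusters` gives the assignment you would interpret. EM can settle in a local optimum, so in practice you would try several seeds and compare fits. A dedicated package such as flexmix handles that, along with model selection, for you.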
