In SPSS, the user can check the relative variable importance in a clustering result and produce a graph like the following:
We can then identify the variables that dominate predictor importance, that is, the ones with the most impact on determining the clusters. Does anyone know how importance is computed here? I would like to implement this metric in R, if it is not already there.
Best Answer
I don't know the exact answer, but I can offer a likely one.
In recent versions of SPSS Statistics, the visual cluster descriptions, comparison, and variable importance assessment are built right into the TWOSTEP CLUSTER command. In earlier releases (ver. 17, for instance) a similar task/output was carried out by a separate command, AIM. That command still exists in SPSS (see Command Syntax Reference), so you could use it.

Here is what the older version's dialog box looks like. Note the Plots button, which I've pressed.
The "Plots" dialog corresponds to AIM. Here is the syntax for clustering the Iris dataset with default settings plus the "Plots" specifications I made in the picture above:

The syntax specifies the four numeric Iris variables as the basis of clustering, with automatic selection of the number of clusters, and saves the variable TSC_7469 with cluster membership (cluster labels). Then the four variables and the cluster result variable are passed to AIM, which produces plots, among them plots showing variable importance.

Here are the variable importance plots I got after running the analysis:
There are two plots, one for each cluster produced. The current (my SPSS ver. 22) TWOSTEP CLUSTER analysis, which no longer uses AIM, produced this plot:

Notice how closely this picture resembles the preceding graphs; it looks like an average of those two.
If that is indeed so, we may conclude that the modern TWOSTEP CLUSTER command computes the "variable importance" of the scale variables this way: it performs t-tests or ANOVAs and plots the -log10(p-values), rescaled so that the greatest observed value is 1, as the "variable importance" index. This, or some very similar approach.

Note that you can change the settings in AIM to different ones (see the dialog box above, as well as Command Syntax Reference).
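To make the conjectured index concrete, here is a minimal Python sketch of the rescaling step only: take one p-value per variable, convert each to -log10(p), and divide by the largest value so that the top variable scores 1. The function name `importance_from_pvalues` and the example p-values are hypothetical, made up for illustration; this mirrors my guess above, not a documented SPSS algorithm. The same few lines should translate directly to R.

```python
import math
import sys


def importance_from_pvalues(pvalues):
    """Map {variable: p-value} to {variable: importance in [0, 1]}.

    Importance is -log10(p) divided by the largest -log10(p) observed,
    so the most significant variable gets exactly 1.
    """
    # Assumption: clamp p-values that underflowed to 0 so log10 stays defined.
    floor = sys.float_info.min
    scores = {name: -math.log10(max(p, floor)) for name, p in pvalues.items()}
    top = max(scores.values())
    if top <= 0:
        # Every p-value is 1: no variable separates the clusters at all.
        return {name: 0.0 for name in scores}
    return {name: score / top for name, score in scores.items()}


# Hypothetical p-values, invented for illustration (not from the Iris run).
example = {"var_a": 1e-8, "var_b": 1e-4, "var_c": 1e-2}
for name, value in importance_from_pvalues(example).items():
    print(f"{name}: {value:.2f}")
```

In practice the p-values would come from testing each variable against cluster membership, for instance with `scipy.stats.f_oneway` across the clusters. Whether SPSS uses exactly these tests, or corrects the p-values first, is part of the guess and not something I have confirmed.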