Solved – Index plot for each cluster sorted by the silhouette

clustering, data visualization, r, traminer

After a cluster analysis, I'm trying to plot a separate index plot for each cluster, sorted by silhouette value, instead of one plot for the complete dataset
(as in the WeightedCluster Library Manual by Matthias Studer). First of all, is that theoretically correct? If yes…

I create the object "sil" with the wcSilhouetteObs command:

sil <- wcSilhouetteObs(distance.matrix, cluster.object)

Then I plot the index plot for the complete dataset (this line works, even though I can't label the clusters, but for now I don't care):

seqIplot(seq.data, group = group.p(cluster.object), sortv = sil)

But when I try to plot the index plot for a single cluster sorted by the silhouette (as I said, I'm not sure that it's theoretically correct…), I don't know how to restrict the group argument so that only the silhouette values of, e.g., the first cluster are selected. I have created a sequence object for each cluster separately (let's say cluster1.seq), but what should I do then?

Thank you!
Emanuela

Best Answer

The silhouette is computed for each observation $i$ as

$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$

where $a(i)$ is the average dissimilarity with members of the cluster to which $i$ belongs, and $b(i)$ the minimum average dissimilarity to members of another cluster.
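To make the formula concrete, here is a minimal base R sketch that computes the classical, unweighted silhouette by hand. The data, the helper name silhouette_obs, and the variable names are made up for illustration; it assumes every cluster has at least two members, and it is not claimed to reproduce wcSilhouetteObs exactly (for instance, it ignores weights).

```r
# Toy data (hypothetical): four points on a line, grouped into two clusters
x  <- c(0, 1, 10, 12)
cl <- c(1, 1, 2, 2)
d  <- as.matrix(dist(x))  # pairwise dissimilarities

# Hypothetical helper: classical unweighted silhouette per observation.
# Assumes each cluster has at least two members.
silhouette_obs <- function(d, cl) {
  sapply(seq_along(cl), function(i) {
    own <- setdiff(which(cl == cl[i]), i)
    # a(i): average dissimilarity to the other members of i's own cluster
    a <- mean(d[i, own])
    # b(i): smallest average dissimilarity to the members of another cluster
    others <- setdiff(unique(cl), cl[i])
    b <- min(sapply(others, function(k) mean(d[i, cl == k])))
    (b - a) / max(a, b)
  })
}

sil_toy <- silhouette_obs(d, cl)
round(sil_toy, 3)
# 0.909 0.900 0.789 0.826
```

For the first point, $a = 1$ and $b = (10 + 12)/2 = 11$, so $s = 10/11 \approx 0.909$. The result has one value per observation, in the same order as the membership vector.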

The silhouette values of the members of a cluster $k$ are at the same positions as the values $k$ in the cluster membership vector cluster.object. So you do not need to do anything more: your seqIplot command automatically produces one index plot per cluster, with the sequences sorted by their silhouette values within each cluster.

Sequences are sorted bottom-up from the lowest to the highest silhouette value, so the sequences with the best silhouette values in each cluster are at the top of the plots.
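If you do want a plot for a single cluster only, the same positional alignment lets one logical index select both the sequences and their silhouette values. The sketch below shows the idea on made-up vectors (cluster.toy and sil.toy are hypothetical). The commented seqIplot call at the end assumes that cluster.object is a plain membership vector with a cluster labelled 1 and that sil is a plain numeric vector; it is untested.

```r
# Hypothetical membership vector and silhouette values, aligned by position
cluster.toy <- c(1, 2, 1, 2, 1)
sil.toy     <- c(0.6, 0.1, -0.2, 0.8, 0.3)

# Logical index selecting the members of the first cluster
in.first <- cluster.toy == 1

# Silhouette values of that cluster only
sil.toy[in.first]
# 0.6 -0.2 0.3

# Original row numbers of those members, from lowest to highest silhouette
# (the bottom-to-top order of the plot)
plot.order <- which(in.first)[order(sil.toy[in.first])]
plot.order
# 3 5 1

# With the objects from the question, the same index would be used as follows
# (assumption: clusters are labelled 1, 2, ...):
# in.first <- cluster.object == 1
# seqIplot(seq.data[in.first, ], sortv = sil[in.first])
```

Subsetting the sequence object and the silhouette vector with the same index keeps them aligned, so separate per-cluster objects such as cluster1.seq are not needed.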

Hope this helps.
