Solved – Partitioning Around Medoids (PAM) with Gower distance matrix

Tags: binary-data, clustering, k-medoids, r

My data is mostly continuous but has one binary variable. I tried the pam algorithm in R with the Gower index, but the number of clusters that gives the best silhouette width is 2, with the binary variable completely dominating the result.

  • Is PAM the wrong approach?
  • Is it OK to choose a higher k just because it will give more meaningful results?

Best Answer

If the binary variable is not very useful, try putting less weight on it.

There is nothing wrong with having a domain expert manually assign weights to the attributes so that the algorithm can surface new information. That the binary attribute splits the data in two is a correct result. But you already know that split and want to find something new, so either remove the attribute (weight 0) or at least reduce its weight.
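As a rough sketch of what down-weighting could look like in R: `daisy()` from the `cluster` package accepts a `weights` argument when `metric = "gower"`, so the binary column can be given a smaller weight before the dissimilarities go to `pam()`. The toy data frame, the column names (`x1`, `x2`, `flag`), the helper `avg_sil()` and the weight 0.25 are all made up for illustration. They are not from the question, and a sensible weight for real data is a domain decision.

```r
library(cluster)  # provides daisy() and pam()

set.seed(42)

# Hypothetical toy data: two continuous variables and one binary factor
n <- 120
toy <- data.frame(
  x1   = c(rnorm(40, 0), rnorm(40, 4), rnorm(40, 8)),
  x2   = c(rnorm(40, 0), rnorm(40, 4), rnorm(40, 0)),
  flag = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Average silhouette width of PAM for each k, given per-column Gower weights
avg_sil <- function(data, weights, ks = 2:6) {
  d <- daisy(data, metric = "gower", weights = weights)
  sapply(ks, function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width)
}

sil <- rbind(
  full    = avg_sil(toy, c(1, 1, 1)),             # binary column at full weight
  reduced = avg_sil(toy, c(1, 1, 0.25)),          # binary column down-weighted (assumed value)
  dropped = avg_sil(toy[, c("x1", "x2")], c(1, 1)) # binary column removed entirely
)
colnames(sil) <- paste0("k=", 2:6)
print(round(sil, 3))
```

Comparing the rows shows how the k with the best silhouette width shifts as the binary variable loses influence. Whichever weighting you settle on, it is worth checking that the resulting clusters still make sense to someone who knows the data.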