Solved – Multiple paired distances in a single clustering application

clustering distance matrix

I have the following problem. I have 70 people, each with a set of variables recorded. I also have 3 dissimilarity matrices, in which the dissimilarity between each person and every other person is already computed. Two are Euclidean distance matrices, and the third holds DTW (dynamic time warping) distances between the time series associated with each person. The goal is to cluster these people.

Is there a way to incorporate information from several distinct dissimilarity matrices for the same set of people into the clustering solution?

What I thought could be done:

  1. Cluster the people separately, once using the variables and once using each of the matrices. This gives 4 new variables, which are simply the cluster assignments from each clustering. But this is blind to the detail in each matrix: it looks only at the resulting clusters, which may not be very distinct for, say, one of the matrices.
  2. Compute a dissimilarity matrix from the variables alone, giving 4 matrices in total. Create a single matrix by summing or averaging the individual pairwise distances. But how to combine the values is one issue, and how valid this method is is another.
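
To make option 2 concrete, here is a minimal sketch in R of one way the combination could be done. Everything in it is an assumption for illustration: the matrices are simulated stand-ins (`make_dist` is a hypothetical helper), each matrix is rescaled by its maximum so that no matrix dominates purely because of its units, the weights are equal, and the linkage and number of clusters are arbitrary.

```r
set.seed(1)
n <- 70

# Hypothetical stand-ins for the precomputed 70 x 70 dissimilarity matrices;
# replace these with the real Euclidean and DTW matrices.
make_dist <- function(n, p) as.matrix(dist(matrix(rnorm(n * p), n, p)))
d_list <- list(
  euc1 = make_dist(n, 3),
  euc2 = make_dist(n, 5),
  dtw  = 10 * make_dist(n, 2)  # deliberately on a different scale
)

# Rescale each matrix to [0, 1] (assumption: dividing by the maximum is
# an acceptable way to put the matrices on a common scale).
d_scaled <- lapply(d_list, function(d) d / max(d))

# Weighted average of the rescaled matrices (equal weights are an assumption).
w <- rep(1 / length(d_scaled), length(d_scaled))
d_comb <- Reduce(`+`, Map(`*`, d_scaled, w))

# Cluster on the combined dissimilarity; linkage and k are arbitrary choices.
hc <- hclust(as.dist(d_comb), method = "average")
clusters <- cutree(hc, k = 3)
table(clusters)
```

Whether a plain average is valid depends on how comparable the matrices are after rescaling, so the weights are worth treating as a tuning choice rather than a given.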

Any help will be greatly appreciated. I am using R for the computation. Do let me know if you need more information on the problem.

Best Answer

I am not sure it is exactly what you want, but one possibility is to transform each of the dissimilarity matrices into a coordinate system (using e.g. an MDS algorithm). The dimensions from each coordinate system can then be concatenated, and the resulting space used in a single clustering step.

For instance, if you have 3 matrices of size 70x70 (for 70 samples), then you can find a two-dimensional space for each matrix, leading to a 70x2 coordinate matrix for each dissimilarity matrix. You can stack these matrices side by side, assuming the dimensions are independent, into a 70x6 matrix, so that each sample is now represented by 6 coordinates. (A new distance matrix can then be calculated between the samples using these new coordinates.)
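
The steps above could be sketched in R roughly as follows. This is only an illustration under stated assumptions: the three dissimilarities are simulated stand-ins (`make_dist` is a hypothetical helper), classical MDS via `cmdscale` is used as the MDS algorithm, and the Ward linkage and `k = 3` clusters are arbitrary choices.

```r
set.seed(1)
n <- 70

# Hypothetical stand-ins for the three precomputed dissimilarities;
# with real data, use as.dist() on each 70 x 70 matrix instead.
make_dist <- function(n, p) dist(matrix(rnorm(n * p), n, p))
d_list <- list(make_dist(n, 3), make_dist(n, 5), make_dist(n, 2))

# Classical MDS: one 70 x 2 coordinate matrix per dissimilarity matrix.
coords <- lapply(d_list, cmdscale, k = 2)

# Stack the coordinate matrices side by side into a 70 x 6 matrix.
X <- do.call(cbind, coords)

# New distance matrix on the stacked coordinates, then a single clustering step.
d_new <- dist(X)
hc <- hclust(d_new, method = "ward.D2")
clusters <- cutree(hc, k = 3)
table(clusters)
```

Note that the coordinates keep the scale of the matrix they came from, so a matrix with larger distances will weigh more in `d_new`; standardising each 70x2 block before stacking is one possible way to even that out. A DTW matrix is also not guaranteed to be Euclidean, so a two-dimensional classical MDS solution may represent it only approximately.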