Clustering – How to Cluster Data with Repeated Measurements in R

Tags: clustering, r, repeated measures

Most clustering algorithms assume that the rows of the data are independent observations. My data contain repeated measurements from the same individuals, so that assumption does not hold.

I could use a standard algorithm and then check whether samples from the same person end up in the same cluster (for example by manual inspection of a dendrogram, or by looking at within-group homogeneity and stability measures using clValid's "biological" validation).
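The check described above can be sketched in base R as follows. This is only an illustration on simulated data: the `ids` vector, the matrix `X`, the choice of average-linkage `hclust`, and the "purity" summary are all assumptions made for the example, and it does not use clValid.

```r
set.seed(42)

# Hypothetical data: 5 individuals with 3-6 repeated samples each (22 rows)
# and 30 continuous variables, so the data set is wider than it is long.
ids <- rep(1:5, times = c(3, 4, 5, 6, 4))
person_shift <- matrix(rnorm(5 * 30, sd = 3), nrow = 5)   # per-person offset
X <- person_shift[ids, ] + matrix(rnorm(length(ids) * 30), ncol = 30)

# Standard clustering that ignores the repeated-measures structure.
hc <- hclust(dist(X), method = "average")
cl <- cutree(hc, k = 5)

# Cross-tabulate person against cluster: a person whose samples all share
# one cluster has a single non-zero cell in their row.
tab <- table(person = ids, cluster = cl)
print(tab)

# Fraction of each person's samples that fall in that person's modal cluster.
purity <- apply(tab, 1, max) / rowSums(tab)
print(round(purity, 2))
```

A purity of 1 for a person means all of their samples were assigned to the same cluster; lower values flag individuals whose repeated measurements were split across clusters.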

Are there any clustering algorithms (preferably with an implementation in R) that take account of the repeated measurements while calculating clusters?

Bonus features:
My dataset is very wide (more variables than samples), so being able to deal with that situation would be very useful.
Also, there are different numbers of measurements for individuals, so it would also be nice for the algorithms to deal with that.
The variables in my dataset are continuous rather than categorical.

Related:
Time series 'clustering' in R
How to cluster longitudinal variables?

Best Answer

If I were doing this, and I had enough samples per individual, I would approach it as follows:

  • I would fit a Gaussian Mixture Model (GMM) to each person's data, choosing the number of components with a sound model selection criterion and checking that a GMM is a good basis for the data. I have used this approach to compress measurements from 1000 dimensions down to about 20 without loss of information.
  • I would then build a space from the parameters of the GMMs: the means, the variances, and the weights. You could sort each GMM's components by weight and give each component its own axes, so you might have an axis for the first mean, one for the second mean, and so on.
  • I would then cluster in that parameter space. You could do it with GMMs again if you like. This will find modes within people that are similar.

The requirements here, however, are that you have sufficient samples per person and that a GMM is an appropriate basis for your per-person data.
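A minimal sketch of the three steps, assuming the mclust package is installed. The data are simulated, with a single continuous variable per person; the number of components is fixed at 2 for brevity rather than chosen by a model selection criterion; and the final step uses Ward hierarchical clustering instead of a second GMM. The helper names `sim_person` and `gmm_params` are hypothetical. For a very wide data set, some dimension reduction before the per-person fit would probably be needed.

```r
library(mclust)  # provides Mclust(); assumed to be installed

set.seed(1)

# Hypothetical toy data: 12 individuals with unequal numbers of repeated
# measurements of ONE continuous variable. Individuals 1-6 and 7-12 are
# simulated from two different bimodal distributions.
n_obs <- sample(60:120, 12, replace = TRUE)
sim_person <- function(id, n) {
  centers <- if (id <= 6) c(0, 5) else c(2, 10)
  comp <- rbinom(n, 1, 0.25)                 # 0 = heavy, 1 = light component
  data.frame(id = id, y = rnorm(n, mean = centers[comp + 1], sd = 1))
}
dat <- do.call(rbind, Map(sim_person, 1:12, n_obs))

# Step 1: fit a GMM per person. G is fixed at 2 here; in practice one would
# let Mclust compare several values of G by BIC and inspect the fit.
gmm_params <- function(y) {
  fit <- Mclust(y, G = 2, modelNames = "V")
  p   <- fit$parameters
  ord <- order(p$pro, decreasing = TRUE)     # sort components by weight
  c(mean   = unname(p$mean[ord]),
    var    = unname(p$variance$sigmasq[ord]),
    weight = unname(p$pro[ord]))
}

# Step 2: one row of GMM parameters per individual (columns mean1, mean2,
# var1, var2, weight1, weight2).
param_space <- t(sapply(split(dat$y, dat$id), gmm_params))

# Step 3: cluster individuals in the scaled parameter space.
hc <- hclust(dist(scale(param_space)), method = "ward.D2")
person_cluster <- cutree(hc, k = 2)

# Compare with the simulated grouping.
print(table(cluster = person_cluster, truth = rep(1:2, each = 6)))
```

Because every person is reduced to one row of parameters, unequal numbers of measurements per individual are not a problem for the final clustering step, as long as each person has enough samples for their own GMM fit.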

EDIT:

Consider the slide shown here: http://www.slideshare.net/EngrStudent/using-model-parameters-for-dimensionality-reduction