Most clustering algorithms assume that the rows of the data are independent observations. My data contain repeated measurements from individuals.
I can use a standard algorithm and then check whether samples from the same person end up in the same cluster (for example, by manual inspection of a dendrogram, or by looking at within-group homogeneity and stability measures using clValid's "biological" validation).
Are there any clustering algorithms (preferably with an implementation in R) that take account of the repeated measurements while calculating clusters?
Bonus features:
My dataset is very wide (more variables than samples), so being able to deal with that situation would be very useful.
Also, the number of measurements differs between individuals, so it would be nice if the algorithm could handle that too.
The variables in my dataset are continuous rather than categorical.
Related:
Time series 'clustering' in R
How to cluster longitudinal variables?
Best Answer
If I were doing this, and I had enough samples per individual, then I would approach it as follows:
The only requirement here, however, is that you have sufficient samples per person and that a Gaussian mixture model (GMM) is an appropriate basis for your per-person data.
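One way to read the idea of using per-person model parameters as the thing to cluster is sketched below. This is only an illustration under stated assumptions, not necessarily the exact procedure intended above: each person is summarised by a single diagonal Gaussian (per-variable mean and standard deviation) as the simplest stand-in for a GMM, and the persons are then grouped by single-linkage clustering on those parameter vectors. The function names and the toy data are hypothetical.

```python
import math
from statistics import fmean, pstdev


def person_features(rows):
    """Summarise one person's repeated measurements by per-variable mean and SD.

    Assumption: a single diagonal Gaussian stands in for a full GMM fit.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return means + sds


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(features, n_clusters):
    """Merge the two closest clusters of persons until n_clusters remain."""
    clusters = [{name} for name in features]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(features[a], features[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: persons have different numbers of measurements.
data = {
    "p1": [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]],
    "p2": [[1.1, 2.2], [0.8, 1.9]],
    "p3": [[5.0, 6.0], [5.2, 6.1], [4.9, 5.8], [5.1, 6.2]],
    "p4": [[5.3, 5.9], [4.8, 6.0]],
}

features = {name: person_features(rows) for name, rows in data.items()}
clusters = single_linkage(features, n_clusters=2)
```

Because every person is reduced to one fixed-length parameter vector, the repeated measurements no longer appear as separate, dependent rows, and unequal numbers of measurements per person are handled naturally. With more variables than samples, a diagonal (or otherwise constrained) covariance is about as much as can be estimated per person; a richer mixture model would need correspondingly more data.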
EDIT:
Consider the slide shown here: http://www.slideshare.net/EngrStudent/using-model-parameters-for-dimensionality-reduction