Solved – Use of bootstrap in clustering algorithms

bootstrapclusteringk-means

Are there clustering algorithms that take advantage of bootstrap?

For example can one combine bootstrap with a standard K-Means algorithm to scale K-Means.

I was thinking if the following at a high-level:

  1. Use bootstrap to create sub-samples from the elements that need to be clustered.
  2. Each of these sub-samples should be the size that K-Means scale
  3. K-Means each sub-sample in parallel
  4. Use the co-occurrence of elements in clusters to extract clusters from the larger population

Thanks.

Best Answer

If you're still interested in applying bootstrap approach to clustering, I would recommend you to read the following two excellent and IMHO relevant answers: one on applying bootstrapping to clustering (https://stats.stackexchange.com/a/11702/31372) and another on determining optimal number of clusters and more (https://stackoverflow.com/a/15376462/2872891). In addition to some fit measures, mentioned in the above-referenced answers, I'd also suggest to use AIC and/or BIC, as described in answers to this question, with more details here.

There is a significant amount of research literature on using bootstrapping in clustering (for example, see this and this). Despite being beyond my current knowledge level, interest and the scope of this answer, I would like to point to two specifically interesting resources. This paper on using bootstrapping for speeding up K-means clustering might be also of interest, but requires "manual" implementation. This blog post seems to me interesting and relevant to the topic as well.

As a side note, I would recommend you to take a look at an alternative (non-K-means) approach to clustering, called model-based clustering, as implemented in mclust R package. The approach, methods and software are described in the paper "mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation" by Chris Fraley, Adrian E. Raftery, T. Brendan Murphy and Luca Scrucca: http://www.stat.washington.edu/research/reports/2012/tr597.pdf.

Related Question