Solved – How to determine the best batch-size value for Mini Batch K-means algorithm

clustering, k-means

I am working on a project where I apply k-means to several datasets. These datasets may contain up to several billion points, so I would like to use mini-batch k-means to save time. However, mini-batch k-means requires a value for the batch size argument (I am using sklearn). What is the best way to choose a good batch size?

Best Answer

It is true that mini-batch k-means can be the better choice when your data contain outliers. If you believe there are no outliers, standard k-means should give the better result.
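As for the batch size itself, one practical option is to choose it empirically: fit `MiniBatchKMeans` with a few candidate sizes on a random subsample and compare inertia and run time against full `KMeans` on the same subsample. What follows is only a minimal sketch under stated assumptions. The data from `make_blobs` is a synthetic stand-in for your subsample, `compare_batch_sizes` is a hypothetical helper name, and the candidate sizes and `n_clusters=5` are arbitrary illustrations, not recommendations.

```python
import time

from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic data standing in for a random subsample of the real dataset.
X, _ = make_blobs(n_samples=20_000, centers=5, n_features=2, random_state=0)


def compare_batch_sizes(X, n_clusters, batch_sizes, random_state=0):
    """Fit MiniBatchKMeans once per batch size.

    Returns a dict mapping batch_size -> (inertia, fit time in seconds).
    Hypothetical helper for illustration, not part of sklearn.
    """
    results = {}
    for batch_size in batch_sizes:
        model = MiniBatchKMeans(
            n_clusters=n_clusters,
            batch_size=batch_size,
            n_init=3,
            random_state=random_state,
        )
        start = time.perf_counter()
        model.fit(X)
        elapsed = time.perf_counter() - start
        # With the default compute_labels=True, inertia_ is evaluated on all of X.
        results[batch_size] = (model.inertia_, elapsed)
    return results


# Full k-means on the same subsample gives a reference inertia.
baseline = KMeans(n_clusters=5, n_init=3, random_state=0).fit(X).inertia_

# Arbitrary candidate sizes (assumption); adjust them to your data and hardware.
results = compare_batch_sizes(X, n_clusters=5, batch_sizes=[256, 1024, 4096])
for batch_size, (inertia, seconds) in results.items():
    print(
        f"batch_size={batch_size:5d}  "
        f"inertia/baseline={inertia / baseline:.3f}  time={seconds:.2f}s"
    )
```

An inertia ratio close to 1 means the mini-batch result is about as tight as full k-means on that subsample. You could then take the smallest batch size whose ratio you find acceptable and check that it still holds up on a larger subsample before running on the full data.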
