Solved – X-means algorithm and BIC

Tags: bic, clustering, k-means, machine learning, MATLAB

I want to implement the X-means algorithm of [1] in MATLAB, and I have some questions about it.

X-means Algorithm Steps:

(1) Initialize K = Kmin.

(2) Run the K-means algorithm.

(3) FOR k = 1, ..., K:
Replace each centroid μ_k by two centroids μ_k^(1) and μ_k^(2).

(The two new centroids that initialize each local 2-means run are obtained by perturbing the original centroid in two opposite directions along a randomly chosen vector, by an amount proportional to the size of the respective cluster; see the sketch after step (5).)

(4) Run the K-means algorithm with K = 2 over the points of cluster k.

Replace or retain each centroid based on the model selection criterion.

(The algorithm performs a BIC model selection test to determine whether the two new clusters model the points of cluster k better than the original single cluster. In its "smaller is better" form, BIC = -2·log L + p·log N, where N is the number of observations, log L is the maximized log-likelihood, and p is the number of free parameters of the model, not the number of clusters; for K spherical Gaussian clusters in d dimensions, p = K(d + 1) [2]. Equivalently, [1] maximizes log L - (p/2)·log N.)

(5) IF the convergence condition is not satisfied, go to Step (2).
Otherwise stop.
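
To make steps (3) and (4) concrete, here is a minimal MATLAB sketch of one "improve-structure" pass. It assumes the spherical-Gaussian BIC derived in [2], in the "larger is better" form log L - (p/2)·log N used by [1], and the Statistics Toolbox kmeans; the function names improveStructure and xmeansBIC are my own, not from the papers.

    % One improve-structure pass: try to split every current cluster and
    % keep each split only if it raises the BIC of that cluster's points.
    function [C, labels] = improveStructure(X, C, labels)
        d = size(X, 2);
        newC = [];
        for k = 1:size(C, 1)
            Xk = X(labels == k, :);                     % points of cluster k
            nk = size(Xk, 1);
            if nk < 2*(d + 1)                           % too few points to split
                newC = [newC; C(k, :)];
                continue
            end
            % Step (3): perturb the centroid in two opposite directions
            % along a random unit vector, scaled by the cluster's size.
            v = randn(1, d);
            v = v / norm(v);
            r = mean(sqrt(sum((Xk - C(k, :)).^2, 2)));  % mean distance to centroid
            init = [C(k, :) + r*v; C(k, :) - r*v];
            % Step (4): local 2-means over the points of cluster k only.
            [childLabels, childC] = kmeans(Xk, 2, 'Start', init);
            % BIC test: keep the split only if two centroids score higher
            % than one on these points.
            if xmeansBIC(Xk, childC, childLabels) > xmeansBIC(Xk, C(k, :), ones(nk, 1))
                newC = [newC; childC];
            else
                newC = [newC; C(k, :)];
            end
        end
        % Global K-means with the surviving centroids (back to step (2)).
        [labels, C] = kmeans(X, size(newC, 1), 'Start', newC);
    end

    % BIC of a K-centroid model under identical spherical Gaussians [2]:
    % maximized log-likelihood minus (p/2)*log(N), with p = K*(d+1).
    function bic = xmeansBIC(X, C, labels)
        [N, d] = size(X);
        K = size(C, 1);
        ss = 0;
        for k = 1:K
            ss = ss + sum(sum((X(labels == k, :) - C(k, :)).^2));
        end
        sigma2 = ss / (N - K);                  % pooled variance estimate
        logL = -N*d/2 * log(2*pi*sigma2) - (N - K)/2;
        for k = 1:K
            nk = sum(labels == k);
            logL = logL + nk * log(nk / N);     % mixing-weight term
        end
        p = K*(d + 1);                          % (K-1) weights + K*d means + 1 variance
        bic = logL - p/2 * log(N);
    end

The outer X-means loop simply alternates this pass with a full K-means run until no split survives or K reaches Kmax.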

Question 1: In step (3), how can I determine the new centroids?

Question 2: In step (4), how can I use the model selection criterion from [2] to decide whether to replace or retain the centroids?

[1] D. Pelleg and A. Moore, "X-means: Extending K-means with Efficient Estimation of the Number of Clusters," Proc. 17th ICML, 2000.

[2] "Notes on BIC for X-means Clustering"

There are two other questions about BIC calculation in X-means, but I still don't know how to use BIC based on [2] in the X-means algorithm:

X-mean algorithm BIC calculation question

Using BIC to estimate the number of k in KMEANS

Best Answer

The new centroids encode a new structure hypothesis, basically that the existing cluster should be split into two. Place the new centroids on either side of the old one, along a random vector, about 1/3 of the distance to the edge of the cluster. If you are using K-means, the average distance of the points to the centroid would be a suitable measure of that distance.

The real problem is that K-means should not be used for anything. You should use EM with a Gaussian mixture, which is essentially the same alternating algorithm but with soft, probabilistic assignments under explicit Gaussian assumptions instead of hard nearest-centroid assignments. The counterpart of the split for Gaussian distributions is easier to apply and more intuitively sensible besides: split the Gaussian along its principal axis, about 1.2 standard deviations out from the mean vector, as sketched below.
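
A minimal sketch of that principal-axis split, assuming the component has mean vector mu (a row vector) and covariance matrix Sigma, e.g. from MATLAB's fitgmdist; the function name is mine:

    % Split one Gaussian component along its principal axis, placing the
    % two new means +/- 1.2 standard deviations from the original mean.
    function init = splitAlongPrincipalAxis(mu, Sigma)
        [V, D] = eig(Sigma);
        [lambda, i] = max(diag(D));       % largest eigenvalue of the covariance
        u = V(:, i)';                     % its eigenvector: the principal axis
        offset = 1.2 * sqrt(lambda) * u;  % 1.2 standard deviations along the axis
        init = [mu + offset; mu - offset];
    end

The two rows of init can then seed a local two-component EM run over that component's points, and the same BIC comparison as above decides whether the split survives.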