Solved – Why is the decision boundary for K-means clustering linear

Tags: clustering, k-means

Apparently, for K-means clustering, the decision boundary for whether a data point lies in cluster $A$ or cluster $A'$ is linear.

I don't quite understand this statement. Why is it linear? In every iteration of K-means clustering, I reassign each data point to the cluster with the nearest prototype, minimizing the squared error, and then I recompute the prototypes (the centers of the clusters) to minimize the error again.

How do these processes create a "linear decision boundary"?

Best Answer

There are linear and non-linear classification problems. In a linear problem you can draw lines, planes, or hyperplanes (depending on the number of dimensions of your problem) that classify all your data points correctly; in a non-linear problem you cannot. Such separating lines, planes, and hyperplanes are called decision boundaries.
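As an illustration of the distinction, here is a small sketch (plain Python, with made-up candidate lines) showing that the AND labeling of four points admits a linear boundary while the XOR labeling does not:

```python
# Illustrative check: AND is linearly separable, XOR is not.
# A candidate boundary is the line a*x + b*y + c = 0; a labeling is
# "separated" if all positive points fall strictly on one side and
# all negative points on the other.

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
xor_labels = [0, 1, 1, 0]

def separates(a, b, c, labels):
    signs = [a * x + b * y + c > 0 for (x, y) in points]
    return (all(s == bool(l) for s, l in zip(signs, labels)) or
            all(s != bool(l) for s, l in zip(signs, labels)))

# The line x + y = 1.5 separates AND ...
assert separates(1, 1, -1.5, and_labels)

# ... but no line separates XOR: writing f(x, y) = a*x + b*y + c,
# making f(0,1) and f(1,0) positive while f(0,0) and f(1,1) are
# negative would force a + b + 2c to be both > 0 and < 0.
# A brute-force search over a coarse grid finds no separator either:
grid = [i / 2 for i in range(-8, 9)]
assert not any(separates(a, b, c, xor_labels)
               for a in grid for b in grid for c in grid)
```

XOR is the classic example of a problem that no single linear decision boundary can solve.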

K-means clustering partitions the space into a Voronoi diagram, whose cells are bounded by linear decision boundaries: the boundary between two clusters is the perpendicular bisector hyperplane of the segment joining their centroids. For example, this presentation depicts the clusters and the decision boundaries (slide 34) and briefly describes Voronoi diagrams, so you can see the similarity. Neural networks, by contrast, can handle problems with non-linear decision boundaries, depending on the number of hidden layers. Finally, support vector machines are in principle linear classifiers, since they work by finding separating hyperplanes; however, using the kernel trick, they can transform a non-linear problem into a linear one in a higher-dimensional space.
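The linearity follows directly from the assignment rule. A point $x$ is assigned to cluster $A$ rather than $A'$ exactly when

$$\|x - \mu_A\|^2 \le \|x - \mu_{A'}\|^2.$$

Expanding both sides and cancelling the common $\|x\|^2$ term leaves

$$2(\mu_{A'} - \mu_A)^\top x \le \|\mu_{A'}\|^2 - \|\mu_A\|^2,$$

which is linear in $x$, so the boundary (the equality case) is a hyperplane: the perpendicular bisector of the segment between the two centroids. A minimal numeric sketch of this equivalence, using NumPy and two made-up centroids:

```python
import numpy as np

# Two hypothetical cluster centroids (illustrative values).
mu_a = np.array([0.0, 0.0])
mu_b = np.array([4.0, 2.0])

def nearest_centroid(x):
    # The K-means assignment rule: pick the closer centroid.
    return "A" if np.sum((x - mu_a) ** 2) <= np.sum((x - mu_b) ** 2) else "B"

def linear_rule(x):
    # The same rule after expanding the squared norms and
    # cancelling ||x||^2: a linear inequality in x.
    lhs = 2 * (mu_b - mu_a) @ x
    rhs = mu_b @ mu_b - mu_a @ mu_a
    return "A" if lhs <= rhs else "B"

# The two rules agree everywhere, so the boundary between the
# clusters is the hyperplane 2(mu_b - mu_a).x = ||mu_b||^2 - ||mu_a||^2.
rng = np.random.default_rng(0)
sample = rng.normal(scale=3.0, size=(1000, 2))
assert all(nearest_centroid(x) == linear_rule(x) for x in sample)
```

With more than two clusters, each pair of centroids contributes one such hyperplane, and intersecting the resulting half-spaces yields exactly the Voronoi cells.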