Solved – Why are standard classification methods better than using clustering for classification


Suppose we cluster a set of labeled and unlabeled examples, treating the class as an ordinary nominal attribute. The resulting clusters can then be used to classify test instances: each test instance that falls into a cluster is assigned that cluster's most frequent class. Why is this method less accurate than methods developed specifically for classification?

Best Answer

Because when you assign class labels based on a clustering, you are effectively assuming that the clustering is a perfect classification. Everything you do after that models the clusters, not the classes.

In fact, the phrase "assigning the most frequent class in each cluster" already concedes the problem: your clusters do NOT each contain a single class. With this approach you contaminate the clean information about your classes with guesses based on the structure of the data (or, rather, the apparent structure captured by your clustering algorithm).
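A small toy sketch can make this concrete. The dataset below is hypothetical: it has two well-separated blobs along `x`, while the class depends only on the sign of `y`. The clustering follows the blobs, so every cluster is mixed and the majority vote gives the same label to both classes. The sketch assumes plain k-means with a fixed, hand-picked initialisation (to keep it deterministic), encodes the class as a 0/1 coordinate during clustering, and assigns unlabeled test instances using the feature dimensions only. A nearest-class-centroid classifier stands in here for "a method developed specifically for classification"; it is one simple example, not the only choice.

```python
from collections import Counter

# Hypothetical toy data: two blobs along x; the class depends only on the sign of y.
# Each row is (x, y, label).
TRAIN = [
    (0.0, 1.0, 1), (1.0, 1.0, 1), (0.5, 1.2, 1), (0.0, -1.0, 0), (1.0, -1.0, 0),
    (10.0, 1.0, 1), (11.0, 1.0, 1), (10.5, 1.2, 1), (10.0, -1.0, 0), (11.0, -1.0, 0),
]
TEST = [(0.5, 1.0, 1), (0.5, -1.0, 0), (10.5, 1.0, 1), (10.5, -1.0, 0)]


def sq_dist(a, b):
    """Squared Euclidean distance; zip() compares only the dimensions both points have."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, centroids, iters=20):
    """Plain Lloyd's k-means starting from explicitly supplied centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids


def cluster_then_label(train, test):
    # Treat the class as an ordinary attribute during clustering (third coordinate).
    points = [(x, y, float(label)) for x, y, label in train]
    # Assumed initialisation: one seed per blob, so the result is deterministic.
    centroids = kmeans(points, [points[0], points[5]])
    # Majority class per cluster, from the training labels.
    votes = [Counter() for _ in centroids]
    for p, (_, _, label) in zip(points, train):
        k = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        votes[k][label] += 1
    majority = [v.most_common(1)[0][0] if v else None for v in votes]
    # Test instances have no class value, so match them on (x, y) only.
    preds = []
    for x, y, _ in test:
        k = min(range(len(centroids)), key=lambda i: sq_dist((x, y), centroids[i]))
        preds.append(majority[k])
    return preds


def nearest_class_centroid(train, test):
    """Simple supervised baseline: predict the class whose feature mean is closest."""
    labels = sorted({label for _, _, label in train})
    centroids = {c: mean([(x, y) for x, y, l in train if l == c]) for c in labels}
    return [min(labels, key=lambda c: sq_dist((x, y), centroids[c])) for x, y, _ in test]


def accuracy(preds, test):
    return sum(p == t[2] for p, t in zip(preds, test)) / len(test)


print("cluster-then-label:", accuracy(cluster_then_label(TRAIN, TEST), TEST))  # 0.5
print("nearest class centroid:", accuracy(nearest_class_centroid(TRAIN, TEST), TEST))  # 1.0
```

Both clusters end up with three class-1 and two class-0 members, so every test instance is labeled 1 and half of them are wrong. The supervised baseline models the classes directly and uses the `y` dimension that actually separates them. On real data the gap will vary, but the failure mode is the one described above: the labels inherit whatever mismatch exists between the clusters and the classes.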
