Supervised Clustering vs Classification – Key Differences Explained

classification, clustering, statistical-learning, unsupervised-learning

My second question: I came across a discussion on the web about "supervised clustering". As far as I know, clustering is unsupervised, so what exactly does "supervised clustering" mean? How does it differ from "classification"?

There are many links that discuss it:

http://www.cs.uh.edu/docs/cosc/technical-reports/2005/05_10.pdf

http://books.nips.cc/papers/files/nips23/NIPS2010_0427.pdf

http://engr.case.edu/ray_soumya/mlrg/supervised_clustering_finley_joachims_icml05.pdf

http://www.public.asu.edu/~kvanlehn/Stringent/PDF/05CICL_UP_DB_PWJ_KVL.pdf

http://www.machinelearning.org/proceedings/icml2007/papers/366.pdf

http://www.cs.cornell.edu/~tomf/publications/supervised_kmeans-08.pdf

http://jmlr.csail.mit.edu/papers/volume6/daume05a/daume05a.pdf

etc …

Best Answer

My naive understanding is that classification applies when you have a specified set of classes and you want to assign a new observation (or dataset) to one of those classes.

Clustering, by contrast, has no classes to start with: you use all the data (including the new observation) and separate it into clusters.

Both often rely on a distance metric to decide how to cluster or classify. The difference is that classification is based on a previously defined set of classes, whereas clustering derives the clusters from the entire data.
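To make the contrast concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the toy data, the function names (`classify`, `kmeans`, `centroid`), the choice of a nearest-centroid rule for classification, and the naive k-means initialisation are all hypothetical and do not come from any of the linked papers.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def classify(point, labeled):
    """Nearest-centroid classification: the classes are fixed in advance by `labeled`."""
    centroids = {label: centroid(pts) for label, pts in labeled.items()}
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def kmeans(points, k, iterations=10):
    """Plain k-means: no labels are given; groups are derived from all the points."""
    centers = list(points[:k])  # naive deterministic initialisation (an assumption)
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest current centre.
        assignment = [min(range(k), key=lambda i: dist(p, centers[i])) for p in points]
        # Move each centre to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centre if a cluster ends up empty
                centers[i] = centroid(members)
    return assignment


# Hypothetical toy data: two predefined classes and one new observation.
labeled = {"a": [(0.0, 0.0), (1.0, 0.0)], "b": [(5.0, 5.0), (6.0, 5.0)]}
new_point = (0.5, 0.5)

# Classification: the new point is assigned to one of the existing classes.
predicted = classify(new_point, labeled)

# Clustering: the labels are ignored and all points, old and new, are grouped together.
all_points = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (6.0, 5.0), new_point]
clusters = kmeans(all_points, k=2)
```

Note that `classify` returns one of the class names that were supplied up front, while `kmeans` returns arbitrary group indices: the groups only exist because of the data they were computed from.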

Again, my naive understanding is that supervised clustering still forms its clusters from the entire data, and so it would be clustering rather than classification.

In reality, I'm sure the theory behind clustering and classification is intertwined.
