Solved – Rationale for Multi-Label vs. Single-Label learning

multilabel

I have not seen any research that compares the effects of single-label versus multi-label learning. By this I do not mean comparing evaluation metrics; such a comparison does not make sense, because single-label and multi-label learners use different metrics for test error.

What I want is an example of how multi-label learning improves classification in practice, in any context. I know the general rationale is "more, correct information will be better discriminatory information than less information", so I would expect multi-label methods to discriminate between classes better than single-label methods. But where is the empirical evidence?

Can anyone help me out?

Clarification: I am referring to multi-class multi-label learning methods as opposed to normal single-label supervised learners. Examples of each are below:

Multi-class multi-label learners:
multi-label kNN (ML-kNN)
multi-label backpropagation (ML-BP)
Rank-SVM
binary relevance
classifier chains
random k-labelsets (RAkEL)

Multi-class Single-label learners:
1 vs. 1 SVM, 1 vs. all SVM

Best Answer

Note that there is a subtle but important difference between multilabel problems, in which each instance may belong to several classes at once, and multiclass problems, in which each instance belongs to exactly one of $\geq 2$ classes. I will discuss both briefly, but based on the question I suspect you are referring to multiclass problems.
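To make the distinction concrete, here is a minimal sketch of how the two kinds of target could be represented. The class names and the helper `to_indicator` are made up for illustration and do not come from any particular library.

```python
# Hypothetical toy targets; the class names are invented for illustration.
CLASSES = ["cat", "dog", "bird"]

# Multiclass: exactly one class per instance.
multiclass_y = ["cat", "bird", "dog"]

# Multilabel: a (possibly empty) set of classes per instance.
multilabel_y = [{"cat", "dog"}, set(), {"bird"}]


def to_indicator(label_sets, classes):
    """Encode each label set as a 0/1 row with one column per class."""
    return [[int(c in labels) for c in classes] for labels in label_sets]


# A multiclass target is the special case where every label set has size 1.
multiclass_rows = to_indicator([{y} for y in multiclass_y], CLASSES)
multilabel_rows = to_indicator(multilabel_y, CLASSES)

# Multiclass rows always sum to 1; multilabel rows may sum to anything
# between 0 and the number of classes.
multiclass_row_sums = [sum(row) for row in multiclass_rows]
multilabel_row_sums = [sum(row) for row in multilabel_rows]
```

In this encoding a multiclass problem is a multilabel problem with the extra constraint that each row contains exactly one 1.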

Multilabel problems can essentially be broken down into sets of binary problems without much loss of information. The only situation in which a true multilabel formulation has an advantage, at least in theory, is when some combinations of labels are simply not possible, a constraint you cannot enforce cleanly in a set of binary learning problems.
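The decomposition described above (usually called binary relevance) can be sketched as follows. The labels, the "forbidden" pair, and the hard-coded predictions are assumptions made up for illustration; no actual classifier is trained here.

```python
def binary_relevance_targets(label_sets, classes):
    """Split a multilabel target into one binary target vector per class."""
    return {c: [int(c in labels) for labels in label_sets] for c in classes}


def combine(binary_predictions, classes):
    """Recombine independent per-class 0/1 predictions into a label set."""
    return {c for c in classes if binary_predictions[c] == 1}


# Hypothetical labels for three instances.
classes = ["indoor", "outdoor", "night"]
y = [{"indoor"}, {"outdoor", "night"}, {"indoor", "night"}]

# One binary learning problem per class.
per_class = binary_relevance_targets(y, classes)

# Assumed domain constraint: "indoor" and "outdoor" never co-occur.
forbidden = {"indoor", "outdoor"}

# Independent binary classifiers know nothing about that constraint, so
# nothing stops two of them from firing on the same instance. The
# predictions below are hard-coded stand-ins for such classifier outputs.
predicted = combine({"indoor": 1, "outdoor": 1, "night": 0}, classes)
violates_constraint = forbidden <= predicted
```

The per-class targets carry the same information as the original label sets, but the recombined prediction can land on a combination that never occurs in the data, which is the gap a joint multilabel formulation could close.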

Multiclass problems are often best not split up into a set of binary problems, because some information may be lost. Of course, this only applies if the learning technique has a natural multiclass formulation (e.g., SVM does not, but neural networks and decision trees do). One of the big benefits of doing pure multiclass classification is that each model typically has more data to learn from, which allows better discrimination between subtly distinct classes. Additionally, there is often an obvious computational advantage in multiclass formulations, which feature no redundancy, in contrast to sets of binary formulations (regardless of your binarization scheme, be it 1-v-1, 1-v-all, ...).
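The redundancy can be made concrete by counting how many models each scheme needs and how many training instances are presented in total. This is a rough sketch assuming balanced classes; the function names are hypothetical, and real training cost also depends on the learning algorithm.

```python
def n_models(k, scheme):
    """Number of models needed for a k-class problem under each scheme."""
    if scheme == "native":
        return 1
    if scheme == "one_vs_all":
        return k
    if scheme == "one_vs_one":
        return k * (k - 1) // 2
    raise ValueError(f"unknown scheme: {scheme}")


def instances_per_model(k, n_per_class, scheme):
    """Training instances each model sees (balanced classes assumed)."""
    if scheme in ("native", "one_vs_all"):
        return k * n_per_class   # every instance is used
    if scheme == "one_vs_one":
        return 2 * n_per_class   # only the two classes being compared
    raise ValueError(f"unknown scheme: {scheme}")


def total_instances(k, n_per_class, scheme):
    """Instance presentations summed over all models of a scheme."""
    return n_models(k, scheme) * instances_per_model(k, n_per_class, scheme)


# Example: 10 classes with 100 instances each.
k, n = 10, 100
summary = {
    s: (n_models(k, s), instances_per_model(k, n, s), total_instances(k, n, s))
    for s in ("native", "one_vs_all", "one_vs_one")
}
```

For 10 classes with 100 instances each, a native multiclass model sees the 1000 instances once, 1-v-all presents them 10 times over, and each of the 45 1-v-1 models sees only 200 instances.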
