Solved – Good strategy for multiclass classification (when there is hierarchical class structure)

classification, machine learning, multi-class

I want to solve a multi-class classification problem by combining binary classifiers, in the following setting.
Suppose there is a dataset in which each data point belongs to one of four classes: A1, A2, B1 and B2. A1 and A2 are similar to each other in some sense, and so are B1 and B2.
(e.g. A1=rose, A2=iris, B1=cow, B2=horse)

Intuitively, it seems like a good idea to classify the data in the following manner: first split 'A1 and A2' from 'B1 and B2', then split A1 from A2 and B1 from B2 (so we build three classifiers).
I call this approach the 'hierarchical strategy.' A sketch of what I mean is below.
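A minimal sketch of this hierarchical setup, assuming scikit-learn; the logistic-regression base classifiers and the helper names are arbitrary placeholders, and `X`/`y` stand for a feature matrix and string labels 'A1', 'A2', 'B1', 'B2':

```python
# Hierarchical strategy: one top-level binary classifier plus two within-group ones.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_hierarchical(X, y):
    """Fit the three binary classifiers of the hierarchical strategy."""
    y = np.asarray(y)
    is_a = np.isin(y, ["A1", "A2"])
    # Top-level split: the A-group vs. the B-group.
    top = LogisticRegression().fit(X, np.where(is_a, "A", "B"))
    # Within-group splits: A1 vs. A2 and B1 vs. B2.
    clf_a = LogisticRegression().fit(X[is_a], y[is_a])
    clf_b = LogisticRegression().fit(X[~is_a], y[~is_a])
    return top, clf_a, clf_b

def predict_hierarchical(models, X):
    top, clf_a, clf_b = models
    group = top.predict(X)               # first decide A-group or B-group
    pred = np.empty(len(X), dtype=object)
    a_idx = group == "A"
    if a_idx.any():
        pred[a_idx] = clf_a.predict(X[a_idx])    # then A1 vs. A2
    if (~a_idx).any():
        pred[~a_idx] = clf_b.predict(X[~a_idx])  # or B1 vs. B2
    return pred
```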

Of course, we could instead use the standard 'one-vs-rest' or 'one-vs-one' strategy; a sketch of those baselines follows.
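For comparison, both standard strategies are available as meta-estimators in scikit-learn; the LinearSVC base estimator here is an arbitrary choice:

```python
# One-vs-rest and one-vs-one baselines for the same four-class problem.
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

ovr = OneVsRestClassifier(LinearSVC())   # 4 classifiers: each class vs. the rest
ovo = OneVsOneClassifier(LinearSVC())    # 6 classifiers: one per pair of classes
# ovr.fit(X_train, y_train); ovo.fit(X_train, y_train)
# then compare, e.g., ovr.score(X_test, y_test) vs. ovo.score(X_test, y_test)
```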

Here are my questions:

  1. Which strategy is best from a classification-performance viewpoint? (I don't care about computational cost.)
  2. I know the performance depends on the data. For what kind of data properties does the 'hierarchical strategy' work well? I would be happy if someone could formalize the problem in mathematical terms.
  3. Are there research papers on this kind of question?

Best Answer

I can suggest a strategy from a classification-performance viewpoint. Since the difficulty of a classification problem depends not only on the number of classes and how 'similar' they are, but also on the features that describe the problem, I can't say whether it is necessarily 'optimal'. Assuming your features take similar values for A1 and A2 (and likewise for B1 and B2), I believe a good strategy for you is to grow a classification tree. See this Wikipedia article, the following paper, and these videos for in-depth details. The idea behind this classifier is what you intuitively phrased as the 'hierarchical strategy'.

The idea behind this strategy is to iteratively subdivide the feature space into regions. At each iteration, the algorithm chooses the 'best' feature and split value according to some purity measure, which is exactly what you described: "split 'A1 and A2' from 'B1 and B2', then split A1 from A2 and B1 from B2". Provided that your feature space is separable, this classification strategy should work.
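Here is a minimal sketch of that idea using scikit-learn's tree implementation; the synthetic blob dataset and the hyperparameters are placeholders standing in for your A1/A2/B1/B2 data:

```python
# Decision-tree sketch: the tree recursively picks the feature/threshold that
# best purifies each node (Gini impurity here), i.e. the iterative subdivision
# of the feature space described above.
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Four blob-shaped classes standing in for A1, A2, B1, B2.
X, y = make_blobs(n_samples=400, centers=4, n_features=2, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# The printed rules make the learned hierarchy easy to inspect.
print(export_text(tree, feature_names=["f0", "f1"]))
```

If the classes really do group as A-like vs. B-like, the top split of the fitted tree will typically separate the two groups before distinguishing the classes within each group.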

The final output will look like the image below. The nodes contain decisions (based on your training set) and the leaves represent the class outcomes. The model is interpretable, so you can easily see how well your classification strategy worked.

image from Wikipedia article