Solved – Hierarchical classification where leaf nodes in a tree are at no particular level

Tags: cart, classification, machine learning, multi-class, python

I have a set of hierarchical classes (e.g. "object/architecture/building/residential building/house/farmhouse"), and I build a tree in which each node is a classifier. However, the appropriate class for a particular feature set can be at any level (e.g. "object/architecture/building"), not only at a leaf.

Currently, I use two different methods:

  1. I make locally optimal decisions, except when the predicted class is a leaf node: in that case I divide its probability by a constant (a hyperparameter) and recheck which class has the highest probability, repeating until I reach a leaf node.
  2. At every node, I follow the 3 children with the highest probabilities and propagate the probabilities down to the leaves (by summing their logs). Then I add a bonus, controlled by a hyperparameter, that depends on the node's depth in the tree (the deeper the level, the higher the score).
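Method 2 can be sketched roughly as below. This is a reconstruction under assumptions, not the asker's code: the hierarchy is a nested dict, each node's classifier is abstracted as a hypothetical `child_probs(path, x)` function returning one probability per child (for example, what `predict_proba` of that node's model would give), and every visited node is treated as a candidate label, since the correct class can sit at any level.

```python
import math
from typing import Callable, Dict, Tuple

# Hierarchy as a nested dict: key = class name, value = its subtree ({} for a leaf).
Tree = Dict[str, "Tree"]
Path = Tuple[str, ...]

# Hypothetical per-node classifier: given the path to a node and a feature
# vector, return a probability for each child of that node.
ChildProbs = Callable[[Path, object], Dict[str, float]]


def beam_scores(tree: Tree, child_probs: ChildProbs, x: object,
                beam: int = 3, depth_bonus: float = 0.0) -> Dict[Path, float]:
    """Follow the `beam` most probable children at every node and score each
    node reached: summed log-probabilities along its path + depth_bonus * depth."""
    scores: Dict[Path, float] = {}
    stack = [((), tree, 0.0)]  # (path so far, subtree, accumulated log-prob)
    while stack:
        path, subtree, logp = stack.pop()
        if not subtree:
            continue  # leaf: nothing further to expand
        probs = child_probs(path, x)
        top = sorted(subtree, key=lambda c: probs.get(c, 0.0), reverse=True)[:beam]
        for child in top:
            p = probs.get(child, 0.0)
            if p <= 0.0:
                continue  # log(0) is undefined; drop impossible children
            child_path = path + (child,)
            child_logp = logp + math.log(p)
            scores[child_path] = child_logp + depth_bonus * len(child_path)
            stack.append((child_path, subtree[child], child_logp))
    return scores


# Toy example with made-up probabilities.
tree = {"building": {"house": {"farmhouse": {}, "cottage": {}}, "tower": {}},
        "bridge": {}}
TABLE = {
    (): {"building": 0.9, "bridge": 0.1},
    ("building",): {"house": 0.7, "tower": 0.3},
    ("building", "house"): {"farmhouse": 0.6, "cottage": 0.4},
}


def toy_probs(path: Path, x: object) -> Dict[str, float]:
    return TABLE[path]


for bonus in (0.0, 0.5):
    s = beam_scores(tree, toy_probs, x=None, depth_bonus=bonus)
    print(bonus, max(s, key=s.get))
# 0.0 ('building',)
# 0.5 ('building', 'house')
```

With `depth_bonus=0.0` the shallowest candidate wins; a bonus of 0.5 pushes the prediction one level deeper, but which level wins depends entirely on how that constant is tuned, which is the problem the question is about.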

Both of these methods were meant to counteract the system's tendency to place a class too high up in the tree. For example, when choosing between, say, object/architecture/building/house and object/architecture/building, without the constants the system almost always prefers object/architecture/building.
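Why the shallower class keeps winning can be seen with one line of arithmetic, assuming a node's score is the sum of log-probabilities along its root-to-node path (as in method 2). Write $s(v)$ for that score at a node $v$, $c$ for a child of $v$, $d$ for the depth of $v$, and $\lambda$ for a per-level depth bonus (symbols introduced here for illustration):

$$
s(c) = s(v) + \log p(c \mid v, x) \le s(v), \qquad \text{because } 0 < p(c \mid v, x) \le 1,
$$

so a node can never outscore its own ancestor. Adding $\lambda$ per level changes the comparison to

$$
s(c) + \lambda (d+1) > s(v) + \lambda d \iff \log p(c \mid v, x) > -\lambda \iff p(c \mid v, x) > e^{-\lambda},
$$

that is, the depth bonus amounts to a single fixed threshold on the child's conditional probability, shared by every node, which is why it has to be tuned by hand.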

This is not ideal. I realize I should not need this constant, but I'm not sure how to accomplish the task without it. Also, ideally I would avoid probabilities altogether and use distances to the separating hyperplanes instead, so that I could use SVMs rather than logistic regression as the classifier at each node. (I'm using Python and scikit-learn, to be specific.)

Any thoughts?

Best Answer

Here is one standard solution: create a new tree from the hierarchy and add an "other" leaf node under every non-leaf node. This "other" node contains all the positive examples of its parent that do not fall into any of the child nodes.
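The relabeling can be sketched like this, assuming each training example is labeled with its full path from the root, e.g. ("animals", "sea") for an example that belongs to sea but to neither fish nor shark. `node_targets` and the `"other"` label string are illustrative names, not part of any library.

```python
from typing import Dict, Sequence, Tuple

Path = Tuple[str, ...]
OTHER = "other"  # hypothetical label for the added "other" child


def node_targets(node: Path, labels: Sequence[Path]) -> Dict[int, str]:
    """Return {example index: training target} for the classifier at `node`.

    Only examples inside the subtree of `node` are used. An example whose
    label path continues below `node` is labeled with the next child on that
    path; one whose label path stops exactly at `node` is labeled OTHER.
    """
    depth = len(node)
    targets: Dict[int, str] = {}
    for i, path in enumerate(labels):
        if tuple(path[:depth]) != node:
            continue  # outside this node's subtree: not used to train this node
        targets[i] = path[depth] if len(path) > depth else OTHER
    return targets


labels = [("animals", "sea", "fish"), ("animals", "sea"),
          ("animals", "land", "lion"), ("animals",)]
print(node_targets(("animals",), labels))
# {0: 'sea', 1: 'sea', 2: 'land', 3: 'other'}
print(node_targets(("animals", "sea"), labels))
# {0: 'fish', 1: 'other'}
```

Each node's classifier is then trained only on the rows listed in its targets, for instance a scikit-learn `LinearSVC` fitted with `fit(X[idx], y)` where `idx = list(targets)` and `y = list(targets.values())`. Because "stop at this node" is now an ordinary class learned from data, it is meant to take the place of a hand-tuned depth constant.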

For example, if your hierarchy is

{animals: {sea: {fish, shark}, land: {lion, elephant}}}

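your new hierarchy will look like

{1: {2: {fish, shark, sea(other)}, 3: {lion, elephant, land(other)}, animals(other)}}

Here 1, 2 and 3 are the internal decision nodes that replace animals, sea and land, and every original class, including the former internal ones, is now a leaf.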
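A minimal sketch of the transformation on the example above, plus greedy top-down prediction on the result. It assumes the hierarchy is a nested dict, that no original class name already ends in "(other)", and that each internal node has a scoring function returning one score per child. Since only the argmax at each node is used, the scores can be raw SVM margins (such as scikit-learn's `decision_function` output) rather than probabilities. `add_other_leaves`, `predict` and the `"(other)"` suffix are illustrative names.

```python
from typing import Callable, Dict, Tuple

Tree = Dict[str, "Tree"]   # nested dict; {} marks a leaf
Path = Tuple[str, ...]
SUFFIX = "(other)"         # illustrative naming for the added leaves


def add_other_leaves(tree: Tree) -> Tree:
    """Copy `tree`, giving every non-leaf node an extra leaf child
    "<name>(other)", so that every original class ends up as a leaf."""
    new: Tree = {}
    for name, children in tree.items():
        if children:
            sub = add_other_leaves(children)
            sub[name + SUFFIX] = {}
            new[name] = sub
        else:
            new[name] = {}
    return new


def predict(tree: Tree, score: Callable[[Path, object], Dict[str, float]],
            x: object) -> Path:
    """Greedy top-down prediction on a tree built by add_other_leaves.

    `score(path, x)` returns one score per child of the node at `path`; only
    the argmax is used, so margins work as well as probabilities.
    Reaching a "<name>(other)" leaf means the answer is that leaf's parent."""
    path: Path = ()
    subtree = tree
    while subtree:
        scores = score(path, x)
        best = max(subtree, key=lambda c: scores[c])
        if best.endswith(SUFFIX):
            return path  # stop at the parent node
        path += (best,)
        subtree = subtree[best]
    return path


hierarchy = {"animals": {"sea": {"fish": {}, "shark": {}},
                         "land": {"lion": {}, "elephant": {}}}}
new = add_other_leaves(hierarchy)

# Made-up per-node scores standing in for SVM decision values.
TABLE = {
    (): {"animals": 1.0},
    ("animals",): {"sea": 0.4, "land": -0.2, "animals(other)": -1.0},
    ("animals", "sea"): {"fish": -0.3, "shark": -0.5, "sea(other)": 0.7},
}


def toy_score(path: Path, x: object) -> Dict[str, float]:
    return TABLE[path]


print(predict(new, toy_score, x=None))  # ('animals', 'sea')
```

Here the "sea(other)" leaf wins at the sea node, so the prediction stops at the internal class sea without any depth constant.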

Hope that makes sense.
