Solved – How to find the best split in decision trees using label vectors

Tags: cart, variance

In single-label learning, we have a single label to predict. Most training tasks are single-label; the most common example is predicting whether a customer will be good or bad.

In multi-label learning there is more than one label. For example, when we tag a photo, we attach more than one tag. The labels are now described by a vector rather than by a single value as in single-label learning.
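To make this concrete, here is a tiny made-up sketch (the tags and values are invented for illustration) of how such label vectors can be stored:

    import numpy as np

    # Hypothetical photo-tagging data: 4 photos, 3 possible tags
    # (beach, people, sunset). Rows are examples, columns are labels.
    labels = np.array([
        [1, 0, 1],   # photo 0: beach, sunset
        [1, 1, 0],   # photo 1: beach, people
        [0, 1, 0],   # photo 2: people
        [1, 0, 1],   # photo 3: beach, sunset
    ])
    # In single-label learning the target would instead be one value
    # per example, e.g. np.array([1, 0, 0, 1]) for good/bad customers.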

I am trying to build a decision tree that finds the best splits based on variance. My decision tree tries to maximize the following formula:

   Var(D)*|D| - Sum(Var(Di)*|Di|)

D is the original node and the Di are the splits produced by choosing an attribute (by Di, I mean the node produced by choosing an attribute and taking its i-th value).

My problem is this. Imagine that I store each node's label vectors in a matrix, with the examples in the rows and the values of each label in the columns. Let's say this matrix is m×n. After taking the variance of this matrix column by column, I have a 1×n vector whose entries are the variances of the individual labels. So to evaluate the formula, I need the difference between two vectors, each multiplied by a number.
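Under those assumptions, a minimal NumPy sketch of this score (the function name is mine, not from any library) looks like the following; np.var(..., axis=0) computes the column-wise variance, so the result is a length-n vector rather than a scalar:

    import numpy as np

    def split_score_vector(D, children):
        """Per-label version of Var(D)*|D| - Sum(Var(Di)*|Di|).

        D        : (m, n) label matrix of the parent node
        children : list of (mi, n) label matrices, one per attribute value
        Returns a length-n vector holding one score per label.
        """
        score = np.var(D, axis=0) * len(D)          # Var(D) * |D|, per label
        for Di in children:
            score -= np.var(Di, axis=0) * len(Di)   # minus Var(Di) * |Di|
        return score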

My problem is how to decide which is the best split when the outcome of my formula is a vector. If I had single labels, I would get a single number as the outcome and would simply choose the biggest one. Now that I have vectors, how can I decide which split is best? For example, let's say the outcomes of my formula for 2 different attributes are these 2 vectors:

 1,2,3,4,5    (attribute i)
 2,3,4,5,6    (attribute j)

How do I choose between attributes i and j?

Best Answer

It seems that the only solution is to normalize all label values to the 0...1 range and then add up all the entries of the score vector. The attribute with the biggest sum is the one we use to split the node.
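A sketch of that idea, reusing the hypothetical split_score_vector from the question (all names here are illustrative): rescale each label column to 0...1, compute the score vector for every candidate attribute, and take the attribute whose entries sum to the largest value:

    import numpy as np

    def normalize_labels(Y):
        """Rescale each label column to the 0...1 range."""
        lo, hi = Y.min(axis=0), Y.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
        return (Y - lo) / span

    def best_attribute(D, candidate_splits):
        """Pick the attribute with the largest summed score vector.

        D                : (m, n) raw label matrix of the node
        candidate_splits : dict mapping each attribute to a list of
                           row-index arrays, one (non-empty) per value
        """
        Dn = normalize_labels(D)  # normalize once, so labels are comparable
        def summed(children):
            return split_score_vector(Dn, [Dn[idx] for idx in children]).sum()
        return max(candidate_splits, key=lambda a: summed(candidate_splits[a]))

Normalizing first keeps a label with a wide value range from dominating the sum; after that, collapsing the score vector to its sum makes the splits comparable again, exactly as in the single-label case.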