Solved – How to determine the best split threshold (split point) in Hoeffding tree (for continuous attributes)

cart

In Hoeffding tree (that handles continuous attributes), Hoeffding inequity suggests the minimum amount of instance to be looked at before the best split attribute can be determined with confidence. What I don't get is when the best split attribute is decided, how is it guaranteed that the split threhsold (or split point) for the chosen attribute, based on the limited amount of instance, is also optimal?

Best Answer

The base Hoeffding tree as it was introduced does not handle numeric variables at all. There are a number of extensions to Hoeffdin trees that add numeric feature support but do so in different ways and have many different variations, so there is no one "true" way.

Some:

  • Store the numeric values in a dynamic tree structure, updating the tree at each insert
  • Track an online approximation of a fixed number of quantiles and use those quantiles to dynamically split on (either as a categorical variable or selecting one split from the quantiles)
  • Just compute the split like a normal decision tree but do so only every X inserts.
Related Question