Solved – k nearest neighbor with decision tree

cart, cross-validation, data mining, k nearest neighbour

A dataset has a few attributes. One of the attributes (attribute X) represents a distance, with values expressed in meters. I use cross-validation to estimate the performance of decision tree and k nearest neighbor classifiers.

My questions:

  1. Why will the performance of k-nn change if I change the representation of attribute X to cm instead of m?
  2. Why will the performance of k-nn and decision tree not change if I multiply all the attributes by 20?

Best Answer

1) Why will the performance of k-nn change if I change the representation of attribute X to cm instead of m?

k nearest neighbor classifiers use a distance measure, usually Euclidean distance, to decide the classification. Suppose you have this data set, with the distance between the two points computed below it:

10m 10kg
11m 11kg

 >> sqrt((11-10)^2 + (11-10)^2) 
 ans =
 1.4142

If you change m to cm, you have the following data set:

1000cm 10kg
1100cm 11kg

Now the distance between the same two points has changed:

 >> sqrt((1100-1000)^2 + (11-10)^2)
 ans =
 100.0050

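Attribute X now dominates the distance, so the nearest neighbors of a point, and with them its predicted class, can change. If you do not want this behavior, you need to normalize your data before computing distances.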
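One common way to normalize is min-max scaling, which maps each attribute linearly onto the range [0, 1]. The sketch below is only an illustration under assumed values: the three measurements and the helper name `min_max_scale` are hypothetical, and other schemes (such as standardizing to zero mean and unit variance) would serve the same purpose.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto [0, 1].

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute X, once in metres and once in centimetres.
x_in_m = [10.0, 11.0, 12.5]
x_in_cm = [v * 100 for v in x_in_m]

scaled_m = min_max_scale(x_in_m)
scaled_cm = min_max_scale(x_in_cm)

# Both lists describe the same relative positions within [0, 1].
print(scaled_m)
print(scaled_cm)
```

After scaling, the attribute has the same values whether it was recorded in m or in cm, so the choice of unit no longer influences the k-nn distances.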

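2) Why will the performance of k-nn and decision tree not change if I multiply all the attributes by 20?

Here you scale your data, but you scale ALL of the attributes by the same constant, so neither classifier is affected, although for different reasons. For k-nn, multiplying every attribute by 20 multiplies every Euclidean distance by 20, so the ordering of the neighbors, and therefore the set of k nearest neighbors, stays the same. A decision tree does not use a distance measure at all: each node compares a single attribute against a threshold, and on the scaled data the tree can use a threshold 20 times larger and obtain exactly the same partition. For the same reason, a decision tree is also unaffected by the m-to-cm change in question 1, which is why that question only concerns k-nn.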
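Both invariances can be checked on a toy example. The Python sketch below uses made-up points and a hypothetical split threshold; `rank_by_distance` and `scale` are illustrative helper names, and the threshold test stands in for a single decision-tree node rather than a full tree learner.

```python
import math

def rank_by_distance(query, points):
    """Indices of points sorted from nearest to farthest from query."""
    return sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))

def scale(point, factor):
    """Multiply every attribute of a point by the same factor."""
    return tuple(factor * v for v in point)

# Hypothetical data: (attribute X in m, weight in kg)
query = (10.0, 10.0)
points = [(10.5, 13.0), (12.0, 10.0), (9.0, 9.5)]

# k-nn: the neighbor ordering before and after multiplying everything by 20.
order_original = rank_by_distance(query, points)
order_scaled = rank_by_distance(scale(query, 20), [scale(p, 20) for p in points])

# Decision tree node: a threshold test on one attribute (hypothetical "X <= 11.0").
threshold = 11.0
left_original = [p[0] <= threshold for p in points]
left_scaled = [20 * p[0] <= 20 * threshold for p in points]

print(order_original == order_scaled)  # same neighbor ranking
print(left_original == left_scaled)    # same split of the points
```

The neighbor ranking and the left/right assignment at the split are identical before and after scaling, so the predictions of both classifiers are unchanged.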
