Solved – Using KNN for prediction, how should I normalize the data

k nearest neighbour

Is it better to constrain the data to a range, say [0,1], or to force a mean of 0 and a standard deviation of 1? Why? Does the type of input data matter (I'll be using both continuous and categorical variables)?

Best Answer

I think that depends on the data. KNN compares points by distance, so a feature with a much larger numeric range than the others will dominate unless you rescale. If you know a feature is bounded, you could scale it to $[0,1]$. If it's binary, $\{0,1\}$ is a good choice, or perhaps $\{-1,1\}$. If it's unbounded, standardizing to $z$-scores (mean $\overline{x} = 0$, standard deviation $\sigma = 1$) is a reasonable choice.
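As a rough illustration of the two rescalings, here is a minimal NumPy sketch. The helper names `min_max_scale` and `standardize` and the toy columns `age`, `income`, and `is_member` are made up for this example, not taken from the question.

```python
import numpy as np

def min_max_scale(x):
    """Rescale a feature to [0, 1] using its observed minimum and maximum."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Constant feature: it carries no distance information, so map it to 0.
        return np.zeros_like(x)
    return (x - x.min()) / span

def standardize(x):
    """Convert a feature to z-scores (mean 0, standard deviation 1)."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / sd

# Hypothetical toy data, one array per feature.
age = np.array([20.0, 30.0, 40.0, 60.0])                    # treated as bounded
income = np.array([20_000.0, 35_000.0, 50_000.0, 120_000.0])  # treated as unbounded
is_member = np.array([0.0, 1.0, 1.0, 0.0])                  # binary, left as {0, 1}

# Feature matrix that a KNN distance computation could then use.
X = np.column_stack([min_max_scale(age), standardize(income), is_member])
```

If you hold out test data, the usual practice is to compute the minimum, maximum, mean, and standard deviation on the training set only and reuse those values to transform the test set.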