Solved – How to calculate normalized Euclidean distance on two vectors

distance, euclidean, MATLAB

Let's say I have the following two vectors:

x = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
y = [(10-1).*rand(7,1) + 1; randi(10,1,1)];

The first seven elements are continuous values in the range [1,10]. The last element is an integer in the range [1,10].

Now I would like to compute the Euclidean distance between x and y. I think the integer element is a problem: the other elements can get arbitrarily close to each other, but two integer elements that are not equal always differ by at least one. So there is a bias towards the integer element.

First, is there really a bias if I just use the (non-normalized) Euclidean distance?

Second, how can I calculate something like a normalized Euclidean distance for these vectors?

Best Answer

  1. That depends: what are you trying to estimate? Recall the definition of bias: the bias of an estimator $\phi$ for a quantity $\theta$ is the expected value of $\phi - \theta$. An estimator is unbiased when its bias is 0. So the same quantity can be biased when considered as an estimator of one quantity and unbiased when considered as an estimator of another.
  2. There are many options, one of which is to convert each dimension to $z$-scores before computing distances.
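
As a rough illustration of the $z$-score option, here is a minimal sketch. It assumes you have a sample of many observations generated the same way as x and y (the matrix `X`, the sample size `n`, and the fixed seed are made up for this example, and observations are stored as rows rather than columns), because a per-dimension mean and standard deviation cannot be estimated sensibly from two vectors alone. It uses only base MATLAB functions.

```matlab
% Hypothetical sample: n observations drawn the same way as x and y in the question.
rng(0);                                      % fixed seed, only so the sketch is repeatable
n = 1000;                                    % assumed sample size
X = [(10-1).*rand(n,7) + 1, randi(10,n,1)];  % n-by-8, one observation per row

mu    = mean(X, 1);       % per-dimension mean, 1-by-8
sigma = std(X, 0, 1);     % per-dimension sample standard deviation, 1-by-8

x = X(1,:);               % two observations to compare
y = X(2,:);

% Plain Euclidean distance, for reference.
d_raw = sqrt(sum((x - y).^2));

% Convert each dimension to z-scores, then take the Euclidean distance.
zx = (x - mu) ./ sigma;
zy = (y - mu) ./ sigma;
d_norm = sqrt(sum((zx - zy).^2));
```

Because the means cancel in `zx - zy`, this is the same as dividing each coordinate difference by that dimension's standard deviation. For the distributions in the question the theoretical standard deviations are $\sqrt{81/12} \approx 2.60$ for the continuous coordinates and $\sqrt{99/12} \approx 2.87$ for the integer one, so here standardizing should change the relative weights only slightly.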