[Math] How to calculate the distance between two points when x and y have different ranges

calculus

I'm trying to calculate the distance between two points whose coordinates have different ranges, i.e., $0 < x < 1000$ and $0 < y < 100$.

So far, I've been using the Euclidean distance formula to calculate this distance:

$d = \sqrt{(x_{1}-x_{2})^2 + (y_{1}-y_{2})^2}$.
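As a concrete sketch of that formula in Python (the example points are made up for illustration):

```python
import math

p1 = (100, 10)   # illustrative points within the stated ranges
p2 = (900, 90)

# Direct translation of the formula above
d = math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)

# math.dist (Python 3.8+) computes the same Euclidean distance
assert math.isclose(d, math.dist(p1, p2))
print(d)  # ~803.99
```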

Is this the correct way to calculate the distance between two points given these conditions?

Should I normalize $x$ and $y$ so that both take values between $0$ and $1$?

Thanks for your help!

Best Answer

If you are performing k-means clustering, I would suggest doing some kind of normalization. Consider a more extreme case, where $y$ ranges from $0$ to $1$ and $x$ ranges from $0$ to $1000000$. If you use Euclidean distance, all that will end up mattering is the $x$-values: the chance of a $y$-value having a significant effect on one point's distance from another is minuscule. So, mathematically, your $y$-variable becomes effectively irrelevant.
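As a rough illustration of that dominance (the specific points below are made up), compare one pair of points separated by the entire $y$-range with another pair separated by a mere $0.01\%$ of the $x$-range:

```python
import math

def euclidean_distance(p1, p2):
    return math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)

# x in [0, 1000000], y in [0, 1]
a = (500000, 0.0)
b = (500000, 1.0)   # maximal difference in y
c = (500100, 0.5)   # tiny (0.01%) difference in x

print(euclidean_distance(a, b))  # 1.0     -- a whole y-range apart
print(euclidean_distance(a, c))  # ~100.0  -- dwarfs the y contribution
```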

Do you want your $y$-variable to be irrelevant? Is your $x$-variable really about ten times as important as your $y$, as its tenfold larger range implies? If not, then you should normalize.
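One simple way to normalize, assuming the ranges from the question are fixed and known in advance, is min-max scaling of each coordinate to $[0, 1]$ before taking the distance. A minimal sketch (the function names are my own):

```python
import math

X_MAX, Y_MAX = 1000.0, 100.0  # coordinate ranges taken from the question

def normalize(point):
    """Min-max scale a point so both coordinates lie in [0, 1]."""
    x, y = point
    return (x / X_MAX, y / Y_MAX)

def normalized_distance(p1, p2):
    """Euclidean distance after scaling, so x and y contribute on equal footing."""
    return math.dist(normalize(p1), normalize(p2))

print(normalized_distance((100, 10), (900, 90)))  # ~1.131: both axes now matter equally
```

If the ranges are not known ahead of time, scaling by the observed minimum and maximum of each coordinate, or standardizing by its mean and standard deviation, accomplishes the same goal of putting $x$ and $y$ on a comparable footing.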