Solved – Is it possible to compare two different datasets with a different range of values

classification, k nearest neighbour, machine learning, normalization, standardization

I am currently working with the k-Nearest Neighbors (KNN) algorithm.

Is it possible to compare two datasets with different ranges of values?

In other words, the values in one dataset lie in $[0,1]$, while the values in the other dataset lie in $[-1,1]$.

I calculated performance measures such as accuracy and F1 score for each dataset, and then compared the datasets based on those measures.

Is this approach correct? Or do I need to transform the datasets so they lie in a common range?

EDIT

By range, I mean that, for example, all the observations in the first dataset are between 0 and 1. By comparing two datasets, I mean comparing the resulting performance measures, such as accuracy and F1 score.

The ranges refer to the predictor variables. The purpose of the comparison is only to replicate some results and to check different methods.

Best Answer

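The short answer is no: your approach may not lead to the results you are looking for. KNN classifies by the distance between observations, so predictors measured on a wider range contribute more to that distance. You should first normalize your variables so that they lie on comparable ranges.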

An in-depth answer to your question can be found on this thread, which has a nice visual describing why normalization is useful. The comments on that answer also contain a discussion you may find helpful.
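As a rough illustration of the normalization step, here is a minimal sketch of min-max scaling in plain Python. The function name `min_max_scale` and the two toy datasets are made up for this example and are not taken from the question; the sketch assumes each dataset is a list of rows of numeric predictors.

```python
def min_max_scale(rows, new_min=0.0, new_max=1.0):
    """Linearly rescale each column of `rows` onto [new_min, new_max]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    scaled = []
    for row in rows:
        scaled_row = []
        for value, lo, hi in zip(row, lows, highs):
            if hi == lo:
                # Constant column: nothing to spread out, so pin it to new_min.
                scaled_row.append(new_min)
            else:
                fraction = (value - lo) / (hi - lo)
                scaled_row.append(new_min + fraction * (new_max - new_min))
        scaled.append(scaled_row)
    return scaled


# Hypothetical toy data: same shape, different predictor ranges.
dataset_a = [[0.0, 0.5], [0.2, 1.0], [1.0, 0.0]]     # predictors in [0, 1]
dataset_b = [[-1.0, 0.0], [-0.6, 1.0], [1.0, -1.0]]  # predictors in [-1, 1]

scaled_a = min_max_scale(dataset_a)
scaled_b = min_max_scale(dataset_b)

# After scaling, both datasets live on [0, 1], so distances are comparable.
for row_a, row_b in zip(scaled_a, scaled_b):
    print(row_a, row_b)
```

In this toy case the second dataset is just a stretched copy of the first, so the two scaled versions coincide up to floating-point rounding. With a real train/test split, the minimum and maximum would normally be computed on the training data only and then reused for the test data.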
