Usually, the larger the $k$ (the number of folds in the cross-validation), the more accurate the estimate of the RMSE. See "Choice of K in K-fold cross-validation", for example.
In your case, performing a leave-one-out cross-validation (LOOCV) is not much more expensive than a ten-fold one! Indeed, in a ten-fold cross-validation, each of the $n$ rows is predicted using the other 90% of the rows, so the overall number of distance computations is about $0.9n^2$. If you use the whole training set (except the element you are trying to predict), you will be doing $n(n-1) \approx n^2$ operations.
So, as the performance penalty for running a LOOCV is low (and this is true because you are using KNN, which has no training cost), I would probably use it.
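As a rough check of the arithmetic above, here is a small sketch that counts distance computations for a brute-force KNN. The function name `cv_distance_computations` and the value of `n` are made up for illustration, and it assumes `n` is divisible by the number of folds.

```python
def cv_distance_computations(n: int, d: int) -> int:
    """Count query-to-training distance evaluations for d-fold CV with brute-force KNN.

    Assumes n is divisible by d, so every fold holds n // d points.
    """
    fold_size = n // d
    train_size = n - fold_size
    # Each of the n points is predicted exactly once,
    # against the train_size points that lie outside its own fold.
    return n * train_size


n = 1000                                    # hypothetical data set size
ten_fold = cv_distance_computations(n, 10)  # 0.9 * n^2
loocv = cv_distance_computations(n, n)      # n * (n - 1)
ratio = loocv / ten_fold                    # about 1.11
```

So under these assumptions LOOCV costs roughly 11% more than ten-fold, not ten times more.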
That's almost right, but there's one extra step needed, which is to fit the model and choose $k$ using different sets of data (fitting in the case of knn being just remembering all the data points). If you fit the model and select the hyperparameters on the same set, the error will underestimate the true generalization error, which is the error you'd see on new data drawn from the same distribution. This means the classifier may do worse on new data than you'd expect. Cross validation is a way to choose $k$ that tries to minimize the generalization error.
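To see why the two sets have to differ, here is a minimal sketch with made-up data: one random feature and labels that are pure coin flips, so nothing can really be learned. With $k = 1$, every training point is its own nearest neighbor, so the error measured on the training set is zero even though the classifier is useless. The helper `one_nn_error` is hypothetical and written just for this illustration.

```python
import random


def one_nn_error(train, queries):
    """Fraction of queries misclassified by a 1-nearest-neighbor rule."""
    def predict(x):
        nearest = min(train, key=lambda p: (p[0] - x) ** 2)
        return nearest[1]
    return sum(predict(x) != label for x, label in queries) / len(queries)


rng = random.Random(0)
# Hypothetical data: (feature, label) pairs with labels independent of the feature.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(200)]
train, held_out = data[:100], data[100:]

train_error = one_nn_error(train, train)        # each point matches itself
held_out_error = one_nn_error(train, held_out)  # no better than guessing, on average
```

Picking $k$ by the training error would always favor $k = 1$ here, which is exactly the underestimate described above.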
Here's how to select $k$ using $d$-fold cross validation. It's usually called $k$-fold, but we're using $k$ to talk about the number of neighbors.
1. Split the data into $d$ disjoint, similarly-sized subsets.
2. Hold out the first set. This is called the validation set.
3. Train your classifier on the remaining data. In the case of knn classification, just remember all the data.
4. For each value of $k$:
   - Classify each point in the validation set, using its $k$ nearest neighbors in the training set.
   - Record the error.
5. Repeat steps 2-4 so that each of the $d$ subsets is used once as the validation set.
6. For each choice of $k$, find the average error across validation sets. Choose the value of $k$ with the lowest error.
7. Construct a final classifier using all of the original data and the chosen value of $k$. This is what you'd use to classify new points.
In the case where $d$ is equal to the number of data points (i.e. each validation set contains a single point), this is called leave-one-out cross validation.
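The procedure above could be sketched roughly as follows, using only the Python standard library. The names `knn_predict` and `select_k`, the toy data, and the candidate values of $k$ are all made up for illustration; ties in the vote go to the class of the nearer neighbor, which is just one possible convention.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def select_k(X, y, candidate_ks, d, seed=0):
    """Return (best_k, mean validation error per k) from d-fold cross validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::d] for i in range(d)]       # d disjoint, similarly sized subsets
    errors = {k: [] for k in candidate_ks}
    for held_out in folds:                      # each fold is the validation set once
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]         # "training" = remembering the points
        train_y = [y[i] for i in train]
        for k in candidate_ks:
            wrong = sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in held_out)
            errors[k].append(wrong / len(held_out))
    mean_error = {k: sum(e) / len(e) for k, e in errors.items()}
    best_k = min(mean_error, key=mean_error.get)
    return best_k, mean_error


# Hypothetical toy data: two overlapping 2-D Gaussian blobs, 20 points each.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]
y = [0] * 20 + [1] * 20

candidate_ks = [1, 3, 5, 7]
best_k, mean_error = select_k(X, y, candidate_ks, d=5)
```

Setting `d=len(X)` turns this into leave-one-out cross validation. The final classifier is then just `knn_predict(X, y, new_point, best_k)` on all of the data.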
If you want to estimate the generalization error of the final classifier, some extra work is needed. You can't just take the error on the validation sets, because $k$ was chosen to minimize this, so it's an underestimate of the true generalization error. Another way of thinking about this is that the procedure for choosing $k$ has to be considered part of the learning algorithm. The simplest thing to do is test the final classifier on a separate set of data you held out at the beginning (called the test set). This is a reasonable option if you have lots of data and a slow algorithm. Another option that takes longer but uses the data more efficiently is to use cross validation again. Here, there will be an 'outer' cross validation loop, where the held-out data is called the test set. At each step, you further split the remaining data into training and validation sets and run a nested, 'inner' cross validation loop to choose $k$, as above. The final generalization error is estimated by classifying the test sets and averaging their errors.
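Here is one way the nested version might look, again as a standard-library sketch rather than a definitive implementation. All function names, the toy data, and the fold counts are assumptions made for the example. The key point is that `choose_k` only ever sees the data left after the outer test fold has been removed.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """train is a list of (features, label) pairs; majority vote of the k nearest."""
    nearest = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, held_out, k):
    """Fraction of held-out points that k-NN on train misclassifies."""
    wrong = sum(knn_predict(train, x, k) != label for x, label in held_out)
    return wrong / len(held_out)


def make_folds(data, d):
    """Split data into d disjoint, similarly sized folds."""
    return [data[i::d] for i in range(d)]


def split(folds, i):
    """Return (all folds except i merged together, fold i)."""
    rest = [p for j, fold in enumerate(folds) if j != i for p in fold]
    return rest, folds[i]


def choose_k(data, candidate_ks, d):
    """Inner loop: pick the k with the lowest mean validation error."""
    folds = make_folds(data, d)

    def mean_err(k):
        return sum(error_rate(*split(folds, i), k) for i in range(d)) / d

    return min(candidate_ks, key=mean_err)


def nested_cv_error(data, candidate_ks, d_outer, d_inner):
    """Outer loop: estimate the error of 'choose k by CV, then classify'."""
    outer_folds = make_folds(data, d_outer)
    test_errors = []
    for i in range(d_outer):
        train_val, test = split(outer_folds, i)
        k = choose_k(train_val, candidate_ks, d_inner)  # never sees the test fold
        test_errors.append(error_rate(train_val, test, k))
    return sum(test_errors) / d_outer


# Hypothetical toy data: two overlapping 2-D Gaussian blobs, shuffled.
rng = random.Random(2)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(30)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(30)]
rng.shuffle(data)

estimate = nested_cv_error(data, candidate_ks=[1, 3, 5, 7], d_outer=5, d_inner=4)
```

Note that different outer folds may end up choosing different values of $k$. That is fine: the number being estimated is the error of the whole procedure, selection included.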
k-nearest-neighbor classifiers use a distance measure, usually Euclidean distance, to decide the classification. Suppose you have this data set.
If you change m to cm, you have the following data set.
Here your distances changed.
If you do not want this behavior, you need to normalize your data.
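To make this concrete, here is a small sketch with made-up numbers (height and weight of three people plus one query point; none of these values come from the original question). Converting height from m to cm changes which training point is nearest, while z-scoring each feature gives the same answer in both units. The helpers `nearest`, `standardize` and `to_cm` are hypothetical names used only here.

```python
import math


def nearest(train, query):
    """Index of the training point closest to query (Euclidean distance)."""
    return min(range(len(train)), key=lambda i: math.dist(train[i], query))


def standardize(train, query):
    """Z-score every feature using the training mean and standard deviation."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]

    def scale(p):
        return tuple((v - m) / s for v, m, s in zip(p, means, stds))

    return [scale(p) for p in train], scale(query)


def to_cm(p):
    """Convert the height feature from m to cm; weight stays in kg."""
    return (p[0] * 100, p[1])


# Hypothetical data: (height in m, weight in kg).
train_m = [(1.60, 60.0), (1.90, 61.0), (1.75, 80.0)]
query_m = (1.62, 64.0)

train_cm = [to_cm(p) for p in train_m]
query_cm = to_cm(query_m)

raw_m = nearest(train_m, query_m)      # weight dominates: picks index 1
raw_cm = nearest(train_cm, query_cm)   # height dominates: picks index 0

std_m = nearest(*standardize(train_m, query_m))
std_cm = nearest(*standardize(train_cm, query_cm))  # same as std_m
```

Z-scoring is only one choice of normalization. Min-max scaling would remove the dependence on units in the same way.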
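Here you scale your data, but you scale ALL of it by the same factor. For k-NN, multiplying every feature by the same constant multiplies every distance by that constant, so the nearest neighbors, and hence the classifications, do not change. Decision trees do not use distances at all; they split on per-feature thresholds, so rescaling a feature just moves the threshold and the classifications do not change either.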