# Machine Learning – Difference Between K-Nearest Neighbor and Adaptive K-Nearest Neighbor Algorithms

artificial intelligenceclassificationmachine learning

While studying KNN, I have come to see "Adaptive K-Nearest Neighbor algorithm" and I understand KNN but I wonder how "Adaptive K-Nearest Neighbor" work. I tried to find some materials but have failed to find a reasonable one, on the Internet, that I can approximately understand. Hope to hear some explanations.

As you probably already know, in the classical $$k$$-Nearest Neighbor (kNN) algorithm, you choose a natural number $$k$$, and for any new example $$X$$, you assign to it the most represented class amongst its $$k$$ nearest neighbors $$X_{(1)},\ldots,X_{(k)}$$ in the dataset.
As briefly discussed in the Wiki article however, the choice of $$k$$ is a big deal and will drastically impact the model performance. There are different ways to choose $$k$$ in order to minimize the error made by the model, but it can never be perfect (small $$k$$ implies high bias while high $$k$$ implies high variance).

That is the issue the adaptive $$k$$-NN algorithm tries to fix : instead of fixing one $$\bf k$$ with which predictions are made for all unseen $$\bf X$$, the value of $$\bf k$$ used to make the prediction depends on "where" in the feature space $$\bf X$$ is located.
In other words, if the $$X$$ for which we're trying to predict the class is in a "low-noise" region, the algorithm will choose a smaller value of $$k$$, whereas if $$X$$ is in a "very noisy" region, the adaptive $$k$$-NN will use a higher value of $$k$$ to make its prediction.

So that's the main idea. As for how exactly the rule for the choice of $$k$$ is made, different rules have been proposed by different authors. I refer you to the short (4 pages), sweet, and very clear paper by S. Sun and R. Huang for a more comprehensive introduction : An Adaptive $$k$$-Nearest Neighbor Algorithm (2010).

Edit : By "noisy region", I mean a region where both classes (in binary classification) are very represented, i.e. close to the decision boundary, compared to a "noiseless" region where all the points belong to one class.
To take your example, let's say you have recorded the height and weight of a bunch of mice, and recorded whether they were obese or not, and next you want to predict if a new mouse is obese given its height and weight. So your dataset is made of pairs $$(X_i,y_i)$$, with $$X_i = (h_i,w_i) \in \mathbb R^2$$ the predictor and $$y_i \in \{0,1\}$$ represents whether the mouse is obese or not. In this scenario, intuitively, it is easy to see that in the region where the weight $$w$$ is larger than (say) 100g, the mice will almost certainly be obese, so there is no need to compare a new point $$X$$ in that region to many others examples to decide. On the contrary, since the size and weight are most likely very correlated, for simultaneously large $$w$$ and $$h$$, you will encounter many of both cases, so you might want to compare to a greater number of examples if you're in that region (the decision boundary).

As for how to adaptively choose $$k$$ and figure out that a region is noisy, that is done through data-driven procedures. Here is a sketch of the method the authors use in the paper I mentioned above (see Section III.A and Table 1 in the paper) :

1. For each $$X_i$$ in the training set, compute the smallest $$k=k(X_i) \in \{1,\ldots,9\}$$ such that a kNN algorithm with that value of $$k$$ correctly classifies $$X_i$$.
2. For a new datapoint $$X$$, find its nearest neighbor in the dataset $$X_{i_0}$$.
3. Apply the kNN procedure to $$X$$ with $$k = k(X_{i_0})$$ computed in step 1.

Note that in Step 1, the value $$9$$ is an arbitrary choice made by the authors, and it is possible that no kNN algorithm with value $$k\in\{1,\ldots,9\}$$ correctly classifies $$X_i$$, in which case they just set $$k(X_i) = 9$$.