KNN – Is K-Nearest Neighbors a Discriminative Learning Algorithm?

classificationk nearest neighbourmachine learning

It seems that KNN is a discriminative learning algorithm but I can't seem to find any online sources confirming this.

Is KNN a discriminative learning algorithm?

Best Answer

KNN is a discriminative algorithm since it models the conditional probability of a sample belonging to a given class. To see this just consider how one gets to the decision rule of kNNs.

A class label corresponds to a set of points which belong to some region in the feature space $R$. If you draw sample points from the actual probability distribution, $p(x)$, independently, then the probability of drawing a sample from that class is, $$ P = \int_{R} p(x) dx $$

What if you have $N$ points? The probability that $K$ points of those $N$ points fall in the region $R$ follows the binomial distribution, $$ Prob(K) = {{N} \choose {K}}P^{K}(1-P)^{N-K} $$

As $N \to \infty$ this distribution is sharply peaked, so that the probability can be approximated by its mean value $\frac{K}{N}$. An additional approximation is that the probability distribution over $R$ remains approximately constant, so that one can approximate the integral by, $$ P = \int_{R} p(x) dx \approx p(x)V $$ where $V$ is the total volume of the region. Under this approximations $p(x) \approx \frac{K}{NV}$.

Now, if we had several classes, we could repeat the same analysis for each one, which would give us, $$ p(x|C_{k}) = \frac{K_{k}}{N_{k}V} $$ where $K_{k}$ is the amount of points from class $k$ which falls within that region and $N_{k}$ is the total number of points belonging to class $C_k$. Notice $\sum_{k}N_{k}=N$.

Repeating the analysis with the binomial distribution, it is easy to see that we can estimate the prior $P(C_{k}) = \frac{N_{k}}{N}$.

Using Bayes rule, $$ P(C_{k}|x) = \frac{p(x|C_{k})p(C_{k})}{p(x)} = \frac{K_{k}}{K} $$ which is the rule for kNNs.

Related Question