Solved – VC-Dimension of k-nearest neighbor

k nearest neighbourmachine learningself-studyvc-dimension

What is the VC-Dimension of the k-nearest neighbor algorithm if k is equal to the number of training points used?


Context: This question was asked in a course I take and the answer given there was 0. I do, however, not understand why this is the case. My intuition is that the VC-Dimension should be 1, because it should be possible to choose two models (i.e. sets of training points) so that every point is labeled as belonging to one class according to the first model and as belonging to another class according to the second model, thus it should be possible to shatter a single point. Where is the mistake in my reasoning?

Best Answer

You say the algorithm is: k-nearest neighbor algorithm with k = number of training points used. I define this as jms-k-nearest-neighbor.

Since the VC dimension is the biggest number of training points which can be shattered by the algorithm with train error 0, the VC dimension of jms-k-nearest-neighbor can only be k or 0.

1 training instance => k=1: During training the jms-1-nearest-neighbor stores exactly this instance. During application on exactly the same training set, the one instance is the nearest to the stored training instance (because they are the same), so the training error is 0.

So I agree, the VC dimension is at least 1.

2 training instances => k=2: There only might be a problem if the labels are differing. In this case the question is, how the decision for a class label is made. Majority vote does not lead to a result (VC = 0?), if we use majority vote weighted inversely by distance, the VC dimension is 2 (assuming that it is not allowed to have the same training instance twice with differing labels, in that case the VC dimension of all algorithms would be 0 (I guess)).

There is no standard k-nearest neighbor algorithm, it is more of a family with the same basic idea but different flavors when it comes to implementation details.

Resources used: VC dimension slides by Andrew Moore