Is there an advantage to using higher dimensions (2D, 3D, etc.), or should you just build x-1 single-dimension classifiers and aggregate their predictions in some way?
This depends on whether your features are informative or not. Do you suspect that some features will not be useful in your classification task? To get a better sense of your data, you can compute the pairwise correlation or mutual information between the response variable and each of your features.
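For example, with scikit-learn (the iris data below is just a stand-in for your own features and labels):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Mutual information between each feature and the class label
mi = mutual_info_classif(X, y, random_state=0)

# Pearson correlation with the label codes (crude for a categorical
# response, but a quick first look)
corr = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]

for j, (m, c) in enumerate(zip(mi, corr)):
    print(f"feature {j}: MI = {m:.3f}, corr = {c:+.3f}")
```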
To combine all (or a subset) of your features, you can try computing the L1 (Manhattan) or L2 (Euclidean) distance between the query point and each training point as a starting point.
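A quick sketch of both distances with SciPy (the points are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
query = np.array([[1.0, 0.0]])

l1 = cdist(query, train, metric="cityblock")  # Manhattan
l2 = cdist(query, train, metric="euclidean")  # Euclidean
print(l1)  # [[1.  1.  1.5]]
print(l2)  # [[1.  1.  1.118...]]
```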
Since building all of these classifiers from all potential combinations of the variables would be computationally expensive, how could I optimize this search to find the best kNN classifiers from that set?
This is the problem of feature subset selection. There is a lot of academic work in this area (see Guyon, I., & Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 1157–1182, for a good overview).
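In practice, a common compromise is a greedy search rather than an exhaustive one. A sketch with scikit-learn's SequentialFeatureSelector (the dataset, k, and target subset size are arbitrary choices here):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Greedy forward selection: add one feature at a time, keeping the
# subset with the best cross-validated kNN accuracy
knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features
```

This evaluates on the order of d^2 subsets instead of all 2^d combinations.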
And, once I find a series of classifiers, what's the best way to combine their outputs into a single prediction?
This will depend on whether the selected features are independent. In the case that features are independent, you can weight each feature by its mutual information (or some other measure of informativeness) with the response variable (whatever you are classifying on). If some features are dependent, then a single classification model will probably work best.
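As a sketch of the independent-feature case, you could train one single-feature kNN per feature and weight its class-probability votes by that feature's mutual information with the label (the dataset and weighting scheme below are illustrative assumptions, not a prescribed recipe):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One 1D kNN classifier per feature, weighted by mutual information
weights = mutual_info_classif(X_tr, y_tr, random_state=0)
votes = np.zeros((len(X_te), len(np.unique(y))))
for j, w in enumerate(weights):
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr[:, [j]], y_tr)
    votes += w * clf.predict_proba(X_te[:, [j]])

pred = votes.argmax(axis=1)
print((pred == y_te).mean())  # accuracy of the weighted ensemble
```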
How do most implementations apply kNN to a more generalized learning?
By allowing the user to specify their own distance metric (or a precomputed distance matrix) between the set of points. kNN works well when an appropriate distance metric is used.
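For example, scikit-learn's KNeighborsClassifier accepts a user-defined callable as its metric parameter (and metric='precomputed' if you already have a distance matrix). A toy sketch:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def manhattan(a, b):
    # Any callable taking two 1-D arrays and returning a distance works
    return np.abs(a - b).sum()

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=3, metric=manhattan).fit(X, y)
print(clf.predict([[2.5, 0.5]]))  # -> [1], decided by the custom metric
```

Note that a Python callable is much slower than a built-in metric string, so prefer the built-ins where one fits.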
From the documentation for KNeighborsClassifier:
Warning: Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
To see exactly what happens, we'll have to look at the source. You can see that, in the unweighted case, KNeighborsClassifier.predict ends up calling scipy.stats.mode, whose documentation says:
Returns an array of the modal (most common) value in the passed array.
If there is more than one such value, only the first is returned.
So, in the case of ties, the answer will be the class that happens to appear first in the set of neighbors.
Digging a little deeper, the neigh_ind array used here is the result of calling the kneighbors method, which (though the documentation doesn't say so) appears to return results in sorted order.
So ties should be broken in favor of the class whose point is closest to the query point, but this behavior isn't documented and I'm not 100% sure it always happens.
Best Answer
The kNN classifier in scikit-learn uses the _get_weights method in the sklearn.neighbors.base module. Inverse-distance weighting is applied when 'distance' is given as the weights parameter. You can also call this function directly by passing your distances as input. The weight is $w=\frac{1}{d}$, but surprisingly, when $d$ is $0$, the weight is always set to $1$. In the code, it does an np.isinf check, and when a weight is infinite it is set to the boolean value produced by np.isinf.
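A minimal sketch of that logic in plain NumPy (a simplified reconstruction of what _get_weights does, not scikit-learn's actual code):

```python
import numpy as np

def inverse_distance_weights(dist):
    # dist: (n_queries, n_neighbors) array of neighbor distances
    with np.errstate(divide="ignore"):  # 1/0 -> inf is expected here
        weights = 1.0 / dist
    inf_mask = np.isinf(weights)        # True exactly where d == 0
    inf_rows = inf_mask.any(axis=1)     # queries that hit an exact match
    # Booleans are cast to 1.0/0.0: the exact match gets weight 1,
    # every other neighbor in that row gets weight 0
    weights[inf_rows] = inf_mask[inf_rows]
    return weights

dist = np.array([[0.5, 1.0, 2.0],
                 [0.0, 1.0, 2.0]])
print(inverse_distance_weights(dist))
# [[2.  1.  0.5]
#  [1.  0.  0. ]]
```

So a zero-distance neighbor doesn't just get weight 1; it also zeroes out every other neighbor's vote for that query.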