Is there an advantage to using higher dimensions (2D, 3D, etc) or should you just build x-1 single dimension classifiers and aggregate their predictions in some way?
This depends on whether your features are informative. Do you suspect that some features will not be useful in your classification task? To get a better idea of your data, you can also compute the correlation or mutual information between the response variable and each of your features.
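As a concrete sketch of the mutual-information idea, here is a minimal plug-in estimator for discrete data; the function name `mutual_information` and the toy labels are just illustrative, not part of any particular library:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        p_indep = (px[x] / n) * (py[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

# A feature identical to the labels is maximally informative;
# a feature independent of them carries no information.
labels  = [0, 0, 1, 1]
feature = [0, 0, 1, 1]
noise   = [0, 1, 0, 1]
print(mutual_information(feature, labels))  # 1.0 (bit)
print(mutual_information(noise, labels))    # 0.0
```

Ranking features by this score is a quick first filter for which ones are worth feeding into kNN.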
To combine all (or a subset) of your features, you can try computing the L1 (Manhattan) or L2 (Euclidean) distance between the query point and each 'training' point as a starting point.
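That starting point can be sketched as a tiny kNN classifier; this is a bare-bones illustration with made-up data, where the Minkowski parameter `p=1` gives L1 and `p=2` gives L2:

```python
import math
from collections import Counter

def minkowski(a, b, p):
    """p=1 -> Manhattan (L1), p=2 -> Euclidean (L2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski(train_X[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Two well-separated toy clusters.
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ['a', 'a', 'a', 'b', 'b', 'b']
print(knn_predict(X, y, (0.5, 0.5)))      # 'a'
print(knn_predict(X, y, (5.5, 5.5), p=1)) # 'b'
```

Note that with features on different scales you would normally standardize them first, or the largest-scale feature dominates the distance.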
Building all of these classifiers from all potential combinations of the variables would be computationally expensive. How could I optimize this search to find the best kNN classifiers from that set?
This is the problem of feature subset selection. There is a lot of academic work in this area (see Guyon, I., & Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 1157–1182, for a good overview).
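One cheap heuristic from that literature is greedy forward selection: instead of trying all 2^d subsets, repeatedly add whichever feature most improves the score. A minimal sketch, where `toy_score` is a hypothetical stand-in for your real criterion (e.g. cross-validated kNN accuracy):

```python
def forward_select(all_features, score):
    """Greedily add the feature whose inclusion most improves score(subset);
    stop as soon as no remaining feature helps."""
    selected, best = [], float('-inf')
    remaining = list(all_features)
    while remaining:
        top_score, top_f = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected

# Hypothetical score: rewards the two "useful" features, penalizes subset size.
useful = {'x1', 'x2'}
def toy_score(subset):
    return sum(1 for f in subset if f in useful) - 0.1 * len(subset)

print(sorted(forward_select(['x1', 'x2', 'x3', 'x4'], toy_score)))  # ['x1', 'x2']
```

This evaluates O(d^2) subsets instead of 2^d, at the cost of possibly missing feature combinations that only help jointly.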
And, once I find a series of classifiers what's the best way to combine their output to a single prediction?
This will depend on whether the selected features are independent. If they are, you can weight each feature's vote by its mutual information (or some other measure of informativeness) with the response variable (whatever you are classifying on). If some features are dependent, then a single classification model over all of them will probably work best.
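The weighted combination in the independent case can be sketched as a weighted vote; the weights here are made-up numbers standing in for each classifier's informativeness score:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-classifier predictions: each classifier's vote counts
    in proportion to its weight (e.g. its mutual information with the response)."""
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)

# Three single-feature classifiers disagree; the more informative ones win.
print(weighted_vote(['spam', 'ham', 'spam'], [0.9, 0.2, 0.4]))  # 'spam'
```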
How do most implementations apply kNN to a more generalized learning?
By allowing the user to specify their own distance metric (or a precomputed distance matrix) between the points. kNN works well when an appropriate distance metric is used.
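In code, "user-specified metric" usually just means accepting a distance callable. A sketch with invented data, using Hamming distance so the same kNN machinery works on categorical vectors:

```python
from collections import Counter

def knn_with_metric(train_X, train_y, query, k, metric):
    """kNN where `metric` is any user-supplied distance callable."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: metric(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def hamming(a, b):
    # number of positions where two categorical vectors disagree
    return sum(x != y for x, y in zip(a, b))

X_cat = [('red', 'small'), ('red', 'big'), ('blue', 'big'), ('blue', 'small')]
y_cat = ['fruit', 'fruit', 'car', 'car']
print(knn_with_metric(X_cat, y_cat, ('red', 'big'), k=1, metric=hamming))  # 'fruit'
```

Swapping in a domain-appropriate metric (edit distance for strings, cosine distance for sparse vectors, etc.) is what generalizes kNN beyond plain Euclidean space.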
I expect you are talking about nominal categorical variables there? Ordinal variables with 100 levels are very strange. I have never seen a Likert scale with 100 nuances, or anything else that would warrant a 100-level ordinal variable. If you have ordinal variables with so many levels, investigate whether you can reasonably transform them into interval variables. That can be done when it is reasonable to assume the distances between any two adjacent levels are the same across the scale.
If I had only nominal categorical data, I would first look at tree based models, that's where they naturally shine. With so many options within so few categorical variables, I would expect random forests to do better than single pruned trees. You can test both though.
Best Answer
The test set and the validation set serve completely different tasks.
1. Testing your model
You test your model to measure its performance. For this, you have to hold out a set of data drawn from the same distribution and keep it separate. You cannot touch this dataset or do any parameter tuning with it during the training process.
2. Validating your model
This is a subtask of your training process. By validating your model on the fly, you can reduce overfitting (a model that fails to generalize). Overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to new data. Here you use a small fraction of data taken from your training set. Leave-One-Out Cross-Validation (LOOCV) is one method of doing that.
Final Note: Testing with the test set is done at the very end of the pipeline, while validation is done during the training process. Never use test data for training purposes. Also, the test data you reserve must come from the same distribution you used for training.
Training using K-fold cross-validation (LOOCV is a special case)
You can divide your training dataset into K bins (simply K sets). Now leave out the first set and use the other K-1 sets to train your model. After training in that round, use that first set to validate your model. In the next iteration, leave out the second set and use the other K-1 sets to train; then validate with the second set. Repeat this K times. As an example: divide your dataset into 10 bins. Inside a loop, in every iteration leave one bin out, train on the other bins, then validate with the left-out bin. Repeat this 10 times. LOOCV is the special case where K equals the number of training examples, so each bin holds a single point.
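The loop above can be sketched in Python; `majority_scorer` here is a hypothetical stand-in for whatever train-and-evaluate routine you actually use:

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal, contiguous bins."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, train_and_score, k=10):
    """Each round holds out one bin for validation and trains on the rest;
    returns the k validation scores. k == len(X) gives LOOCV."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        scores.append(train_and_score(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in fold], [y[i] for i in fold]))
    return scores

def majority_scorer(Xtr, ytr, Xva, yva):
    # Hypothetical "model": always predict the most common training label.
    guess = Counter(ytr).most_common(1)[0][0]
    return sum(lbl == guess for lbl in yva) / len(yva)

X = list(range(10))
y = ['a'] * 7 + ['b'] * 3
print(cross_validate(X, y, majority_scorer, k=5))  # [1.0, 1.0, 1.0, 0.5, 0.0]
```

In practice you would shuffle (or stratify) the data before binning; the contiguous split here is kept deliberately simple to mirror the description above.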
Hope this helps.