Is there an advantage to using higher dimensions (2D, 3D, etc) or should you just build x-1 single dimension classifiers and aggregate their predictions in some way?
This depends on whether your features are informative or not. Do you suspect that some features will not be useful in your classification task? To gain a better idea of your data, you can also try to compute pairwise correlation or mutual information between the response variable and each of your features.
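For example, a quick way to screen features for informativeness is to compute the mutual information (or correlation) of each feature with the response. A minimal sketch using scikit-learn's `mutual_info_classif`; the data here is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical data: only feature 0 actually drives the labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Mutual information between each feature and the response
mi = mutual_info_classif(X, y, random_state=0)

# Pearson correlation between each feature and the response
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

print("mutual information:", np.round(mi, 3))
print("correlation:       ", np.round(corr, 3))
```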
To combine all (or a subset) of your features, you can try computing the L1 (Manhattan) or L2 (Euclidean) distance between the query point and each 'training' point as a starting point.
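As a minimal sketch of what that looks like (the arrays below are made-up toy values):

```python
import numpy as np

# train: (n_samples, n_features) 'training' points; query: a single point
train = np.array([[1.0, 2.0], [3.0, 0.5], [0.0, 1.0]])
query = np.array([1.5, 1.0])

diff = train - query
l1 = np.abs(diff).sum(axis=1)           # L1 (Manhattan) distance to each training point
l2 = np.sqrt((diff ** 2).sum(axis=1))   # L2 (Euclidean) distance to each training point

k = 2
nearest = np.argsort(l2)[:k]            # indices of the k nearest neighbours under L2
print(l1, l2, nearest)
```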
Building all of these classifiers from all potential combinations of the variables would be computationally expensive. How could I optimize this search to find the best kNN classifiers from that set?
This is the problem of feature subset selection. There is a lot of academic work in this area (for a good overview, see Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182).
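Exhaustive search over all feature subsets grows exponentially, so greedy wrapper methods are a common compromise. One possible sketch is forward selection wrapped around a kNN classifier, using scikit-learn's `SequentialFeatureSelector` on synthetic data (this is just one of many selection strategies):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: 10 features, only a few of which are informative
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)

# Greedy forward selection: add one feature at a time, keeping the one that
# most improves the cross-validated accuracy of the kNN classifier
selector = SequentialFeatureSelector(knn, n_features_to_select=3,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("selected features:", np.flatnonzero(selector.get_support()))
```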
And, once I find a series of classifiers what's the best way to combine their output to a single prediction?
This will depend on whether or not the selected features are independent. If the features are independent, you can weight each feature by its mutual information (or some other measure of informativeness) with the response variable (whatever you are classifying on). If some features are dependent, then a single classification model built on all of them will probably work best.
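One simple way to encode the weighting idea is to fold the mutual-information weights into the distance itself, e.g. by rescaling each feature column so that the ordinary Euclidean metric becomes a weighted one. A sketch on synthetic data (and only one of several reasonable weighting schemes):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: features 0 and 1 carry the signal
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300) > 0).astype(int)

# Weight each feature by its mutual information with the response
w = mutual_info_classif(X, y, random_state=0)
w = w / w.sum()

# Scaling each column by sqrt(w) makes the ordinary Euclidean distance
# equivalent to a mutual-information-weighted Euclidean distance
Xw = X * np.sqrt(w)

knn = KNeighborsClassifier(n_neighbors=5)
print("unweighted :", cross_val_score(knn, X, y, cv=5).mean())
print("MI-weighted:", cross_val_score(knn, Xw, y, cv=5).mean())
```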
How do most implementations apply kNN to a more generalized learning?
By allowing the user to specify their own distance metric (or a precomputed distance matrix) between the set of points. kNN works well when an appropriate distance metric is used.
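For instance, scikit-learn's `KNeighborsClassifier` accepts a user-supplied callable as its `metric`; here is a sketch with a made-up weighted Manhattan distance on toy data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# A user-defined distance: a (hypothetical) weighted Manhattan distance
feature_weights = np.array([1.0, 0.25])

def weighted_manhattan(a, b):
    return np.sum(feature_weights * np.abs(a - b))

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 5.0], [1.1, 4.0]])
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric=weighted_manhattan,
                           algorithm="brute")
knn.fit(X, y)
print(knn.predict([[0.5, 0.5]]))
```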
There are no discriminative or generative tasks, only discriminative and generative models, which exist for both regression and classification. There is a very nice paper that discusses this difference: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes.
Basically, discriminative models attempt to estimate the conditional probability $P(y|x)$, where $y$ is the output conditioned on the input $x$. Generative models, on the other hand, estimate the joint probability $P(x, y) = P(x|y)P(y)$. This allows you to sample from the generative model (hence the name), because you also model how samples $x$ are generated.
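A minimal sketch of this contrast, using the same pairing as the paper (logistic regression as the discriminative model, Gaussian naive Bayes as the generative one) on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Discriminative: logistic regression models P(y | x) directly
disc = LogisticRegression().fit(X, y)
print("P(y | x):", disc.predict_proba(X[:1]))

# Generative: Gaussian naive Bayes models P(x | y) and P(y)
gen = GaussianNB().fit(X, y)
print("P(y | x) via Bayes rule:", gen.predict_proba(X[:1]))

# Because the generative model specifies P(x | y) P(y), we can sample new x:
rng = np.random.default_rng(0)
k = rng.choice(len(gen.class_prior_), p=gen.class_prior_)  # draw a class from P(y)
x_new = rng.normal(gen.theta_[k], np.sqrt(gen.var_[k]))    # then x from P(x | y = k)
print("sampled point from class", k, ":", x_new)
```

Both models output $P(y|x)$, but only the naive Bayes model also lets you draw new samples $x$, because it explicitly models $P(x|y)$ and $P(y)$.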
Best Answer
kNN is a discriminative algorithm, since it models the conditional probability of a sample belonging to a given class. To see this, just consider how one arrives at the kNN decision rule.
A class label corresponds to a set of points which belong to some region $R$ of the feature space. If you draw sample points from the actual probability distribution $p(x)$ independently, then the probability of drawing a sample from that class is $$ P = \int_{R} p(x)\, dx $$
What if you have $N$ points? The probability that $K$ of those $N$ points fall in the region $R$ follows the binomial distribution, $$ \operatorname{Prob}(K) = \binom{N}{K} P^{K}(1-P)^{N-K} $$
As $N \to \infty$ this distribution becomes sharply peaked around its mean, so that $P$ can be approximated by the fraction $\frac{K}{N}$. An additional approximation is that $p(x)$ remains approximately constant over $R$, so that the integral can be approximated by $$ P = \int_{R} p(x)\, dx \approx p(x)V, $$ where $V$ is the volume of the region. Under these approximations, $p(x) \approx \frac{K}{NV}$.
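As a made-up numerical illustration: if $N = 1000$ points are drawn and $K = 30$ of them land in a small region of volume $V = 0.05$, then $$ p(x) \approx \frac{K}{NV} = \frac{30}{1000 \times 0.05} = 0.6. $$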
Now, if we have several classes, we can repeat the same analysis for each one, which gives us $$ p(x|C_{k}) = \frac{K_{k}}{N_{k}V}, $$ where $K_{k}$ is the number of points from class $C_k$ that fall within the region and $N_{k}$ is the total number of points belonging to class $C_k$. Notice that $\sum_{k}N_{k}=N$.
Repeating the analysis with the binomial distribution, it is easy to see that we can estimate the prior $P(C_{k}) = \frac{N_{k}}{N}$.
Using Bayes' rule and substituting the estimates above, $$ P(C_{k}|x) = \frac{p(x|C_{k})P(C_{k})}{p(x)} = \frac{\frac{K_{k}}{N_{k}V}\cdot\frac{N_{k}}{N}}{\frac{K}{NV}} = \frac{K_{k}}{K}, $$ which is the kNN decision rule: assign $x$ to the class with the largest number of representatives $K_{k}$ among its $K$ nearest neighbours.
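A small numerical sketch of this rule (synthetic data; the helper `knn_posterior` below is purely illustrative, not a library function):

```python
import numpy as np

# Training data (hypothetical): two features, three classes 0/1/2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(30, 2)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)

def knn_posterior(x, X, y, K=10):
    """Estimate P(C_k | x) = K_k / K from the K nearest neighbours of x."""
    dist = np.linalg.norm(X - x, axis=1)                      # distances to all points
    nearest = np.argsort(dist)[:K]                            # K nearest neighbours
    counts = np.bincount(y[nearest], minlength=y.max() + 1)   # K_k for each class
    return counts / K                                         # posterior estimate K_k / K

x = np.array([1.0, 1.0])
post = knn_posterior(x, X, y)
print("P(C_k | x) ~", post, "-> predicted class", post.argmax())
```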