Is there an advantage to using higher dimensions (2D, 3D, etc) or should you just build x-1 single dimension classifiers and aggregate their predictions in some way?
This depends on whether your features are informative or not. Do you suspect that some features will not be useful in your classification task? To gain a better idea of your data, you can also try to compute pairwise correlation or mutual information between the response variable and each of your features.
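As a quick sketch in Python (assuming a feature matrix `X` and class labels `y`; the toy data below is purely illustrative), you could score each feature against the response before building any classifiers:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                 # 200 samples, 5 candidate features
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)    # only feature 0 is informative

mi = mutual_info_classif(X, y, random_state=0)                # one MI score per feature
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
print("mutual information:", np.round(mi, 3))
print("correlation:       ", np.round(corr, 3))
```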
To combine all (or a subset) of your features, you can try computing the L1 (Manhattan) or L2 (Euclidean) distance between the query point and each 'training' point as a starting point.
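As a rough sketch (the arrays `X_train` and `query` here are hypothetical):

```python
import numpy as np

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]])   # training points
query = np.array([2.0, 2.0])                               # query point

diff = X_train - query
l1 = np.abs(diff).sum(axis=1)            # Manhattan distance to each training point
l2 = np.sqrt((diff ** 2).sum(axis=1))    # Euclidean distance to each training point
print(l1, l2)
```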
Building all of these classifiers from all potential combinations of the variables would be computationally expensive. How could I optimize this search to find the best kNN classifiers from that set?
This is the problem of feature subset selection. There is a lot of academic work in this area (see Guyon, I., & Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 1157-1182, for a good overview).
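One cheap heuristic is greedy forward selection, scoring each candidate feature by cross-validated kNN accuracy instead of searching every subset. This is only a sketch, assuming scikit-learn and NumPy arrays `X` and `y`; the function name `forward_select` is illustrative:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_select(X, y, k=5, max_features=None):
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    max_features = max_features or X.shape[1]
    while remaining and len(selected) < max_features:
        # Score every remaining feature when added to the current subset.
        scores = {
            j: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                               X[:, selected + [j]], y, cv=5).mean()
            for j in remaining
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:      # stop when no feature improves CV accuracy
            break
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_score
```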
And, once I find a series of classifiers what's the best way to combine their output to a single prediction?
This will depend on whether or not the selected features are independent. In the case that features are independent, you can weight each feature by its mutual information (or some other measure of informativeness) with the response variable (whatever you are classifying on). If some features are dependent, then a single classification model will probably work best.
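As a sketch of the independent-features case, you could scale each feature by the square root of its normalized mutual information with the response, so that squared Euclidean distance becomes a mutual-information-weighted sum of squared feature differences. The names below (`mi_weighted_knn`, `mi_weights`) are illustrative, not a standard API:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier

def mi_weighted_knn(X_train, y_train, k=5):
    mi_weights = mutual_info_classif(X_train, y_train, random_state=0)
    mi_weights = mi_weights / (mi_weights.sum() + 1e-12)      # normalize weights to sum to 1
    # Scaling each feature by sqrt(weight) makes squared Euclidean distance
    # equal to a weighted sum of squared feature differences.
    X_scaled = X_train * np.sqrt(mi_weights)
    model = KNeighborsClassifier(n_neighbors=k).fit(X_scaled, y_train)
    return model, mi_weights
```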
How do most implementations apply kNN to a more generalized learning?
By allowing the user to specify their own distance matrix between the set of points. kNN works well when an appropriate distance metric is used.
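For instance, scikit-learn's `KNeighborsClassifier` accepts a user-supplied callable as its `metric`; the weighted Manhattan distance below is just an illustrative choice, with hypothetical per-feature weights:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

weights = np.array([1.0, 0.5, 2.0])          # hypothetical per-feature weights

def weighted_manhattan(a, b):
    # Custom distance between two feature vectors a and b.
    return np.sum(weights * np.abs(a - b))

knn = KNeighborsClassifier(n_neighbors=3, metric=weighted_manhattan)
# knn.fit(X_train, y_train); knn.predict(X_query)
```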
Rescaling the input features is just a linear transformation. There's no right or wrong way of rescaling outside a problem context. If you want to map the range 1-100 to the range 1-10 linearly, you would do:
$$
x \leftarrow \frac{x - 1}{99} \times 9 + 1
$$
This maps 1 to 1 and 100 to 10, and it gives the durations the same range as the other features.
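A sketch of the same mapping in code (the `rescale` function and its default arguments are just illustrative):

```python
def rescale(x, old_min=1, old_max=100, new_min=1, new_max=10):
    # Linear map from [old_min, old_max] to [new_min, new_max].
    return (x - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

print(rescale(1), rescale(100))   # 1.0 10.0
```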
One problem with the method above is that if all the durations are clustered below, say, 40, with just a few outliers close to 100, then most of the range won't be used. Calculating the z-score of each individual feature may be preferable:
$$
x \leftarrow \frac{x - \text{mean}(x)}{\text{stddev}(x)}
$$
as the transformed features will all have mean 0 and standard deviation 1 and should be more comparable.
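A minimal sketch of this transformation applied column-wise to a NumPy feature matrix `X`:

```python
import numpy as np

def z_score(X):
    # Center each column at 0 and scale it to unit standard deviation.
    return (X - X.mean(axis=0)) / X.std(axis=0)
```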
Best Answer
For k-NN, I'd suggest normalizing the data between $0$ and $1$.
k-NN uses the Euclidean distance as its means of comparing examples. To calculate the distance between two points $x_1 = (f_1^1, f_1^2, ..., f_1^M)$ and $x_2 = (f_2^1, f_2^2, ..., f_2^M)$, where $f_1^i$ is the value of the $i$-th feature of $x_1$:
$$ d(x_1, x_2) = \sqrt{(f_1^1 - f_2^1)^2 + (f_1^2 - f_2^2)^2 + ... + (f_1^M - f_2^M)^2} $$
In order for all of the features to be of equal importance when calculating the distance, the features must have the same range of values. This is only achievable through normalization.
If they were not normalized and, for instance, feature $f^1$ had values in $[0, 1)$ while $f^2$ had values in $[1, 10)$, then when calculating the distance the second term would be roughly $10$ times more important than the first, leading k-NN to rely more on the second feature than on the first. Normalization ensures that all features are mapped to the same range of values.
Standardization, on the other hand, has many useful properties, but it can't ensure that the features are mapped to the same range. While standardization may be better suited to other classifiers, that is not the case for k-NN or any other distance-based classifier.
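As a sketch, min-max normalization before k-NN might look like this with scikit-learn; wrapping the scaler in a pipeline ensures the ranges learned from the training data are reused for the query points (`X_train`, `y_train`, `X_test` are hypothetical):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale every feature to [0, 1], then classify with k-NN.
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
# model.fit(X_train, y_train); model.predict(X_test)
```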