For the main question:
Does class balancing introduce bias?
Yes, in most cases it does. Because the new data points are generated from the existing ones, they add little new variance to the dataset; in most cases they differ only slightly from the originals.
Does oversampling before splitting introduce bias?
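To make the "only slightly different" point concrete, here is a minimal sketch of SMOTE-style interpolation. It is a simplification, not the full algorithm: the nearest-neighbour search is skipped, and the helper name smote_like_point and the two points are hypothetical.

```python
import random

def smote_like_point(x, neighbor, rng):
    """Return a random point on the segment between x and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbor))

rng = random.Random(0)

# Two existing minority-class points (made-up values).
x = (1.0, 2.0)
neighbor = (2.0, 4.0)

# Every synthetic point lies between the two originals, so it carries
# no information that was not already in the minority class.
synthetic = [smote_like_point(x, neighbor, rng) for _ in range(5)]
print(synthetic)
```

Each generated point is a convex combination of two real ones, which is why oversampling of this kind cannot add much variance.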
Yes, and this is why you should split first and then balance only the training set. You want your test set to be as unbiased as possible in order to get an objective evaluation of the model's performance. If balancing is performed before splitting, the model may see information from the test set during training, through the generated data points.
Is it more scientifically correct to oversample after splitting the training and test set individually?
You shouldn't oversample the test set. The test set should be as objective as possible. If you generate new test data and evaluate your model on it, the evaluation loses its objectivity.
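The order of operations can be sketched as follows. This is only an illustration under stated assumptions: it uses plain random oversampling (duplicating minority rows) rather than SMOTE, the data are a made-up toy set, and the helper names train_test_split and random_oversample are hypothetical stand-ins, not a library API.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle and split rows into (train, test) before any resampling."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def random_oversample(rows, seed=0):
    """Duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Toy data: 90 majority rows (label 0) and 10 minority rows (label 1).
data = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(90, 100)]

# 1. Split first, so the test set is fixed before anything is generated.
train, test = train_test_split(data)

# 2. Balance the training set only; the test set is left untouched.
train_balanced = random_oversample(train)

train_counts = Counter(label for _, label in train_balanced)
test_counts = Counter(label for _, label in test)
print(train_counts, test_counts)
```

Because the extra rows are drawn from the training split alone, nothing derived from a test row can reach the model, and the test set keeps its original class distribution.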
Do we have to balance the test set?
No, you should not balance the test set under any circumstances.
Could ENN and/or SMOTE introduce bias for specific classifiers?
I don't think that k-NN or any other specific classifier would be more biased toward the test set than the others. I'm not sure about this, though.
Best Answer
What I am going to write is partly described in some of the posts mentioned in the comments. I think "How to deal with a skewed class in binary classification having many features?" is close to what I want to say.
I think that in general, class balancing and oversampling will not improve overall accuracy, but that is not the goal. As described in the cited post, with strong class imbalance you can get very high accuracy by simply predicting the majority class for everything. What I would like to emphasize is that getting the highest accuracy is not always the goal. It is often better to make more false positive errors in return for eliminating some of the false negatives.
Many diseases have a fairly low incidence rate, but simply saying that no one has the disease is not an acceptable solution. If you identify most of the true positives (and include some false positives), additional testing can be applied to that small group to sort out which cases were real and which were not. The overall accuracy is lower, but you identify more of the cases that are critical to identify.
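A small worked example of this trade-off, with entirely hypothetical numbers: 1000 people, 10 of whom have the disease. One "classifier" says nobody is sick; the other finds 9 of the 10 sick people at the cost of 50 false alarms.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = sum(1 for t in y_true if t == 1)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / positives

# Hypothetical population: 10 sick (label 1), 990 healthy (label 0).
y_true = [1] * 10 + [0] * 990

# Classifier A: always predicts the majority class.
pred_a = [0] * 1000

# Classifier B: 9 true positives, 1 false negative,
# 50 false positives, 940 true negatives.
pred_b = [1] * 9 + [0] * 1 + [1] * 50 + [0] * 940

print(accuracy(y_true, pred_a), recall(y_true, pred_a))  # 0.99 0.0
print(accuracy(y_true, pred_b), recall(y_true, pred_b))  # 0.949 0.9
```

Classifier A wins on accuracy yet misses every case. Classifier B is less accurate overall, but the 59 people it flags can go on to additional testing, and only one sick person is missed.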