Solved – Bayes Decision Boundary and Classifier

classification

Is it correct to say that the purpose of a classifier (e.g. K-NN, Logistic Regression, LDA) is to approximate the Bayes decision boundary?

Best Answer

Yes. The Bayes classifier is the one that produces the lowest possible test error rate. I think this is best illustrated with an example.

To simplify things a bit, let's say we have a simple two-class classification problem. For example, we survey a group of students, collect their age, SAT score and current GPA, and want to predict whether or not they are going to fail a course. In R the formula would be something like fail ~ age + sat.score + current.GPA

The Bayes classifier works by looking at the true probability of each class for every combination of feature values and assigning each instance to the class whose probability is greater than 50% (in the two-class case).

Imagine that we could survey every student that exists. In that case the classifier would know the correct probability of failing for every possible combination of feature values, and it would achieve the best possible classification accuracy.

However, this does not mean it will classify every instance correctly (i.e., achieve a 0% error rate); in most cases that is impossible.

In our example, it is very likely that some students will have the same values for all three features and yet some of them will fail while others won't. No classifier can give a 100% correct answer to this problem, because there is no way to differentiate between the students who failed and those who didn't (to a classifier, they look identical). Adding new features, for example previous knowledge or IQ, might help, but then the problem definition would change, and that might improve the classification accuracy.

Thus if, for a given combination of feature values, 80% of students pass and only 20% fail, the Bayes classifier will predict that students with that combination will pass the course, as that is the more likely outcome.
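
As a minimal sketch of that rule in R (the probabilities below are made-up values, just to make the thresholding concrete), the Bayes classifier simply checks whether the true probability of failing exceeds 50%:

```r
## Bayes decision rule: predict "fail" whenever the true P(fail | features) > 50%.
## The probabilities below are hypothetical values for three feature combinations.
bayes_classify <- function(p_fail) {
  ifelse(p_fail > 0.5, "fail", "pass")
}

p_fail <- c(0.20, 0.55, 0.80)   # hypothetical true probabilities of failing
bayes_classify(p_fail)
#> [1] "pass" "fail" "fail"
```

For the 80/20 combination above, it predicts "fail" for everyone with those feature values and is therefore wrong for the 20% who actually pass, which is exactly the error it cannot avoid.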

This minimal possible error rate of the Bayes classifier is called the irreducible error, and every classifier exhibits it. Other classifiers also exhibit reducible error on top of it, which results from good, but not perfect, estimates of those probabilities. Since we almost never have this perfect information, the idea behind the different classification models is to make different assumptions that are good enough to produce sufficiently high classification accuracy without requiring data on the entire population.
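
To illustrate the two kinds of error, here is a small simulation in R (my own toy setup, not taken from the book): the Bayes classifier uses the true conditional probability and so incurs only the irreducible error, while a logistic regression fitted on a small sample adds some reducible error on top of it.

```r
set.seed(1)

## Toy population: one feature x, true P(fail | x) given by a logistic curve
n_train <- 100                                   # small survey sample used for fitting
x_train <- rnorm(n_train)
y_train <- rbinom(n_train, 1, plogis(-1 + 2 * x_train))

n_test <- 100000                                 # large test draw from the same population
x_test <- rnorm(n_test)
p_test <- plogis(-1 + 2 * x_test)                # the true P(fail | x)
y_test <- rbinom(n_test, 1, p_test)

## Bayes classifier: thresholds the true probability -> irreducible error only
bayes_err <- mean((p_test > 0.5) != y_test)

## Logistic regression: estimates that probability from the small training sample
fit   <- glm(y ~ x, family = binomial, data = data.frame(y = y_train, x = x_train))
p_hat <- predict(fit, newdata = data.frame(x = x_test), type = "response")
glm_err <- mean((p_hat > 0.5) != y_test)

c(bayes = bayes_err, logistic = glm_err, reducible = glm_err - bayes_err)
```

The gap between the two error rates is the reducible part: it shrinks as the probability estimates improve (for example with more training data), while the Bayes error rate itself cannot be improved.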

Read the full explanation in "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani, http://www-bcf.usc.edu/~gareth/ISL/ (page 38).