Solved – What’s a better classifier for simple A-Z letter OCR: SVMs or kNN

k-nearest-neighbour, machine-learning, supervised-learning, svm

Disclaimer:

I'm nearing the end of an introductory machine learning course, so my knowledge on the subject is not too strong (yet)!

Context:

I'm thinking of building an optical word-search solver as a term project for said class. The user will draw a box around the character matrix, and then it's the program's job to classify the characters.

This question is only concerned with classifying the characters of a word search's character matrix. Assume the characters of the matrix are separated into their own images and will be classified individually. This means:

  • there are exactly 26 classes: the letters A to Z
  • each character image is rendered from a font, meaning nothing is handwritten

Data:

I'll be generating data straight from fonts and saving each character as a bitmap with only black and white pixels. These images are the ones that will be used to classify the characters of the word search.
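
For concreteness, here's roughly how I plan to generate the bitmaps (a sketch using Pillow; the font file name is just an example of the fonts I'd use):

```python
# Render each uppercase letter as a pure black/white bitmap.
# "DejaVuSans.ttf" is only an example font file; swap in whichever
# fonts you actually generate data from.
from PIL import Image, ImageDraw, ImageFont
import string

font = ImageFont.truetype("DejaVuSans.ttf", 32)   # assumed font file
for letter in string.ascii_uppercase:
    img = Image.new("1", (48, 48), color=1)       # mode "1": 1-bit pixels, white bg
    ImageDraw.Draw(img).text((8, 4), letter, font=font, fill=0)  # black glyph
    img.save(f"{letter}.bmp")                     # one bitmap per class
```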

Now Here's the Question:

What's a better classifier for this task?

In particular, I'm having trouble choosing between two different classifiers:

  • k-Nearest Neighbors (KNN)
  • Support Vector Machines (SVMs)

It may be naive to consider KNN, but I do know that KNN performs well when there are a lot of examples and low dimensionality. It seems easy to generate a bunch of examples from different fonts, and I can even classify the kind of font before classifying the letter to narrow the dataset first (serif vs. sans, then lowercase vs. uppercase). My concern here lies with coming up with a transformation to a lower-dimensional space; something like PCA on the flattened bitmaps is what I have in mind (see the sketch below).
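
A sketch of that idea, assuming PCA before KNN (the sizes and the random placeholder data are illustrative only):

```python
# Project high-dimensional bitmaps to a low-dimensional space before KNN.
# The random arrays stand in for real flattened 48x48 letter bitmaps.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((260, 48 * 48))            # 10 flattened bitmaps per letter
y = np.repeat(np.arange(26), 10)          # labels 0..25 for A..Z

# Reduce 2304 pixels to 40 components, then classify by nearest neighbours.
knn = make_pipeline(PCA(n_components=40), KNeighborsClassifier(n_neighbors=3))
knn.fit(X, y)
```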

On the other hand, I've heard SVMs perform much better in higher-dimensional spaces and are less susceptible to outliers. I don't understand SVMs to the extent I do KNN, so any input on this would be helpful (unlike KNN, I haven't implemented SVMs before).

Hmm… I guess the real question is: Can I get away with KNN instead of SVMs for OCR in word searches? What performance (error %) should I expect?

One last note: computational time matters because of how I'll be separating the characters of the word search into different images. All I really need to know here is which method is less computationally intensive, given the way I'll be generating examples.

Thank you so much for any help!

Best Answer

The full answer to this question is rather long, so I'll try to keep it short.
The major differences between KNN and SVMs are:

  • SVMs need a proper training phase, whereas KNN directly classifies points using a given distance metric
  • in SVMs the optimum is guaranteed, because behind the training phase there is a (convex) optimization problem: SVMs aim at separating the classes with the maximum-margin hyperplane. KNN, by contrast, is only quasi-optimal.
  • unlike KNN (which works for any number of classes), standard SVMs are binary classifiers (the hyperplane separates two classes). To build a multiclass SVM you have to use either the One-vs-All or the One-vs-One approach (see the sketch after this list). Many toolboxes, such as LibSVM, implement multiclass SVMs for you. Just for the record:
    1. in the One-vs-All approach you train as many SVMs as there are classes (the ith SVM sees the ith class as positive and all the others as negative); you then feed the unknown pattern to the whole ensemble, and the final outcome is the class whose SVM has the largest decision value. You can read the decision value as a function of the margin from the hyperplane: the larger the margin, the more confident the prediction.
    2. in the One-vs-One approach you train N*(N-1)/2 SVMs, one for every pair of classes. As above, you feed the unknown pattern to the ensemble, and the final outcome is decided by a majority vote over all the SVMs' outputs. Most toolboxes implement this approach for multiclass classification.
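
Just to make the two strategies concrete, here is a minimal sketch using scikit-learn (whose SVC is built on LibSVM); the random data is a placeholder for your letter bitmaps:

```python
# One-vs-All vs. One-vs-One multiclass SVMs on placeholder data.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

rng = np.random.default_rng(0)
X_train = rng.random((260, 256))          # flattened 16x16 bitmaps (placeholder)
y_train = np.repeat(np.arange(26), 10)    # labels 0..25 for A..Z

# One-vs-All: 26 binary SVMs; winner is the largest decision value.
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X_train, y_train)

# One-vs-One: 26*25/2 = 325 binary SVMs; winner by majority vote.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X_train, y_train)

X_new = rng.random((1, 256))              # one unknown character
print(ova.predict(X_new), ovo.predict(X_new))
```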

So, as you can see, SVMs look more cumbersome in terms of computational time. However, keep in mind that the training can be done once: you can save the trained model(s) and then just perform predictions with them. KNN, by contrast, must evaluate all the distances and select the K nearest neighbours for every new pattern.

I strongly discourage you from implementing your own SVM. The training algorithms are rather complex if you're not an expert programmer, since (again) there is an optimization problem involved. I don't know which language you're planning to code in, but Matlab has its own SVM library and OpenCV has one too; in my opinion, though, the best is LibSVM: it's free, cross-platform, cross-language and fast. I always use it.
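
To make the train-once point concrete, here is a minimal sketch of the save-and-reload workflow with scikit-learn's SVC, which wraps LibSVM under the hood (the data and the file name are placeholders):

```python
# Train once, persist the model, then reuse it for cheap per-character
# predictions. "ocr_svm.joblib" and the random data are placeholders.
import numpy as np
import joblib
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((260, 256))            # flattened bitmaps (placeholder)
y_train = np.repeat(np.arange(26), 10)      # labels 0..25 for A..Z

model = SVC(kernel="rbf").fit(X_train, y_train)   # one-time training cost
joblib.dump(model, "ocr_svm.joblib")              # save the trained model

loaded = joblib.load("ocr_svm.joblib")            # later, inside the solver
print(loaded.predict(rng.random((1, 256))))       # fast prediction per glyph
```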

Another important aspect of SVMs is the kernel. In KNN you just need to define a distance metric, whereas in SVMs there are two major cases:

  • the classes are linearly separable: in this case you don't have to do anything, since the hyperplane is found using the plain dot product between patterns.
  • the classes are not linearly separable: in this case you must choose a kernel function that maps the patterns into a different (often higher-dimensional) space in which the classes become linearly separable. You must then select an appropriate kernel function: common choices are the polynomial kernel and the Gaussian Radial Basis Function (RBF) kernel. A sketch comparing both cases follows this list.
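
A hedged sketch of the two cases on toy data (the dataset and the gamma value are illustrative, not recommendations):

```python
# Compare a linear kernel with an RBF kernel on data that is not
# linearly separable. make_moons gives a curved class boundary.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)        # plain dot product
rbf = SVC(kernel="rbf", gamma=1.0).fit(X_tr, y_tr)   # gamma = 1/(2*sigma^2)

print("linear:", linear.score(X_te, y_te))  # struggles on the curved boundary
print("rbf:   ", rbf.score(X_te, y_te))     # maps to a space where classes separate
```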

So in KNN you only have to tune the parameter K and select an appropriate distance metric, whereas in SVMs you must select the C parameter (a regularization term) and, if your classes are not linearly separable, the kernel parameters too: in the polynomial kernel, the degree and its coefficients; in the Gaussian RBF kernel, the shape parameter (i.e. the standard deviation).
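
Cross-validated grid search is the usual way to pick these values; here is a minimal sketch, where the parameter grids and the random placeholder data are illustrative only:

```python
# Tune K and the distance metric for KNN, and C and gamma for an RBF SVM,
# by 5-fold cross-validated grid search. Data is a placeholder.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((520, 256))               # placeholder letter bitmaps
y = np.repeat(np.arange(26), 20)         # 20 samples per letter

knn_search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": [1, 3, 5, 7],
                           "metric": ["euclidean", "manhattan"]}, cv=5)
svm_search = GridSearchCV(SVC(),
                          {"C": [0.1, 1, 10],
                           "gamma": ["scale", 0.01, 0.1]}, cv=5)

knn_search.fit(X, y)
svm_search.fit(X, y)
print(knn_search.best_params_, svm_search.best_params_)
```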

My suggestion is to start with KNN (simply because it's easier and you have never used SVMs before), even though it's next to impossible to predict the error rate (%) you asked about, because several things are involved:

  • value for K
  • distance metric
  • how your letters are coded (are they vectors? matrices?...) and how big they are (font size also matters!)

But it would be a nice experiment (if you have time; I don't know about any deadlines for this project) to see how things change when using SVMs. Maybe have a look at LibSVM, initially try the linearly separable case (i.e. the kernel function is simply a dot product) and then experiment with more sophisticated kernels. A small proxy experiment along these lines is sketched below.
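
A quick version of that experiment, assuming scikit-learn's digits dataset as a stand-in for rendered letters (10 classes instead of 26):

```python
# Compare KNN, a linear SVM and an RBF SVM on glyph-like images.
# load_digits gives 8x8 grayscale digit glyphs, already flattened.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("KNN (k=3)", KNeighborsClassifier(n_neighbors=3)),
                  ("linear SVM", SVC(kernel="linear")),
                  ("RBF SVM", SVC(kernel="rbf", gamma="scale"))]:
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.3f}")
```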

In conclusion, even though SVMs look tougher, they provide an optimal solution. Both SVMs and KNN are widely used when it comes to OCR, but several experiments have found SVMs slightly better (a minor difference in accuracy, something like 2% higher than KNN). You can see this article; the above-mentioned OpenCV also provides examples of OCR classification for both the KNN case and the SVM case.