Solved – KNN outperforms CNN

conv-neural-network, k-nearest-neighbour, optical-character-recognition

Disclaimer: I am a programmer by trade, not a statistician, so please cater to my ignorance when explaining things. I apologise in advance if I make any incorrect assumptions.

Please consider the following problem:

I am currently attempting to build an OCR platform for printed characters moving at speed in a video stream. I am able to detect and segment the images like so:

[three example images of segmented characters]

These are labeled using a standard one-hot format, e.g. [0,0,1,0,0,0,0,0,0,0].
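For reference, labels in this format can be produced from integer class IDs with a one-line NumPy identity-matrix trick (the label values below are hypothetical examples, not from the actual dataset):

```python
import numpy as np

# Hypothetical integer class labels for three sample characters
labels = np.array([2, 0, 9])

# Index into a 10x10 identity matrix to get one-hot rows,
# e.g. label 2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
one_hot = np.eye(10, dtype=int)[labels]
print(one_hot[0])  # [0 0 1 0 0 0 0 0 0 0]
```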

I first attempted to build a convolutional neural network in Keras for the recognition task, with the following architecture:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from keras.optimizers import SGD

# First convolution layer: 20 filters of size 15x15
model = Sequential()
model.add(Conv2D(20, (15, 15), padding="same", input_shape=(height, width, depth)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Second convolution layer: 50 filters of size 15x15
model.add(Conv2D(50, (15, 15), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Third convolution layer: 120 filters of size 15x15
model.add(Conv2D(120, (15, 15), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Fully connected layer
model.add(Flatten())
model.add(Dense(500))
model.add(Activation("relu"))

# Classifier
model.add(Dense(classes))
model.add(Activation("softmax"))

opt = SGD(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
history = model.fit(trainingData, trainingLabels, batch_size=128, epochs=150, verbose=1)

However, the network appears to converge after only a few epochs at an awful accuracy level, and then stays at that level indefinitely.

I have tried tweaking the learning rate, the number of layers, and the size and number of filters, but I still get the same results.

At first I assumed it was down to the validity of my training data; however, a KNN classifier trained on the same data achieves 94.87% accuracy.
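For context, a KNN baseline like the one described is simple enough to sketch from scratch with NumPy (majority vote over Euclidean nearest neighbours; the question doesn't say which library or distance metric was actually used, and the data below is a toy stand-in):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest
    training points under Euclidean distance."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Toy example: two well-separated clusters standing in for character features
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.05], [5.05, 5.05]])
print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```

KNN has no training phase to get stuck in, which is one reason it can look strong on a small dataset where a large network underfits or fails to train.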

I originally followed this fantastic tutorial when building the architecture, as it solves a similar problem (the MNIST dataset).

I was hoping to use a CNN as a learning exercise into why CNNs work so well for this kind of problem, so any help understanding why my CNN didn't work would be greatly appreciated.

Best Answer

Almost certainly the low performance of your CNN is due to insufficient data.

A quick double-check in Keras using model.count_params() shows your network has more than 10 million parameters -- not much by modern standards, but a lot if you only have ~1.5k images. Conventional wisdom in ML says you should have at least a few thousand images per class before considering deep learning -- although in my experience I'd say it has to be quite a bit more, unless you're willing to spend a long while fine-tuning your model.
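To see where a number that size comes from, here is a back-of-the-envelope parameter count for the architecture in the question, assuming a hypothetical 64x64x1 input and 10 classes (the actual crop size isn't stated in the question; a larger input inflates the dense layer, and hence the total, well past this figure):

```python
# Per-layer parameter count for a "same"-padded conv layer:
#   (kernel_h * kernel_w * in_channels + 1) * n_filters   (+1 for the bias)
# Hypothetical 64x64x1 input; each 2x2 max-pool halves the spatial size.
h = w = 64
conv1 = (15 * 15 * 1 + 1) * 20
h //= 2; w //= 2
conv2 = (15 * 15 * 20 + 1) * 50
h //= 2; w //= 2
conv3 = (15 * 15 * 50 + 1) * 120
h //= 2; w //= 2
flat = h * w * 120              # 8 * 8 * 120 = 7680 flattened features
dense = (flat + 1) * 500        # fully connected layer dominates
classifier = (500 + 1) * 10
total = conv1 + conv2 + conv3 + dense + classifier
print(total)  # 5425200
```

Even at this modest assumed input size the model is already in the millions of parameters, with the dense layer and the third 15x15 conv layer contributing most of them.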

If you want to go the neural-net route, I would suggest making your network smaller and adding strong regularisation, for example heavy dropout or L2 regularisation. If you're serious about this, you could even consider data augmentation or transfer learning (potentially from MNIST).
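To illustrate the dropout mechanism behind that suggestion, here is a minimal NumPy sketch of inverted dropout -- randomly zeroing units during training and rescaling the survivors so the expected activation is unchanged (this is a from-scratch illustration, not the Keras layer itself; in Keras you would simply add a Dropout layer to the model):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units at random and
    rescale the rest by 1/(1 - rate) so the expected value is preserved."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

a = np.ones(10000)
out = dropout(a, rate=0.5)
# Surviving units are scaled to 2.0, dropped units are 0.0,
# so the mean stays close to 1.0 in expectation.
print(out.mean())
```

At inference time (`training=False`) the function is a no-op, which is exactly why the rescaling is done during training rather than at test time.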

If you're just hacking around with some ML for fun, I would recommend looking into other classifiers that are more likely to work in this scenario. Two examples are support vector machines and random forests.
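Both of those are a few lines with scikit-learn. The sketch below trains each on synthetic stand-in data (ten tight clusters mimicking flattened character crops -- the real dataset isn't available here, so the feature dimensions and cluster structure are assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 10 well-separated "character" clusters
# in a 64-dimensional feature space.
rng = np.random.default_rng(0)
centers = rng.random((10, 64))
y = rng.integers(0, 10, 300)
X = centers[y] + 0.05 * rng.standard_normal((300, 64))

scores = {}
for clf in (SVC(kernel="rbf"),
            RandomForestClassifier(n_estimators=100, random_state=0)):
    clf.fit(X[:250], y[:250])                       # train on first 250 samples
    scores[type(clf).__name__] = clf.score(X[250:], y[250:])  # held-out accuracy
print(scores)
```

Unlike the CNN, neither model has millions of parameters to fit, so both tend to behave sensibly on a ~1.5k-image dataset.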
