Solved – How to improve SVHN results with Keras

conv-neural-network, deep-learning

I'm using Keras to build a CNN and train it on the famous SVHN (Street View House Numbers) dataset (first version, without cropping). I treat every picture as having five digit slots; for pictures with fewer digits, I treat each empty slot as an extra category, so every digit slot has 11 categories. I'm not using the bounding boxes yet. The best I can get is about 30% accuracy on the first two digit outputs. I've tried different dropout rates and adding one more convolution layer, but the accuracy is still very low. How can I tune my model to get better accuracy? I don't want to use the bounding boxes at this stage. Is it possible to get a decently performing model?
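Concretely, the per-slot labels described above can be built like this (a sketch in plain NumPy; encode_labels and the choice of class 10 for the "empty slot" category are my own naming, not part of the dataset):

```python
import numpy as np

NUM_SLOTS = 5     # every picture is treated as having five digit slots
NUM_CLASSES = 11  # digits 0-9, plus class 10 for an empty slot

def encode_labels(numbers):
    """Turn house-number strings into one one-hot array per digit slot.

    Returns a list of five (N, 11) arrays, matching the model's five outputs.
    """
    slots = [np.zeros((len(numbers), NUM_CLASSES)) for _ in range(NUM_SLOTS)]
    for i, num in enumerate(numbers):
        for s in range(NUM_SLOTS):
            cls = int(num[s]) if s < len(num) else 10  # pad with "empty"
            slots[s][i, cls] = 1.0
    return slots

train_labels = encode_labels(["123", "7"])  # list of five (2, 11) arrays
```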

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

x = Input((30, 70, 1))
# 'same' padding: with 'valid', the 30-pixel height would shrink to zero
# before the fourth conv layer
y = Conv2D(64, (3, 3), activation="relu", padding="same")(x)
y = MaxPooling2D(pool_size=(2, 2))(y)
y = Conv2D(64, (3, 3), activation="relu", padding="same")(y)  # was applied to x, skipping the first block
y = MaxPooling2D(pool_size=(2, 2))(y)
#y = Dropout(0.5)(y)
y = Conv2D(32, (3, 3), activation="relu", padding="same")(y)
y = MaxPooling2D(pool_size=(2, 2))(y)
#y = Dropout(0.5)(y)
y = Conv2D(32, (3, 3), activation="relu", padding="same")(y)
y = MaxPooling2D(pool_size=(2, 2))(y)
y = Dropout(0.5)(y)

y = Flatten()(y)
y = Dense(512, activation="relu")(y)
y = Dense(256, activation="relu")(y)
y = Dense(256, activation="relu")(y)
y = Dense(128, activation="relu")(y)
digit1 = Dense(11, activation="softmax")(y)
digit2 = Dense(11, activation="softmax")(y)
digit3 = Dense(11, activation="softmax")(y)
digit4 = Dense(11, activation="softmax")(y)
digit5 = Dense(11, activation="softmax")(y)
identifier = Model(inputs=x, outputs=[digit1, digit2, digit3, digit4, digit5])
identifier.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
epochs = 3
identifier.fit(training_data.reshape(29329, 30, 70, 1), train_labels, batch_size=64, epochs=epochs,
               verbose=1, validation_data=(test_data.reshape(600, 30, 70, 1), test_labels))

Best Answer

Follow the famous SVHN paper (Goodfellow et al., https://arxiv.org/abs/1312.6082). The authors had great ideas for improving this kind of classification.

In this paper, they introduce a convolutional block composed of a convolution (like the ones you created), batch normalization, an activation, and max pooling.

They stack 11 such convolutional blocks in the architecture. The convolutions use 5x5 filters, and the max pooling alternates between a stride of 2 and a stride of 1.

The number of filters is much larger too, starting at 48 in the first layer and growing to 192 in the last.

The convolutional block should look like this:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Activation, Dropout

def svhn_layer(model, filters, strides, name, kernel_size=5):
    # 5x5 convolutions, as in the paper
    model.add(Conv2D(filters, (kernel_size, kernel_size),
                     padding='same', name='conv2d_' + name))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(strides, strides),
                           name='maxpool_2d_' + name))
    model.add(Dropout(0.2))
    return model

where model is a Keras Sequential model.

After defining this block, you can play with many combinations, varying the filters and strides. Following the paper, the network should look something like this:

model = Sequential()

svhn_layer(model, 48, 2, 'hidden1')
svhn_layer(model, 48, 1, 'hidden2')
svhn_layer(model, 48, 1, 'hidden3')

svhn_layer(model, 64, 2, 'hidden4')
svhn_layer(model, 64, 1, 'hidden5')
svhn_layer(model, 64, 1, 'hidden6')

svhn_layer(model, 128, 2, 'hidden7')
svhn_layer(model, 128, 1, 'hidden8')
svhn_layer(model, 128, 1, 'hidden9')

svhn_layer(model, 192, 2, 'hidden10')
svhn_layer(model, 192, 1, 'hidden11')

model.add(Flatten())

model.add(Dense(3072, activation='relu'))
model.add(Dense(3072, activation='relu'))

x = Input((30,70,1))
y = model(x)

digit1 = Dense(11, activation="softmax")(y)
digit2 = Dense(11, activation="softmax")(y)
digit3 = Dense(11, activation="softmax")(y)
digit4 = Dense(11, activation="softmax")(y)
digit5 = Dense(11, activation="softmax")(y)

identifier = Model(inputs=x, outputs=[digit1, digit2, digit3, digit4, digit5])

You can play with other activation functions too, like selu and elu. In my tests they performed better than relu.

Note that this works with inputs like the ones they describe (64x64x3 images). Your input is different, so you may have to remove some layers or reduce some strides.
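As a quick sanity check, here is a sketch in plain Python tracing your 30x70 input through the eleven pooling stages above, using the 'valid'-padding pool output formula floor((n - 2) / stride) + 1:

```python
def pooled(n, stride):
    """Output length of a 2x2 max pool with 'valid' padding."""
    return (n - 2) // stride + 1

# pooling strides of the 11 svhn_layer calls above
strides = [2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1]

h, w = 30, 70       # the question's input size
layers_used = 0
for s in strides:
    if h < 2 or w < 2:  # the 2x2 pooling window no longer fits
        break
    h, w = pooled(h, s), pooled(w, s)
    layers_used += 1
```

With a 30x70 input, only eight of the eleven blocks fit (ending at a 1x6 feature map) before the height drops below the pooling window, which is exactly why some layers or strides would have to go.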
