Does anyone know how the accuracy of CNNs compares with that of fully connected networks for image recognition? Also, are CNNs good at anything other than image recognition? I couldn't find anything on Google; a link or explanation would be appreciated.
Solved – CNN vs fully connected network for image recognition
neural networks
Related Solutions
One reason might be mathematical convenience. The vanilla recurrent neural network (Elman-type) can be formulated as:
$\vec{h}_t = f(\vec{x}_t, \vec{h}_{t-1})$, where $f(\cdot)$ can be written as $\sigma(W\vec{x}_t + U\vec{h}_{t-1})$.
The above equation corresponds to your first picture. Of course you can make the recurrent matrix $U$ sparse to restrict the connections, but that does not affect the core idea of the RNN.
As an aside, there are two kinds of memory in an RNN. One is the input-to-hidden weight matrix $W$, which mainly stores information from the input. The other is the hidden-to-hidden matrix $U$, which stores the history. Since we do not know which parts of the history will affect the current prediction, the most reasonable approach might be to allow all possible connections and let the network learn this itself.
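The single-step equation above is easy to sketch in NumPy. This is a minimal illustration, not production code; the dimensions and the choice of $\sigma = \tanh$ are my own illustrative assumptions:

```python
import numpy as np

# One Elman-type RNN step: h_t = sigma(W x_t + U h_{t-1}).
# input_dim and hidden_dim are arbitrary illustrative choices.
input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)

W = rng.standard_normal((hidden_dim, input_dim))   # input-to-hidden "memory"
U = rng.standard_normal((hidden_dim, hidden_dim))  # hidden-to-hidden "memory"

def step(x_t, h_prev):
    # sigma is taken to be tanh here, a common choice
    return np.tanh(W @ x_t + U @ h_prev)

h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # a toy sequence of length 5
    h = step(x_t, h)
print(h.shape)  # (3,)
```

Because $U$ is a full matrix, every hidden unit at time $t-1$ can influence every hidden unit at time $t$, which is exactly the "allow all possible connections" point above.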
I did some experimenting with Keras' MNIST tutorial.
If I edit the model to be fully convolutional, then train it, I encounter the same problem.
If I instead train the model as written, save the weights, and then import them to a convolutionalized model (reshaping where appropriate), it tests as perfectly equivalent. However, training it further causes accuracy to drop drastically.
So changing the network to be fully convolutional changes the gradient in some way, such that the network no longer converges to an optimum. This page claims that there is some way to train a network as fully convolutional from the start, but does not say how. Possibly it involves the use of a different loss function.
For those interested, my code for convolutionalizing the MNIST tutorial and reimporting the weights is below.
from __future__ import print_function
import keras
from keras.utils import plot_model
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv1D, Conv2D, MaxPooling2D
from keras import backend as K
import numpy as np
# (data loading and preprocessing are the same as in the tutorial)
weights = load_model('CNN.h5').get_weights()
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape,
                 padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu',
                 padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
#model.add(Flatten())
#model.add(Dense(128, activation='relu'))
model.add(Conv2D(128, (14, 14), activation='relu', padding='valid'))
model.add(Dropout(0.5))
#model.add(Dense(num_classes, activation='softmax'))
model.add(Conv2D(num_classes, (1, 1), activation='softmax'))
model.add(Flatten())
plot_model(model, 'model.png', show_shapes=True)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.layers[0].set_weights([weights[0], weights[1]])
model.layers[1].set_weights([weights[2], weights[3]])
model.layers[4].set_weights([weights[4].reshape([14, 14, 64, 128]), weights[5]])
model.layers[6].set_weights([weights[6].reshape([1, 1, 128, num_classes]), weights[7]])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          verbose=1, validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
#model.save('CNN.h5')
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Best Answer
Fully connected neural networks are good enough classifiers; however, they aren't good at feature extraction. Before the emergence of CNNs, the state of the art was to extract explicit features from images and then classify those features.
CNNs are trained to identify and extract the best features from the images for the problem at hand; that is their main strength. The latter layers of a CNN are fully connected precisely because of their strength as classifiers. So these two architectures aren't really competing, as you might think, since CNNs incorporate FC layers.
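The scale difference behind this is easy to see with a quick parameter count. The numbers here are illustrative assumptions of mine (a 224x224 RGB image, a 4096-unit FC layer, 64 3x3 conv filters), not figures from the answer:

```python
# Parameters of one fully connected layer on a 224x224x3 image
# vs one 3x3 convolutional layer with 64 filters on the same image.
h, w, c = 224, 224, 3
hidden = 4096  # a typical FC layer width

fc_params = h * w * c * hidden + hidden  # every pixel wired to every unit, plus biases
conv_params = 3 * 3 * c * 64 + 64        # small filters shared across the whole image

print(fc_params)    # 616566784
print(conv_params)  # 1792
```

The conv layer gets by with a tiny fraction of the weights because its filters are shared across spatial positions, which is what makes convolutional feature extraction tractable on images.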
If your question is how well an FC-based image recognition technique fares compared to a CNN-based one, you should check the ILSVRC results from past years. The last non-CNN winning architecture achieved, I think, a top-5 error rate of around 30% (with today's state-of-the-art CNNs this is under 3%).