Solved – Why do we normally have more than one fully connected layer in the late stages of CNNs

conv-neural-network, deep learning, image processing, machine learning, neural networks

As I have noticed, many popular convolutional neural network architectures (e.g. AlexNet) use more than one fully connected layer of almost the same dimension to gather the responses to features detected in the earlier layers.

Why don't we use just one FC layer for that? Why might this hierarchical arrangement of fully connected layers be more useful?

[Figure: a CNN architecture ending in several fully connected layers of similar size]

Best Answer

Why don't we use just one FC layer for that? Why might this hierarchical arrangement of fully connected layers be more useful?

For the same reason that two-layer fully connected feedforward neural networks may perform better than single-layer ones: the extra layer increases the capacity of the network, which may or may not help.
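One detail worth making explicit: the extra capacity comes from the nonlinearity between the stacked layers, because two linear layers with no activation in between collapse into a single linear map. A minimal NumPy sketch (with made-up dimensions) illustrates this:

```python
import numpy as np

# Two stacked fully connected layers WITHOUT an activation between them
# are equivalent to one fully connected layer with weights W1 @ W2.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))        # batch of 5 feature vectors
W1 = rng.standard_normal((8, 16))      # first FC layer
W2 = rng.standard_normal((16, 3))      # second FC layer

two_linear = (x @ W1) @ W2             # two FC layers, no activation
one_linear = x @ (W1 @ W2)             # single equivalent FC layer
assert np.allclose(two_linear, one_linear)

# With a ReLU in between, the composition is no longer a single linear
# map, so the two-layer stack genuinely has more capacity.
with_relu = np.maximum(x @ W1, 0) @ W2
```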

Note that the final fully connected feedforward layers you pointed to contain most of the parameters of the neural network:

[Figure: per-layer parameter counts of the network; the fully connected layers account for most of the parameters (source)]
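For AlexNet specifically, this is easy to check by hand. The sketch below counts the weights of its three fully connected layers (biases omitted), using the layer sizes from the original paper; the stated total network size of roughly 61M parameters is an approximation:

```python
# Approximate weight counts of AlexNet's fully connected layers.
fc6 = 256 * 6 * 6 * 4096   # flattened conv5 output (9216) -> 4096
fc7 = 4096 * 4096          # 4096 -> 4096
fc8 = 4096 * 1000          # 4096 -> 1000 ImageNet classes

fc_total = fc6 + fc7 + fc8
print(fc_total)  # 58621952 -- roughly 96% of AlexNet's ~61M parameters
```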

The number of final fully connected feedforward layers is chosen empirically. Sometimes having only one is good enough, e.g. in GoogLeNet:

[Figure: GoogLeNet architecture, which ends in a single fully connected layer]
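One reason GoogLeNet can get away with a single fully connected layer is that it applies global average pooling to its final 7×7×1024 feature map first, so the FC layer only maps 1024 channel averages to 1000 class scores. A quick sketch of the weight counts (biases omitted) shows how much this saves compared to a fully connected layer on the flattened feature map:

```python
# Weight counts for the classifier head, given GoogLeNet's final
# 7x7 feature map with 1024 channels and 1000 output classes.
h, w, c, classes = 7, 7, 1024, 1000

flatten_fc = h * w * c * classes   # FC on the flattened 7x7x1024 map
gap_fc = c * classes               # FC after global average pooling

print(flatten_fc)  # 50176000
print(gap_fc)      # 1024000 -- ~50x fewer weights
```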