Solved – heuristic for determining the size of a fully connected layer at the end of a CNN

artificial-intelligence, conv-neural-network, deep-learning, machine-learning, neural-networks

For example, in VGG/OxfordNet, the fully connected (dense) layers that precede the final classification layer are of size 4096.

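For reference, here is a PyTorch-style sketch of that dense head (the dimensions follow the standard VGG-16 configuration, where a 224×224 input leaves a 512 × 7 × 7 feature map before flattening):

```python
import torch.nn as nn

# The dense head of VGG-16: two 4096-wide hidden layers before the
# 1000-way ImageNet classifier. 512 * 7 * 7 = 25088 is the flattened
# size of the last conv feature map for 224x224 inputs.
classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 1000),
)
```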

Similarly, the AlexNet paper states:

… the number of neurons in the network’s remaining layers is given by
253,440–186,624–64,896–64,896–43,264–4096–4096–1000.

My question is: what is the rationale for choosing this particular number of neurons in the dense layers?

I haven't found a satisfactory explanation. Each of the above examples uses a different input resolution and different numbers and sizes of kernels, so each generates a different number of parameters in its first dense layer, yet they all settle on similar dense-layer widths (see the quick computation below).
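To make that asymmetry concrete, here is a back-of-the-envelope computation of the weight count of the first dense layer in each network; the flatten sizes assume the standard 224×224 VGG-16 and 227×227 AlexNet configurations:

```python
# Weight count of the first dense layer = flattened conv output size * layer width.
# Flatten sizes below assume the standard VGG-16 (224x224 input) and
# AlexNet (227x227 input) configurations.
for name, flatten in [("VGG-16", 512 * 7 * 7), ("AlexNet", 256 * 6 * 6)]:
    weights = flatten * 4096
    print(f"{name}: {flatten} -> 4096 gives {weights:,} weights")

# VGG-16:  25088 -> 4096 gives 102,760,448 weights
# AlexNet:  9216 -> 4096 gives  37,748,736 weights
```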

Even if this is purely empirical, I'd be interested in hearing your thoughts on how this is decided.

Best Answer

Choosing a network architecture is a bit of a "black art".

The network authors might have tried several different layer sizes and kept the one that worked best (evaluating each using cross-validation). You can also inform your choice with what the research literature reports to work well on similar tasks, and use that as a starting point for experimentation.

One consideration here is the number of weights that can be set independently: the more of them you have, the more expressive your neural network can be, but the longer training takes and the greater the risk of overfitting. You probably want the number to be as small as possible without sacrificing accuracy. So you might start with a small layer and increase its size until you stop getting improvements in accuracy (measured using cross-validation), as in the sketch below.
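Here is a minimal sketch of that "start small and grow" search, using scikit-learn's MLPClassifier as a stand-in for the dense head; the dataset (load_digits), the candidate widths, and the improvement threshold are illustrative assumptions, not a prescription:

```python
# Sketch: grow the dense layer width until cross-validated accuracy
# stops improving. Dataset and candidate sizes are illustrative only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

best_size, best_score = None, -np.inf
for size in [32, 64, 128, 256, 512]:  # double the layer width each step
    clf = MLPClassifier(hidden_layer_sizes=(size,), max_iter=500, random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()  # cross-validated accuracy
    print(f"hidden size {size:4d}: CV accuracy {score:.4f}")
    if score > best_score + 1e-3:  # keep growing only while it clearly helps
        best_size, best_score = size, score
    else:
        break  # improvements stopped; keep the smaller layer

print(f"chosen hidden size: {best_size}")
```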