Solved – CNN Xavier weight initialization

conv-neural-network, neural-networks, normal-distribution, variance

In some tutorials I found it stated that "Xavier" weight initialization (paper: Understanding the difficulty of training deep feedforward neural networks) is an efficient way to initialize the weights of neural networks.

For fully-connected layers there was a rule of thumb in those tutorials:

$$Var(W) = \frac{2}{n_{in} + n_{out}}, \quad \text{simpler alternative:} \quad Var(W) = \frac{1}{n_{in}}$$

where $Var(W)$ is the variance of the weights for a layer initialized with a normal distribution, and $n_{in}$, $n_{out}$ are the numbers of neurons in the parent layer and in the current layer, respectively.
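As a minimal NumPy sketch of that rule of thumb for a fully-connected layer (the layer sizes 784 and 256 are just placeholder values, not from the tutorials):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_normal(n_in, n_out):
    # Zero-mean normal with Var(W) = 2 / (n_in + n_out)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

W = xavier_normal(784, 256)
print(W.var())  # close to 2 / (784 + 256) ≈ 0.0019
```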

Are there similar rules of thumb for convolutional layers?

I am struggling to figure out what would be best to initialize the weights of a convolutional layer. E.g., in a layer where the shape of the weights is (5, 5, 3, 8), the kernel size is 5x5, filtering three input channels (RGB input) and creating 8 feature maps. Would 3 be considered the number of input neurons? Or rather 75 = 5*5*3, because the input is a 5x5 patch for each color channel?

I would accept either a specific answer clarifying this example or a more "generic" answer explaining the general process of finding the right weight initialization, preferably with links to sources.

Best Answer

In this case the number of input neurons should be 5*5*3 = 75.
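A minimal NumPy sketch, assuming the common convention (used e.g. by Keras when computing fans) that fan_in = kernel height × kernel width × input channels and fan_out = kernel height × kernel width × output channels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight shape from the question: (kernel_h, kernel_w, in_channels, out_channels)
kernel_shape = (5, 5, 3, 8)
kh, kw, c_in, c_out = kernel_shape

fan_in = kh * kw * c_in    # 5 * 5 * 3 = 75 weights feeding each output unit
fan_out = kh * kw * c_out  # 5 * 5 * 8 = 200 units each input position fans out to

std = np.sqrt(2.0 / (fan_in + fan_out))      # Var(W) = 2 / (fan_in + fan_out)
W = rng.normal(0.0, std, size=kernel_shape)
print(W.shape, W.var())                      # (5, 5, 3, 8), roughly 2 / 275 ≈ 0.0073
```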

I found Xavier initialization especially useful for convolutional layers. Often a uniform distribution over the interval $\left[-\frac{c}{\sqrt{n_{in}+n_{out}}}, \frac{c}{\sqrt{n_{in}+n_{out}}}\right]$ works as well; the Glorot uniform variant uses $c = \sqrt{6}$.

It is implemented as an option in almost all neural network libraries. Keras, for example, implements Xavier Glorot's initialization in its initializers source code.
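For example, with the TensorFlow 2.x Keras API you can request the Glorot initializers directly (the filter count and input shape below are just the values from the question; 'glorot_uniform' is also the default kernel initializer for most Keras layers):

```python
from tensorflow import keras

# Conv2D layer whose (5, 5, 3, 8) kernel is drawn with Glorot/Xavier normal initialization
conv = keras.layers.Conv2D(
    filters=8,
    kernel_size=(5, 5),
    kernel_initializer=keras.initializers.GlorotNormal(seed=0),
    input_shape=(32, 32, 3),
)
```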