From the Stanford CS231n notes on neural networks:
Real-world example. The Krizhevsky et al. architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional Layer, it used neurons with receptive field size F=11, stride S=4 and no zero padding P=0. Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of K=96, the Conv layer output volume had size [55x55x96]. Each of the 55*55*96 neurons in this volume was connected to a region of size [11x11x3] in the input volume. Moreover, all 96 neurons in each depth column are connected to the same [11x11x3] region of the input, but of course with different weights. As a fun aside, if you read the actual paper it claims that the input images were 224x224, which is surely incorrect because (224 - 11)/4 + 1 is quite clearly not an integer. This has confused many people in the history of ConvNets and little is known about what happened. My own best guess is that Alex used zero-padding of 3 extra pixels that he does not mention in the paper.
ref: http://cs231n.github.io/convolutional-networks/
These notes accompany the Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
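To make the arithmetic in the quoted example easy to check, here is a minimal Python sketch of the standard output-size formula $(W - F + 2P)/S + 1$ (the helper name `conv_output_size` is mine, not from the notes):

```python
def conv_output_size(W, F, S, P):
    # Spatial output size of a conv layer: input width W, receptive field F,
    # stride S, zero-padding P (per side).
    return (W - F + 2 * P) / S + 1

print(conv_output_size(227, 11, 4, 0))  # 55.0 -> the [55x55x96] output volume
print(conv_output_size(224, 11, 4, 0))  # 54.25 -> not an integer, hence the confusion
```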
I think the initialization standard deviation should be roughly $\sqrt{\frac{1.55}{n_{in}}}$
The He et al. 2015 formula was derived for ReLU units. The key idea is that the variance of $f(y)$, with $y = Wx + b$, should be roughly equal to the variance of $y$. Let's first go over the case of a ReLU activation, and see if we can amend it for ELU units.
In the paper they show that:
$$
Var[y_l] = n_l Var[w_l] \mathbb{E}[x^2_l]
$$
They express the last expectation $\mathbb{E}[x^2_l]$ in terms of $Var[y_{l-1}]$. For ReLUs we have $\mathbb{E}[x^2_l] = \frac{1}{2} Var[y_{l-1}]$: since $x_l = \max(0, y_{l-1})$ and $y_{l-1}$ is symmetric around zero, the ReLU zeroes out half the distribution on average and so keeps half of the second moment. Thus we can write
$$
Var[y_l] = n_l Var[w_l] \frac{1}{2} Var[y_{l-1}]
$$
We apply this to all layers, taking the product over $l$, all the way to the first layer. This gives:
$$
Var[y_L] = Var[y_1] \prod_{l=2}^L \frac{1}{2} n_l Var[w_l]
$$
Now this is stable only when $\frac{1}{2} n_l Var[w_l]$ is close to $1$. So they set it to $1$ and find $Var[w_l] = \frac{2}{n_l}$, i.e. weights drawn with standard deviation $\sqrt{\frac{2}{n_l}}$.
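As a quick sanity check, here is a minimal simulation (a sketch, not from the paper; layer width and depth are arbitrary) showing that $Var[w_l] = \frac{2}{n_l}$ keeps activations from exploding or vanishing through many ReLU layers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512                                   # fan-in n_l of every layer (arbitrary)
x = rng.standard_normal((n, 2000))        # unit-variance inputs

for _ in range(50):
    W = rng.standard_normal((n, n)) * np.sqrt(2.0 / n)  # He et al. 2015 init
    x = np.maximum(0.0, W @ x)                          # ReLU
print(np.mean(x**2))  # stays O(1); with std 1/sqrt(n) it would shrink toward 0
```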
Now for ELU units, the only thing we have to change is the expression of $\mathbb{E}[x^2_l]$ in terms of $Var[y_{l-1}]$. Sadly, this is not as straightforward for ELU units as for ReLU units, since it involves calculating $\mathbb{E}[(e^{\mathcal{N}} - 1)^2]$ over only the negative values of $\mathcal{N}$. This is not a pretty formula, and I don't even know if there is a good closed-form solution, so let's sample to get an approximation. We want $Var[y_l]$ to be roughly equal to $1$ (most inputs have variance $1$, batch norm makes layer outputs variance $1$, etc.). Thus we can sample from a standard normal distribution, apply the ELU function with $\alpha = 1$, square, and take the mean. This gives $\approx 0.645$; its inverse is $\approx 1.55$.
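Here is that sampling experiment as a short NumPy sketch; it also computes the ReLU constant for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000_000)       # y ~ N(0, 1)

elu = np.where(z > 0, z, np.expm1(z))     # ELU with alpha = 1
print(np.mean(elu**2))                    # ~0.645
print(1.0 / np.mean(elu**2))              # ~1.55
print(np.mean(np.maximum(z, 0.0)**2))     # ~0.5, the ReLU constant from above
```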
Thus, following the same logic, we can set $Var[w_l] = \frac{1.55}{n_l}$ (i.e. initialize the weights with standard deviation $\sqrt{\frac{1.55}{n_l}}$) to get a variance that does not grow or shrink in magnitude.
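Repeating the earlier sanity check with ELU activations and the proposed $\frac{1.55}{n_l}$ variance (again just a sketch with arbitrary width and depth):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
x = rng.standard_normal((n, 2000))

for _ in range(50):
    W = rng.standard_normal((n, n)) * np.sqrt(1.55 / n)     # proposed ELU init
    y = W @ x
    x = np.where(y > 0, y, np.expm1(np.minimum(y, 0.0)))    # ELU, alpha = 1
print(np.var(W @ x))   # Var[y_l] stays ~1 instead of drifting off
```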
I reckon that would be the optimal value for the ELU function. The constant $0.645$ sits between the ReLU value ($\frac{1}{2}$, which is lower because the values that ReLU maps to $0$ are mapped to negative values by ELU) and the value for the identity function (which is just $1$).
Take care: if $Var[y_{l-1}]$ differs from $1$, the optimal constant is also different. As this variance tends to $0$, ELU behaves more and more like the identity function, so the constant tends to $1$. As the variance becomes very large, the constant tends towards the original ReLU value of $0.5$, because the negative part of ELU is bounded by $\alpha$ and contributes less and less.
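The dependence on the input scale is easy to see numerically (a sketch; the clamp inside `expm1` just avoids overflow warnings at large $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)

for sigma in [0.01, 0.1, 1.0, 10.0, 100.0]:
    y = sigma * z
    elu = np.where(y > 0, y, np.expm1(np.minimum(y, 0.0)))
    print(sigma, np.mean(elu**2) / sigma**2)   # ~1 for small sigma, ~0.5 for large
```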
Edit: I did the theoretical analysis of the second moment of $\text{ELU}(x)$ when $x$ is normally distributed with standard deviation $\sigma$. It involves some moments of the log-normal distribution and not-so-pretty integrals. The eventual answer for $\mathbb{E}[\text{ELU}(x)^2]$ is $\frac{1}{2}\sigma^2$ (the contribution of the linear part) plus
$$
b - 2a + \frac{1}{2}
$$
where (for $\sigma > 0$)
$$
a = \frac{1}{2} e^{\frac{\sigma^2}{2}} \text{erfc}\left(\frac{\sigma}{\sqrt{2}}\right), \qquad
b = \frac{1}{2} e^{2\sigma^2} \text{erfc}\left(\sqrt{2}\,\sigma\right)
$$
Unfortunately this is not easily solvable for $\sigma$. You can, however, plug in $\sigma = 1$ and recover the estimate $\approx 0.645$ I gave above, which is pretty cool.
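For completeness, a short script checking this closed form against the Monte Carlo estimate (using SciPy's `erfc`):

```python
import numpy as np
from scipy.special import erfc

def elu_second_moment(sigma):
    # E[ELU(x)^2] for x ~ N(0, sigma^2), alpha = 1: linear part + negative part
    a = 0.5 * np.exp(sigma**2 / 2) * erfc(sigma / np.sqrt(2))
    b = 0.5 * np.exp(2 * sigma**2) * erfc(np.sqrt(2) * sigma)
    return 0.5 * sigma**2 + (b - 2 * a + 0.5)

print(elu_second_moment(1.0))   # ~0.645, matching the sampled estimate
```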
Best Answer
In this case the number of weights per neuron should be $5 \cdot 5 \cdot 3 = 75$, one for each input connection.
I found Glorot initialization especially useful for convolutional layers. Often a uniform distribution over the interval $\left[-\frac{c}{\sqrt{n_{in}+n_{out}}}, \frac{c}{\sqrt{n_{in}+n_{out}}}\right]$ works as well; Glorot's uniform variant uses $c = \sqrt{6}$.
It is implemented as an option in almost all neural network libraries; Keras, for example, provides it in its initializers module as `glorot_uniform` and `glorot_normal`.
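A minimal NumPy sketch of the Glorot uniform rule (the function name is mine; Keras's `glorot_uniform` implements the same limit):

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng=None):
    # U[-limit, limit] with limit = sqrt(6 / (n_in + n_out)),
    # so that Var[w] = limit^2 / 3 = 2 / (n_in + n_out)
    if rng is None:
        rng = np.random.default_rng(0)
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W = glorot_uniform(256, 128)
print(W.var(), 2.0 / (256 + 128))   # both ~0.0052
```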