Solved – Xavier Initialization – Formula Clarification

Tags: conv-neural-network, neural-networks, normal-distribution, variance

Problem: How is $W_i$ calculated when using Xavier initialization?

From what I understand, Xavier initialization calculates the standard deviation, but I'm not sure how that is used to compute a specific weight value.

According to the references, $W$ is the "initialization distribution for the neuron in question". What does that mean? How does that determine what the value will be?

For the current layer, let $s$ be the number of output connections of the layer and $e$ the number of input connections; then:
$$\mathrm{Var}(W) = \frac{2}{e + s}$$

References:

http://philipperemy.github.io/xavier-initialization/

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

https://prateekvjoshi.com/2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/

https://www.quora.com/What-is-an-intuitive-explanation-of-the-Xavier-Initialization-for-Deep-Neural-Networks


Best Answer

Neural networks are optimized by starting with an initial, random guess of the parameter values. This guess is iteratively updated, most commonly using some form of gradient descent. Researchers have found that the optimization task can be very challenging, but that careful attention to how the parameters are initialized can make the optimization easier.

In the case of Xavier initialization (also called "Glorot normal" in some software), the parameters are initialized as random draws from a truncated normal distribution with mean 0 and standard deviation $$\sigma = \sqrt{\frac{2}{a+b}}$$ where $a$ is the number of input units in the weight tensor, and $b$ is the number of output units in the weight tensor.
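To make this concrete, here is a minimal sketch in NumPy of drawing the weights for one layer, assuming a truncation rule of two standard deviations (a common convention; some libraries additionally rescale to compensate for the variance lost to truncation). The function name `glorot_normal` and the rejection-sampling loop are my own illustration, not any particular library's implementation.

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from a truncated normal
    with mean 0 and sigma = sqrt(2 / (fan_in + fan_out))."""
    rng = rng or np.random.default_rng()
    sigma = np.sqrt(2.0 / (fan_in + fan_out))
    w = rng.normal(0.0, sigma, size=(fan_in, fan_out))
    # Redraw any value outside [-2*sigma, 2*sigma]
    # (simple rejection sampling for the truncation).
    mask = np.abs(w) > 2.0 * sigma
    while mask.any():
        w[mask] = rng.normal(0.0, sigma, size=mask.sum())
        mask = np.abs(w) > 2.0 * sigma
    return w

# Example: a layer with 300 inputs and 100 outputs.
W = glorot_normal(300, 100)
# The sample std lands somewhat below sigma = sqrt(2/400) ~= 0.0707
# because truncation removes the tails of the distribution.
print(W.shape, W.std())
```

So to answer the question directly: Xavier initialization does not compute each $W_i$ deterministically. It fixes the *distribution* the weights are drawn from, and each individual weight value is simply an independent random draw from that distribution.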