When implementing Batch Normalization for a layer 'L' with 'n' hidden units in a Neural Network, we first normalize the activation values of that layer using their respective mean and standard deviation, and then apply a scaling and offset factor as shown:
X_norm = (X – mu) / sd
X' = (Y * X_norm) + B

where

mu = mean of X; an (n, 1) vector
sd = standard deviation of X; also an (n, 1) vector
X = activation values of layer 'L', with dimension (n, m) for mini-batch size m
X_norm = normalized X, with dimension (n, m)
Y = Gamma / scaling factor
B = Beta / offset factor
Now my question is: what are the dimensions of Gamma and Beta? Are they (n, 1) vectors, or are they (n, m) matrices? My intuition says that since they are somewhat analogous to the mean and standard deviation, they should be (n, 1) vectors.
Best Answer
The symbols $\gamma, \beta$ are $n$-vectors because there is a scalar $\gamma^{(k)}, \beta^{(k)}$ parameter for each input $x^{(k)}$.
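A minimal NumPy sketch of the forward pass described above (variable names and the small epsilon for numerical stability are my own, not from the post) shows why (n, 1) is enough: broadcasting applies each unit's scalar $\gamma^{(k)}, \beta^{(k)}$ across all m examples in the mini-batch.

```python
import numpy as np

n, m = 4, 8                       # n hidden units, mini-batch size m
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))       # activations of layer L, shape (n, m)

mu = X.mean(axis=1, keepdims=True)    # per-unit mean, shape (n, 1)
sd = X.std(axis=1, keepdims=True)     # per-unit std, shape (n, 1)
X_norm = (X - mu) / (sd + 1e-8)       # (n, m); (n, 1) broadcasts over columns

gamma = np.ones((n, 1))           # scaling factor: one scalar per unit
beta = np.zeros((n, 1))           # offset factor: one scalar per unit
X_out = gamma * X_norm + beta     # (n, m), same shape as the activations

print(X_out.shape)                # -> (4, 8)
```

An (n, m) Gamma or Beta would give each example its own scale and shift, which would not transfer to unseen examples; one learned pair per unit is what the transformation needs.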
From the batch norm paper:

> ... we introduce, **for each activation** $x^{(k)}$, a pair of parameters $\gamma^{(k)}, \beta^{(k)}$, which scale and shift the normalized value.

Emphasis mine.
"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." Sergey Ioffe, Christian Szegedy