Solved – What exactly is a Residual Learning block in the context of Deep Residual Networks in Deep Learning

Tags: conv-neural-network, deep-learning, machine-learning, neural-networks, residual-networks

I was reading the paper Deep Residual Learning for Image Recognition and had trouble understanding with complete certainty what a residual block entails computationally. The paper gives Figure 2:

[Figure 2 from the paper: the residual learning building block, with two weight layers, a ReLU between them, and a shortcut connection added before the final ReLU]

which illustrates what a residual block is supposed to be. Is the computation of a residual block simply:

$$ \mathbf{y} = \sigma( W_2 \sigma( W_1 \mathbf{x} + b_1 ) + b_2 + \mathbf{x} )$$

Or is it something else?

In other words, to match the paper's notation, is the following true?

$$ \mathcal{F}(\mathbf{x}) + \mathbf{x} = \left[ W_2 \sigma( W_1 \mathbf{x} + b_1 ) + b_2 \right] + \mathbf{x}$$

Note that after the circled summation, the word ReLU appears in the paper's figure, so the output of a residual block (which I denote by $\mathbf{y}$) should be:

$$ \mathbf{y} = \sigma( \mathcal{F}(\mathbf{x}) + \mathbf{x} ) = \sigma( \left[ W_2 \sigma( W_1 \mathbf{x} + b_1 ) + b_2 \right] + \mathbf{x} )$$

with one additional ReLU non-linearity $\sigma$.

Best Answer

Yes, that's true. You can take a look at their Caffe model to see how it is implemented.
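
To make the computation concrete, here is a minimal PyTorch sketch of that exact formula, $\mathbf{y} = \sigma(\mathcal{F}(\mathbf{x}) + \mathbf{x})$ with $\mathcal{F}(\mathbf{x}) = W_2 \sigma(W_1 \mathbf{x} + b_1) + b_2$. Note this is an assumption-laden illustration, not the paper's implementation: it uses fully connected layers where the actual ResNet blocks use 3×3 convolutions (plus batch normalization in the published models), and the class name `ResidualBlock` is my own.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical sketch of y = relu(W2 @ relu(W1 @ x + b1) + b2 + x).

    Fully connected layers stand in for the paper's 3x3 convolutions.
    """
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)  # W1, b1
        self.fc2 = nn.Linear(dim, dim)  # W2, b2
        self.relu = nn.ReLU()

    def forward(self, x):
        # F(x) = W2 @ relu(W1 @ x + b1) + b2
        f = self.fc2(self.relu(self.fc1(x)))
        # The second ReLU is applied AFTER the identity shortcut is added.
        return self.relu(f + x)

# Usage: input and output shapes must match so F(x) + x is well defined.
block = ResidualBlock(64)
y = block(torch.randn(8, 64))  # -> shape (8, 64)
```

The key point the code makes explicit is the ordering: the shortcut `x` is added to `F(x)` first, and only then does the final ReLU fire, exactly as the circled summation followed by "ReLU" in Figure 2 indicates.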
