Solved – What exactly is a Residual Learning block in the context of Deep Residual Networks in Deep Learning

Tags: conv-neural-network, deep-learning, machine-learning, neural-networks, residual-networks

I was reading the paper Deep Residual Learning for Image Recognition and had trouble understanding with complete certainty what a residual block entails computationally. The paper gives Figure 2:

[Figure 2 from the paper: the residual learning building block, with two weight layers, a ReLU between them, and a shortcut connection added before the final ReLU]

which illustrates what a residual block is supposed to be. Is the computation of a residual block simply:

$$ \mathbf{y} = \sigma( W_2 \sigma( W_1 \mathbf{x} + b_1 ) + b_2 + \mathbf{x} )$$

Or is it something else?

In other words, to match the paper's notation, is the following true?

$$ \mathcal{F}(\mathbf{x}) + \mathbf{x} = \left[ W_2 \sigma( W_1 \mathbf{x} + b_1 ) + b_2 \right] + \mathbf{x}$$

Note that after the circled summation, the word ReLU appears in the paper's figure, so the output of a residual block (which I denote by $\mathbf{y}$) should be:

$$ \mathbf{y} = \sigma( \mathcal{F}(\mathbf{x}) + \mathbf{x} ) = \sigma( \left[ W_2 \sigma( W_1 \mathbf{x} + b_1 ) + b_2 \right] + \mathbf{x} )$$

with one additional ReLU non-linearity $\sigma$.

Best Answer

Yes, that's true. You can take a look at their Caffe model to see how it is implemented.
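
To make the computation concrete, here is a minimal PyTorch sketch of that exact formula, $\mathbf{y} = \sigma(\mathcal{F}(\mathbf{x}) + \mathbf{x})$ with $\mathcal{F}(\mathbf{x}) = W_2 \sigma(W_1 \mathbf{x} + b_1) + b_2$. Note this is an assumption-laden illustration, not the paper's implementation: it uses fully connected layers where the actual ResNet blocks use 3×3 convolutions (plus batch normalization in the published models), and the class name `ResidualBlock` is my own.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical sketch of y = relu(W2 @ relu(W1 @ x + b1) + b2 + x).

    Fully connected layers stand in for the paper's 3x3 convolutions.
    """
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)  # W1, b1
        self.fc2 = nn.Linear(dim, dim)  # W2, b2
        self.relu = nn.ReLU()

    def forward(self, x):
        # F(x) = W2 @ relu(W1 @ x + b1) + b2
        f = self.fc2(self.relu(self.fc1(x)))
        # The second ReLU is applied AFTER the identity shortcut is added.
        return self.relu(f + x)

# Usage: input and output shapes must match so F(x) + x is well defined.
block = ResidualBlock(64)
y = block(torch.randn(8, 64))  # -> shape (8, 64)
```

The key point the code makes explicit is the ordering: the shortcut `x` is added to `F(x)` first, and only then does the final ReLU fire, exactly as the circled summation followed by "ReLU" in Figure 2 indicates.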
