Solved – backpropagation – bias nodes and error

backpropagation, machine learning, neural networks

I am implementing the stochastic gradient descent version of backpropagation from Tom Mitchell's Machine Learning book, which performs the following steps for each training instance $\langle\vec{x},\vec{t}\rangle$:

  1. Input instance $\vec{x}$ and compute output $o_u$ for every unit $u$.
  2. For each output unit $k$, compute the error $\delta_k = o_k(1-o_k)(t_k-o_k)$.
  3. For each hidden unit $h$, compute the error $\delta_h = o_h(1-o_h)\sum_{k \in outputs}(w_{kh}\delta_k)$.
  4. Update each weight $w_{ji} = w_{ji} + \eta\delta_j x_{ji}$.

I would like bias units at both the input and hidden layers. Are the bias units treated like any other units, and specifically, do the bias units have $\delta$ error values associated with them? If I am implementing this with matrices in MATLAB, would I simply concatenate a bias component to $\vec{x}$ and to the output vector of the hidden layer?

Best Answer

For simplicity, the bias units are subsumed into the equations by extending the input vector with a component that is always 1. A bias unit has no incoming weights, so no $\delta$ is computed for it; its outgoing weights are updated like any other weights, using the $\delta$ of the unit they feed into. Concretely,

$$ x = (x_{0}, x_{1}, \ldots, x_{N}), \qquad x_{0} = 1, $$ so that the activation of each unit can be rewritten as $$ a_{i} = \sum_{j=1}^{N} w_{ij}x_{j} + w_{i0} = \sum_{j=0}^{N} w_{ij}x_{j}. $$
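
To answer the MATLAB part of the question: yes, in a matrix implementation you can simply append a 1 to $\vec{x}$ and to the hidden-layer output vector, and give each weight matrix an extra column holding the bias weights. Below is a minimal MATLAB/Octave sketch of one stochastic update under those conventions; the variable names (`W1`, `W2`, `eta`) and the layer sizes are my own illustration, not from Mitchell's book.

```matlab
% One stochastic gradient step for a single training pair <x, t>,
% with sigmoid units and bias handled by a constant-1 input.
nIn = 3; nHidden = 4; nOut = 2; eta = 0.05;   % example sizes / learning rate
W1 = 0.1 * randn(nHidden, nIn + 1);           % last column = input-layer bias weights
W2 = 0.1 * randn(nOut, nHidden + 1);          % last column = hidden-layer bias weights
x  = rand(nIn, 1);  t = rand(nOut, 1);        % one training instance <x, t>

sigmoid = @(z) 1 ./ (1 + exp(-z));

% forward pass
x1 = [x; 1];                                  % append bias input (always 1)
h  = sigmoid(W1 * x1);                        % hidden-unit outputs
h1 = [h; 1];                                  % append hidden-layer bias unit
o  = sigmoid(W2 * h1);                        % output-unit outputs

% backward pass
delta_k = o .* (1 - o) .* (t - o);            % step 2: output deltas
% step 3: hidden deltas; drop the last column of W2 because it belongs
% to the bias unit, which receives no delta of its own
delta_h = h .* (1 - h) .* (W2(:, 1:end-1)' * delta_k);

% step 4: weight updates (the bias weights are updated along with the
% rest, since their "input" is just the constant 1)
W2 = W2 + eta * delta_k * h1';
W1 = W1 + eta * delta_h * x1';
```

For batch gradient descent you would accumulate these update terms over all training instances before applying them, but the bias handling is identical.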

You can find a detailed derivation of the backpropagation rule in the paper "Neural Networks and Their Applications".
