Solved – How to incorporate the biases in the feed-forward neural network

bias, machine learning, neural networks

I'm trying to implement a FFNN. I'm doing this as an exercise to understand how biases play a role in classification. I trained a NN using a package in R, with the inputs being 1..100 and the labels being their square roots, which lie in the range 1..10. The network topology is 1:10:1. I exported all the weights and biases.

If I implement the feed-forward pass as I think I have to, I always get a value between 0 and 1, which makes sense, because that's what the sigmoid function does. But in R, I get the correct answers, which lie in the range 1..10. So I'm thinking I'm implementing the biases wrong.

For example, given a NN with the topology 1:4:1:

Given a weight vector $w$ with the weights $[0.12, 0.86, 0.20, 0.5]$ from my input neuron to the hidden layer, and the corresponding biases $b$ with values $[7.12, -6.20, 0.90, -3.6]$, what would my feed-forward pass look like from the input layer through the hidden layer? I would appreciate a vectorized implementation, because my eventual implementation will have very large matrices.

Best Answer

Anyone new to neural networks may feel confused when first reading NN tutorials that use different notations. Some tutorials use 'biases', while others use 'bias units'. The idea behind the role of the bias is the same in both, which is well illustrated in this question, but I think the two notations reflect a slight difference in implementation. The following two descriptions are for the same network, with the same input layer and first hidden layer.

Implementation for 'biases':
The input layer with $m$ units is represented by a $1\times m$ matrix $v$; the hidden layer with $n$ units is represented by a $1\times n$ matrix $h$; the weights from the input to the hidden layer are represented by an $m\times n$ weight matrix $w$; and the bias to the hidden layer is represented by another $1\times n$ matrix $b$. A forward pass computes $h = v w + b$ and then applies the activation function to $h$ element-wise.
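Since the question asks for a vectorized implementation, the same formula extends directly to a batch of inputs: stack the $k$ input vectors into a $k\times m$ matrix, multiply by $w$, and add $b$ to every row before applying the activation. A minimal R sketch, assuming the names X, W, b and the use of sweep() for broadcasting the bias (these names are just one possible choice):

forward_batch <- function(X, W, b) {
  # X: k x m batch of inputs, W: m x n weights, b: length-n bias vector
  Z <- sweep(X %*% W, 2, b, "+")  # add the bias to every row of the k x n pre-activation
  1 / (1 + exp(-Z))               # element-wise sigmoid
}

With the parameters given below, forward_batch(matrix(2.8, ncol = 1), matrix(c(0.12, 0.86, 0.20, 0.5), nrow = 1), c(7.12, -6.20, 0.90, -3.6)) reproduces the single-row result shown further down.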

Implementation for 'bias units':
The input layer with $m+1$ units is represented by a $1\times (m+1)$ matrix $v$, whose first unit is a bias unit with constant value $1$; the weight matrix from the input to the hidden layer is of size $(m+1) \times n$, and its first row holds the weights corresponding to the bias; the hidden layer has $n+1$ units, of which the first is again a bias unit with constant value $1$ that is not affected by forward passes. A forward pass computes $h = v w$ and then applies the activation function to $h$ element-wise.

An image on holehouse.org illustrates the second implementation.

Both implementations are common, so handle the question according to whichever notation is in use. Given the conditions you describe, your question follows the first implementation. Suppose your $v$ is the one-element vector $[2.8]$; the following is an R implementation of the forward pass.

# Element-wise logistic (sigmoid) activation; exp() is already vectorized
# in R, so no explicit loop is needed.
logistic <- function(vec) {
  1 / (1 + exp(-vec))
}

v = c(2.8)
w = c(0.12,0.86,0.20,0.5)
b = c(7.12,-6.20,0.90,-3.6)
result = logistic(v%*%t(w) + b)
result
      [,1]       [,2]      [,3]       [,4]
[1,] 0.9994224 0.02205315 0.8115327 0.09975049
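As a sanity check, the first entry is $\text{logistic}(2.8 \cdot 0.12 + 7.12) = \text{logistic}(7.456) \approx 0.9994$, which matches the output above.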

Alternatively, with the second implementation, the input layer becomes [1, 2.8], the biases are merged into the weight matrix, which becomes [7.12, −6.20, 0.90, −3.6; 0.12, 0.86, 0.20, 0.5], and the hidden layer gains a bias unit.

v = c(1,2.8)
w = matrix (nrow = 2, ncol = 4)
w[1, ] = c(7.12,-6.20,0.90,-3.6);
w[2, ] = c(0.12,0.86,0.20,0.5);
result = logistic(v%*%w)
result
      [,1]       [,2]      [,3]       [,4]
[1,] 0.9994224 0.02205315 0.8115327 0.09975049
h = c(1, result);
h
[1] 1.00000000 0.99942237 0.02205315 0.81153267 0.09975049
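To finish the 1:4:1 example, the hidden activations are pushed through the hidden-to-output weights in exactly the same way. The weights w2 below are made-up placeholder values, since the question only lists the input-to-hidden parameters; in the 'bias unit' notation the first weight multiplies the constant 1 and therefore acts as the output bias.

w2 = c(0.25, 0.3, -0.7, 1.1, 0.4)   # hypothetical hidden-to-output weights (bias weight first)
out = h %*% w2                       # 1 x 1 output of the network

Note that applying the sigmoid to this output would squash it back into (0, 1), which is exactly the behaviour you observed. Since your targets are square roots in 1..10, the trained R network presumably uses a linear (identity) output unit, or the targets were scaled into (0, 1) during training.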