Neural Network – How to Train to Distinguish Between Even and Odd Numbers

categorical dataclassificationgenetic algorithmsmachine learningneural networks

Question: is it possible to train a NN to distinguish between odd and even numbers only using as input the numbers themselves?

I have the following dataset:

Number Target
1      0
2      1
3      0
4      1
5      0
6      1
...   ...
99     0
100    1

I trained a NN with two input neurons (one being the variable Number, the other being a bias neuron), nine neurons in the hidden layer and one output neuron using a very simple genetic algorithm: at each epoch, two sets of weights "fight" against each other; the one with the highest error loses and it's replaced by a modified version of the winner.

The script easily solve simple problems like the AND, the OR and the XOR operators but get stuck while trying to categorise odd and even numbers. Right now the best it managed to do is to identify 53 numbers out of 100 and it took several hours. Whether I normalize or not the inputs seems to make no difference.

If I wanted to cheat I could just pre-processed the data and feed % 2 to the NN as an input but I don't want to do that; NN should be able to approximate every function, including the modulo operator (I believe). What am I doing wrong?

Best Answer

As with any machine learning task, the representation of your input plays a crucial role in how well you learn and generalise.

I think, the problem with the representation is that the function (modulo) is highly non-linear and not smooth in the input representation you've chosen for this problem.

I would try the following:

Try a better learning algorithm (back-propagation/gradient descent and its variants).
Try representing the numbers in binary using a fixed length precision.
If your input representation is a b-bit number, I would ensure your training set isn't biased towards small or large numbers. Have numbers that are uniformly, and independently chosen at random from the range $[0, 2^b-1]$.
As you've done, use a multi-layer network (try 2 layers first: i.e., hidden+output, before using more layers).
Use a separate training+test set. Don't evaluate your performance on the training set.

Related Solutions

Solved – How to incorporate the biases in the feed-forward neural network

Anyone new to NN may feel confused when first read NN tutorials with different notations. Some tutorials use 'biases', while others use 'bias units'. The ideas about the role of bias are just the same, which is well illustrated in this question, but the two notations are based on a slight implementation difference I think. The following two are for the same network with the same input layer and first hidden layer.

Implementation for 'biases':
The input layer with $m$ units is represented by a $1\times m$ matrix, $v$ here; the hidden layer with $n$ units is represented by a $1\times n$ matrix, $h$; the weights from the input to the hidden layer is represented by a $m\times n$ weight matrix, $w$; the bias to the hidden layer is represented by an another $1\times n$ matrix, $b$. A forward pass is carried out by $h = v * w + b$ and then apply activation function to $h$.

Implementation for 'bias units':
The input layer with $1\times (m+1)$ units is represented by a $1\times (m+1)$ matrix $v$, and the first unit is a bias unit with constant value $1$; the weight matrix from the input to the hidden layer is of size $(m+1) \times n$, and the first row's values are weights corresponding to the bias; the hidden layer has $n+1$ units in which the first unit is a bias unit with constant value $1$ not affected by forward passes. A forward pass is carried out by $h=v*w$ and then apply activation function to $h$.

The following image quoted from holehouse.org is an illustration of the second implementation.

Both of the two implementations are common, so deal with the question based on the notation. According to the given conditions, your question follows the first implementation. Suppose your v is a one unit vector [2.8], the following is an R implementation of the forward pass.

logistic <- function(vec){
  size = length(vec);
  for(i in 1:size){
    vec[i] = 1 / (1 + exp(-vec[i]));
  }
  return (vec);
}

v = c(2.8)
w = c(0.12,0.86,0.20,0.5)
b = c(7.12,-6.20,0.90,-3.6)
result = logistic(v%*%t(w) + b)
result
      [,1]       [,2]      [,3]       [,4]
[1,] 0.9994224 0.02205315 0.8115327 0.09975049

Besides, if it is the second implementation, the input layer becomes [1, 2.8], the biases are merged to the weight matrix, which becomes [7.12,−6.20,0.90,−3.6; 0.12,0.86,0.20,0.5], and the hidden layer has a bias unit.

v = c(1,2.8)
w = matrix (nrow = 2, ncol = 4)
w[1, ] = c(7.12,-6.20,0.90,-3.6);
w[2, ] = c(0.12,0.86,0.20,0.5);
result = logistic(v%*%w)
result
      [,1]       [,2]      [,3]       [,4]
[1,] 0.9994224 0.02205315 0.8115327 0.09975049
h = c(1, result);
h
[1] 1.00000000 0.99942237 0.02205315 0.81153267 0.09975049

Neural Networks – Is It Normal That a Neural Network Sometimes Doesn’t Learn XOR?

Yes.

There are 16 local minimums that have the highest conversion if the weights are initialized between 0.5 and 1.

Image source: Yoshio Hirose, Koichi Yamashita, Shimpei Hijiya, "Back-propagation algorithm which varies the number of hidden units," Neural Networks, Volume 4, Issue 1, (1991)

A similar question including an implementation in tflearn

Best Answer

Related Solutions

Solved – How to incorporate the biases in the feed-forward neural network

Neural Networks – Is It Normal That a Neural Network Sometimes Doesn’t Learn XOR?

Related Question