Solved – Struggling to make a neural network mimic a basic if statement

artificial intelligence, machine learning, neural networks, tensorflow

I want to make a neural network that satisfies the following conditions, but the network never gets close to converging. It is a ReLU network with a sigmoid on the output:

If X < 0.95 output 0
If X > 1.05 output 0
Else output 1
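
For reference, the target function is just a plain if statement. A minimal Python sketch of the rule above (using the 0.95/1.05 thresholds from the description):

def target(x):
    # 1 inside the band [0.95, 1.05], 0 outside
    if x < 0.95:
        return 0
    if x > 1.05:
        return 0
    return 1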

I made a neural network with multiple layers and provided it with the input and output tensors below. The output was roughly 0.33 for every input.
I increased the data to 9 examples and got the same output.

INPUTS = [[0.95], [1], [1.05]]
OUTPUTS = [[0], [1], [0]]

It would converge if I provided data for any two of the three if statements.

Is there a fundamental limitation of neural networks that prevents them from solving this? Or should it be possible, and I'm probably doing something wrong?

PS: I used Python and TensorFlow. The code is below.

import tensorflow as tf
sess = tf.InteractiveSession()
INPUTS_AMOUNT = 1
HIDDEN_NODES_AMOUNT = 10
HIDDEN_NODES_AMOUNT_2 = 10
OUTPUTS_AMOUNT = 1

# define placeholder for input and output
x_ = tf.placeholder(tf.float32, shape=[None, INPUTS_AMOUNT], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[None,OUTPUTS_AMOUNT], name="y-input")

# Since we're using a ReLU, the weights are initialised to small positive values to avoid dead (negative) neurons
W = tf.Variable(tf.random_uniform([INPUTS_AMOUNT, HIDDEN_NODES_AMOUNT], 0.001, .01))
b = tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT]))
hidden  = tf.nn.relu(tf.matmul(x_,W) + b)

W1 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT, HIDDEN_NODES_AMOUNT_2], 0.001, .01))
b1= tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT_2]))
hidden1  = tf.nn.relu(tf.matmul(hidden,W1) + b1)
W2 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT_2,OUTPUTS_AMOUNT], -1, 1))
b2 = tf.Variable(tf.zeros([OUTPUTS_AMOUNT]))
hidden2 = tf.matmul(hidden1, W2) + b2
y = tf.nn.sigmoid(hidden2)
# Binary cross-entropy loss; the sigmoid keeps y strictly between 0 and 1 so the logs are defined
cost = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1.0 - y)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Specify the data to go into the placeholders
INS = [[0.9], [1.0], [1.1]]
OUTS = [ [0], [1], [0]]
init = tf.global_variables_initializer()
sess.run(init)
# Train on the input data, doesn't actually need 100000 to converge
for i in range(100000):
    sess.run(train_step, feed_dict={x_: [INS[i%3]], y_: [OUTS[i%3]]})
    if i % 2000 == 0:
        print('Output for debugging', sess.run(y, feed_dict={x_: INS}))

Best Answer

The curvature of the cost surface with these particular inputs and outputs makes this a bit of a pathological example. A 'good' solution is simply to output roughly 0.333 all the time, and if you take a small step away from that solution in the right direction for one of the inputs, it is likely cancelled out by a large increase in cost for one of the other inputs.
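
You can see why 0.333 is attractive: for a constant prediction p against the targets [0, 1, 0], the mean cross-entropy is -(1/3)[log p + 2 log(1 - p)], which is minimised at p = 1/3. A quick numerical check of that claim (a standalone sketch, not part of the code below):

import numpy as np

# Mean cross-entropy of a constant prediction p against the targets [0, 1, 0]
p = np.linspace(0.01, 0.99, 9801)
cost = -(np.log(p) + 2 * np.log(1 - p)) / 3
print(p[np.argmin(cost)])  # prints roughly 0.333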

Still, if you make your weight initialisation more sensible (it should be centred at zero) and standardise your inputs, you can get this to work:

import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
INPUTS_AMOUNT = 1
HIDDEN_NODES_AMOUNT = 10
HIDDEN_NODES_AMOUNT_2 = 10
OUTPUTS_AMOUNT = 1

# define placeholder for input and output
x_ = tf.placeholder(tf.float32, shape=[None, INPUTS_AMOUNT], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[None,OUTPUTS_AMOUNT], name="y-input")

# Since we're using a ReLU, the weights need a sensible initialisation to avoid dead (negative) neurons
### FIX WEIGHTS INITIALISATION
W = tf.Variable(tf.random_uniform([INPUTS_AMOUNT, HIDDEN_NODES_AMOUNT], -0.1, 0.1))
b = tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT]))
hidden  = tf.nn.relu(tf.matmul(x_,W) + b)

### FIX WEIGHTS INITIALISATION
W1 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT, HIDDEN_NODES_AMOUNT_2], -0.1, 0.1))
b1= tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT_2]))
hidden1  = tf.nn.relu(tf.matmul(hidden,W1) + b1)
### FIX WEIGHTS INITIALISATION
W2 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT_2,OUTPUTS_AMOUNT], -0.1, 0.1))
b2 = tf.Variable(tf.zeros([OUTPUTS_AMOUNT]))
hidden2 = tf.matmul(hidden1, W2) + b2
y = tf.nn.sigmoid(hidden2)
# Binary cross-entropy loss; the sigmoid keeps y strictly between 0 and 1 so the logs are defined
cost = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1.0 - y)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Specify the data to go into the placeholders
INS = [[0.9], [1.0], [1.1]]
### Z-SCORE INPUTS
INS = np.array(INS)
INS = (INS-INS.mean())/INS.std()
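# With INS = [[0.9], [1.0], [1.1]], the standardised inputs are roughly [[-1.22], [0.0], [1.22]]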
OUTS = [ [0], [1], [0]]
init = tf.global_variables_initializer()
sess.run(init)
# Train on the input data, doesn't actually need 100000 to converge
for i in range(100000):
    sess.run(train_step, feed_dict={x_: [INS[i%3]], y_: [OUTS[i%3]]})
    if i % 2000 == 0:
        print('Output for debugging', sess.run(y, feed_dict={x_: INS}))