Solved – Struggling to make a neural network mimic a basic if statement

artificial intelligence, machine learning, neural networks, tensorflow

I want to make a neural network that satisfies the following conditions, but the network never gets close to converging. It is a ReLU network with a sigmoid on the output:

If X < 0.95 output 0
If X > 1.05 output 0
Else output 1
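
For reference, the target function is just a plain if statement. A minimal Python sketch of the rule above (using the 0.95/1.05 thresholds from the description):

def target(x):
    # 1 inside the band [0.95, 1.05], 0 outside
    if x < 0.95:
        return 0
    if x > 1.05:
        return 0
    return 1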

I made a neural network with multiple layers and provided it with the input and output tensors below. The output was roughly 0.33 for every input.
I increased the data to 9 examples and got the same output.

INPUTS = [[0.95], [1], [1.05]]
OUTPUTS = [[0], [1], [0]]

It would converge if I provided data for any two of the three if statements.

Is there a fundamental limitation of neural networks that prevents them from solving this? Or should it be possible, and I'm probably doing something wrong?

PS: I used Python and TensorFlow. The code is below.

import tensorflow as tf
sess = tf.InteractiveSession()
INPUTS_AMOUNT = 1
HIDDEN_NODES_AMOUNT = 10
HIDDEN_NODES_AMOUNT_2 = 10
OUTPUTS_AMOUNT = 1

# define placeholder for input and output
x_ = tf.placeholder(tf.float32, shape=[None, INPUTS_AMOUNT], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[None,OUTPUTS_AMOUNT], name="y-input")

# Since we're using a ReLU, the weights are initialised to small positive values to avoid dead (negative) neurons
W = tf.Variable(tf.random_uniform([INPUTS_AMOUNT, HIDDEN_NODES_AMOUNT], 0.001, .01))
b = tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT]))
hidden  = tf.nn.relu(tf.matmul(x_,W) + b)

W1 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT, HIDDEN_NODES_AMOUNT_2], 0.001, .01))
b1= tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT_2]))
hidden1  = tf.nn.relu(tf.matmul(hidden,W1) + b1)
W2 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT_2,OUTPUTS_AMOUNT], -1, 1))
b2 = tf.Variable(tf.zeros([OUTPUTS_AMOUNT]))
hidden2 = tf.matmul(hidden1, W2) + b2
y = tf.nn.sigmoid(hidden2)
# Binary cross-entropy loss; the sigmoid keeps y strictly between 0 and 1 so the logs are defined
cost = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1.0 - y)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Specify the data to go into the placeholders
INS = [[0.9], [1.0], [1.1]]
OUTS = [ [0], [1], [0]]
init = tf.global_variables_initializer()
sess.run(init)
# Train on the input data, doesn't actually need 100000 to converge
for i in range(100000):
    sess.run(train_step, feed_dict={x_: [INS[i%3]], y_: [OUTS[i%3]]})
    if i % 2000 == 0:
        print('Output for debugging', sess.run(y, feed_dict={x_: INS}))

Best Answer

The curvature of the cost surface with these particular inputs and outputs makes this a bit of a pathological example. A 'good' solution is simply to output roughly 0.333 all the time, and if you take a small step away from that solution in the right direction for one of the inputs, it is likely cancelled out by a large increase in cost for one of the other inputs.
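
You can see why 0.333 is attractive: for a constant prediction p against the targets [0, 1, 0], the mean cross-entropy is -(1/3)[log p + 2 log(1 - p)], which is minimised at p = 1/3. A quick numerical check of that claim (a standalone sketch, not part of the code below):

import numpy as np

# Mean cross-entropy of a constant prediction p against the targets [0, 1, 0]
p = np.linspace(0.01, 0.99, 9801)
cost = -(np.log(p) + 2 * np.log(1 - p)) / 3
print(p[np.argmin(cost)])  # prints roughly 0.333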

Still, if you make your weight initialisation more sensible (it should be centred at zero) and standardise your inputs, you can get this to work:

import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
INPUTS_AMOUNT = 1
HIDDEN_NODES_AMOUNT = 10
HIDDEN_NODES_AMOUNT_2 = 10
OUTPUTS_AMOUNT = 1

# define placeholder for input and output
x_ = tf.placeholder(tf.float32, shape=[None, INPUTS_AMOUNT], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[None,OUTPUTS_AMOUNT], name="y-input")

# Since we're using a ReLU, the weights need a sensible initialisation to avoid dead (negative) neurons
### FIX WEIGHTS INITIALISATION
W = tf.Variable(tf.random_uniform([INPUTS_AMOUNT, HIDDEN_NODES_AMOUNT], -0.1, 0.1))
b = tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT]))
hidden  = tf.nn.relu(tf.matmul(x_,W) + b)

### FIX WEIGHTS INITIALISATION
W1 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT, HIDDEN_NODES_AMOUNT_2], -0.1, 0.1))
b1= tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT_2]))
hidden1  = tf.nn.relu(tf.matmul(hidden,W1) + b1)
### FIX WEIGHTS INITIALISATION
W2 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT_2,OUTPUTS_AMOUNT], -0.1, 0.1))
b2 = tf.Variable(tf.zeros([OUTPUTS_AMOUNT]))
hidden2 = tf.matmul(hidden1, W2) + b2
y = tf.nn.sigmoid(hidden2)
# Binary cross-entropy loss; the sigmoid keeps y strictly between 0 and 1 so the logs are defined
cost = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1.0 - y)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Specify the data to go into the placeholders
INS = [[0.9], [1.0], [1.1]]
### Z-SCORE INPUTS
INS = np.array(INS)
INS = (INS-INS.mean())/INS.std()
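# With INS = [[0.9], [1.0], [1.1]], the standardised inputs are roughly [[-1.22], [0.0], [1.22]]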
OUTS = [ [0], [1], [0]]
init = tf.global_variables_initializer()
sess.run(init)
# Train on the input data, doesn't actually need 100000 to converge
for i in range(100000):
    sess.run(train_step, feed_dict={x_: [INS[i%3]], y_: [OUTS[i%3]]})
    if i % 2000 == 0:
        print('Output for debugging', sess.run(y, feed_dict={x_: INS}))