The curvature of the cost surface with these particular inputs and outputs makes this a somewhat pathological example. A 'good' solution exists that simply outputs 0.333 for every input, and a small step away from it in the correct direction for one of the inputs is likely cancelled out by a larger increase in cost for one of the other inputs.
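To see why a constant output of 0.333 is such a tempting resting place, here is a quick standalone check (an illustration only, separate from the model below): the mean cross-entropy over the three labels [0, 1, 0] is minimised by predicting their mean, 1/3, for every input.

import numpy as np

labels = np.array([0.0, 1.0, 0.0])

def mean_cross_entropy(p):
    # Same cost as in the code below: mean of -(y*log(p) + (1-y)*log(1-p))
    return np.mean(-(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

# The best constant prediction for these labels is their mean, 1/3
for p in [0.2, 1/3, 0.5]:
    print(p, mean_cross_entropy(p))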
Still, if you make your weight initialisation more sensible (it should be centred at zero) and standardise your inputs, you can get this to work:
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
INPUTS_AMOUNT = 1
HIDDEN_NODES_AMOUNT = 10
HIDDEN_NODES_AMOUNT_2 = 10
OUTPUTS_AMOUNT = 1
# define placeholder for input and output
x_ = tf.placeholder(tf.float32, shape=[None, INPUTS_AMOUNT], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[None,OUTPUTS_AMOUNT], name="y-input")
# Since we're using a relu, the weights are initialised appropriately to avoid dead (negative) neurons
### FIX WEIGHTS INITIALISATION
W = tf.Variable(tf.random_uniform([INPUTS_AMOUNT, HIDDEN_NODES_AMOUNT], -0.1, 0.1))
b = tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT]))
hidden = tf.nn.relu(tf.matmul(x_,W) + b)
### FIX WEIGHTS INITIALISATION
W1 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT, HIDDEN_NODES_AMOUNT_2], -0.1, 0.1))
b1= tf.Variable(tf.zeros([HIDDEN_NODES_AMOUNT_2]))
hidden1 = tf.nn.relu(tf.matmul(hidden,W1) + b1)
### FIX WEIGHTS INITIALISATION
W2 = tf.Variable(tf.random_uniform([HIDDEN_NODES_AMOUNT_2,OUTPUTS_AMOUNT], -0.1, 0.1))
b2 = tf.Variable(tf.zeros([OUTPUTS_AMOUNT]))
hidden2 = tf.matmul(hidden1, W2) + b2
y = tf.nn.sigmoid(hidden2)
# Training function allows for error calculations for value between 0 and 1
cost = tf.reduce_mean(( (y_ * tf.log(y)) + ((1 - y_) * tf.log(1.0 - y)) ) * -1)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
# Specify the data to go into the placeholders
INS = [[0.9], [1.0], [1.1]]
### Z-SCORE INPUTS
INS = np.array(INS)
INS = (INS-INS.mean())/INS.std()
OUTS = [[0], [1], [0]]
init = tf.global_variables_initializer()
sess.run(init)
# Train on the input data, one example at a time; it doesn't actually need 100000 iterations to converge
for i in range(100000):
    sess.run(train_step, feed_dict={x_: [INS[i % 3]], y_: [OUTS[i % 3]]})
    if i % 2000 == 0:
        print('Output for debugging', sess.run(y, feed_dict={x_: INS}))
Without knowing a lot more about the model, or the data used, it is hard to answer these questions with any rigour. That aside, the values you provide make me think it is a reasonable model that does not necessarily overfit the training data.
For your second question, my first course of action would always be to plot the training and test accuracy over each epoch (iteration), then look at how the curves develop. I generally hope to see a test curve that shadows the training curve, always a little lower. Here is a diagram with a short explanation, taken from the excellent CS231n course from Stanford.
Image source
Course Homepage
All the material and video lectures are freely available and would be a great place for you to improve your understanding whilst working on Deep Learning topics.
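As a minimal sketch of what I mean by plotting the two curves (the accuracy lists here are made-up placeholders; in practice you would record one value at the end of each epoch of your own training run):

import matplotlib.pyplot as plt

# Hypothetical per-epoch accuracies; replace with the values logged during training
train_acc = [0.60, 0.72, 0.80, 0.85, 0.88, 0.90, 0.91, 0.92]
test_acc  = [0.58, 0.69, 0.76, 0.80, 0.82, 0.83, 0.83, 0.84]

epochs = range(1, len(train_acc) + 1)
plt.plot(epochs, train_acc, label='training accuracy')
plt.plot(epochs, test_acc, label='test accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()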
Best Answer
A dirt-simple solution is to add a regularization term, so your loss function is $\text{loss} + \lambda \text{ReLU} (i_3 - O)$. This adds a penalty whenever your inequality is violated, so the model will tend to respect the constraint.
While this solution is inexact, solving the problem exactly would be more challenging, because constrained optimization is not something NN libraries are designed for.
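For instance, in the same TF1 style as the code above (a sketch only: i_3 here stands for whichever tensor holds the relevant input in your graph, y for the model output, and lam is a hand-tuned penalty weight; the constraint is assumed to be O >= i_3):

# Hypothetical: the input feature appearing in the constraint
i_3 = x_[:, 0:1]
lam = 10.0  # penalty strength; increase it if the constraint is still being violated

# ReLU(i_3 - y) is zero when y >= i_3 and grows linearly with the violation otherwise
penalty = tf.reduce_mean(tf.nn.relu(i_3 - y))
penalised_cost = cost + lam * penalty
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(penalised_cost)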
Some related solutions:
Loss function in machine learning - how to constrain?