I was playing with a simple neural network with only one hidden layer in TensorFlow, and I tried different activations for the hidden layer:
- Relu
- Sigmoid
- Softmax (although softmax is usually used in the last layer)
ReLU gives the best training accuracy and validation accuracy. I am not sure how to explain this.
We know that ReLU has good qualities, such as sparsity and the absence of vanishing gradients, but
Q: is a ReLU neuron in general better than sigmoid/softmax neurons?
Should we almost always use ReLU neurons in a NN (or even a CNN)?
I thought a more complex neuron would give a better result, at least in training accuracy (leaving overfitting concerns aside).
Thanks
PS: The code is basically from "Udacity-Machine learning -assignment2", which is recognition of notMNIST using a simple NN with one hidden layer.
import tensorflow as tf  # TensorFlow 1.x graph-mode API

batch_size = 128
graph = tf.Graph()
with graph.as_default():
    # Input data. image_size, num_labels, valid_dataset and test_dataset
    # come from the assignment code (not shown here).
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    # Hidden layer. tf.nn.relu is the activation under test; the other runs
    # swap it for tf.nn.sigmoid or tf.nn.softmax.
    hidden_nodes = 1024
    hidden_weights = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_nodes]))
    hidden_biases = tf.Variable(tf.zeros([hidden_nodes]))
    hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)
    # Output layer variables.
    weights = tf.Variable(tf.truncated_normal([hidden_nodes, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))
    # Training computation.
    logits = tf.matmul(hidden_layer, weights) + biases
    # Keyword arguments avoid mixing up the order of labels and logits.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, weights) + biases)
    test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu, weights) + biases)
Best Answer
In addition to @Bhagyesh_Vikani:
There are also generalisations of rectified linear units, such as leaky ReLU, PReLU, and maxout. Rectified linear units and their generalisations are based on the principle that models are easier to optimize when their behaviour is closer to linear.
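To make the "close to linear" point concrete, here is a minimal pure-Python sketch of ReLU next to one of its generalisations, leaky ReLU. The function names and the slope `alpha=0.01` are my own illustrative choices, not something taken from the question's code.

```python
def relu(z):
    """Rectified linear unit: identity for z > 0, zero otherwise."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: like ReLU, but keeps a small slope alpha for z <= 0.

    alpha=0.01 is an illustrative default (assumption).
    """
    return z if z > 0 else alpha * z


def relu_grad(z):
    """Derivative of ReLU: 1 on the active side, 0 on the inactive side."""
    return 1.0 if z > 0 else 0.0


def leaky_relu_grad(z, alpha=0.01):
    """Derivative of leaky ReLU: 1 for z > 0, alpha otherwise."""
    return 1.0 if z > 0 else alpha


# Print value and gradient of both units at a few sample points.
for z in (-2.0, -0.5, 0.5, 2.0):
    print(z, relu(z), leaky_relu(z), relu_grad(z), leaky_relu_grad(z))
```

Both units are piecewise linear: on the active side the gradient is exactly 1 however large the input gets, which is what keeps the gradient from shrinking as it passes through the unit.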
Both sigmoid and softmax are discouraged as hidden-layer activations in vanilla feedforward networks (Goodfellow et al., Deep Learning, chapter 6). They are more common in other settings: recurrent networks, many probabilistic models, and some autoencoders have additional requirements that rule out the use of piecewise linear activation functions.
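One commonly cited reason is saturation: the sigmoid's derivative is at most 0.25 and falls towards zero for large |z|. The rough simulation below mimics the question's hidden layer under assumptions of mine: 784 inputs (28 x 28 images), pixel values drawn uniformly from [-0.5, 0.5], and weights drawn like the default `tf.truncated_normal` (standard deviation 1, values beyond two standard deviations redrawn). It is a sketch of the effect, not a reproduction of the actual notMNIST run.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def truncated_normal(rng, stddev=1.0):
    """Redraw until the sample lies within two standard deviations."""
    while True:
        v = rng.gauss(0.0, stddev)
        if abs(v) <= 2.0 * stddev:
            return v


def mean_gradients(n_inputs=784, n_hidden=200, seed=0):
    """Average activation derivative over n_hidden units at initialisation.

    All sizes and the input distribution are assumptions for illustration.
    Returns (mean sigmoid derivative, mean ReLU derivative).
    """
    rng = random.Random(seed)
    x = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    sig_total = 0.0
    relu_total = 0.0
    for _ in range(n_hidden):
        # Pre-activation of one hidden unit with zero bias.
        z = sum(xi * truncated_normal(rng) for xi in x)
        sig_total += sigmoid_grad(z)
        relu_total += 1.0 if z > 0 else 0.0
    return sig_total / n_hidden, relu_total / n_hidden


sig_mean, relu_mean = mean_gradients()
print("mean sigmoid derivative:", sig_mean)
print("mean ReLU derivative:   ", relu_mean)
```

With weights this large the pre-activations spread far from zero, so most sigmoid units start out saturated and pass back only a small gradient, while about half of the ReLU units pass it back unchanged.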
If you have a simple NN (which is what the question is about), ReLU should be your first choice.