Solved – Struggling to train an MLP using Keras (Python)

neural-networks python

I've been interested in neural networks for a while and have just started playing with them. I liked the look of Keras, so I started with some toy code to do regression.

I tried the simplest set up I could:

  • $500$ inputs drawn uniformly from the interval $0 < x < 2\pi$.
  • Targets generated by $3\sin(x) + 1 + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 0.5)$ (a sine curve with random error).
  • The network therefore has one input ($x$) and one output ($y$).
  • One fully connected hidden layer with three neurons, $\tanh$ activation, mean squared error loss, and stochastic gradient descent as the training algorithm.

See the self-contained code gist.
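In case the gist doesn't render here, the data generation can be sketched roughly like this (a reconstruction of the setup above, not the actual gist; I'm reading the $0.5$ in $\mathcal{N}(0, 0.5)$ as the standard deviation):

import numpy as np

np.random.seed(0)                                     # reproducibility
n = 500
x = np.random.uniform(0, 2 * np.pi, n)                # inputs: x ~ U(0, 2*pi)
t = 3 * np.sin(x) + 1 + np.random.normal(0, 0.5, n)   # targets: 3*sin(x) + 1 + noise
xtr, ttr = x.reshape(-1, 1), t.reshape(-1, 1)         # column vectors for Keras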

The only problem is that, no matter what I try, I can't fit the values for $\pi < x < 2\pi$. I've tried different loss functions, Nesterov momentum, more epochs, more layers, more neurons, different learning rates, decay, momentum, and different optimizers (a typical variant is sketched below). Some changes made the fit slightly better; many made it worse.
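To give a concrete idea, this is the kind of optimizer variation I mean, using the Keras SGD optimizer (the hyperparameter values here are illustrative placeholders, not tuned settings from my runs):

from keras.optimizers import SGD

# SGD with learning-rate decay and Nesterov momentum; values are illustrative
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mse', optimizer=sgd)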

Am I making a syntactical mistake somewhere that's affecting my results, or is there a better way to build a simple MLP for this kind of problem? This is the first time I've used Theano and Keras, so I don't know whether a fundamental mistake is plaguing me or whether I need a new approach. I can't find any examples of regression with the Keras library.

Best Answer

If you want to model a sinusoid, a stateful LSTM (RNN) might be a more natural choice. You can find an excellent example of modelling a sinusoid with exponentially decaying amplitude in the Keras examples.
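A minimal sketch of the stateful set-up, assuming a Keras version where these imports resolve (the layer size and optimizer are illustrative, and here the model would be fed the previous sample of the series rather than $x$ directly):

from keras.models import Sequential
from keras.layers import Dense, LSTM

# Stateful LSTM: one timestep per batch, with hidden state carried across
# batches. batch_input_shape = (batch_size, timesteps, features).
model = Sequential()
model.add(LSTM(32, batch_input_shape=(1, 1, 1), stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')
# Train with shuffle=False so the state sequence stays meaningful, and
# call model.reset_states() between passes over the series.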

However, I tried out your Keras code, and I think your problem is that you're not letting it train long enough. Look at your loss at epoch 250; it's still very high:

Epoch 250/250
360/360 [==============================] - 0s - loss: 0.5291 - val_loss: 0.7775

When I changed the number of nodes in your hidden layer to 10 and let it run for 15,000 epochs instead of 250, the loss was considerably lower and the plot was more like what you'd expect:

Epoch 15000/15000
360/360 [==============================] - 0s - loss: 0.2434 - val_loss: 0.2638

[Plot: the 10-node MLP's fit after 15,000 epochs]

The updated code looks like:

# Multilayer Perceptron for the noisy sine regression
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()                 # feedforward network
model.add(Dense(10, input_dim=1))    # one input -> 10 hidden units
model.add(Activation('tanh'))        # tanh nonlinearity on the hidden layer
model.add(Dense(1))                  # linear output for regression
model.compile('sgd', 'mse')          # stochastic gradient descent, MSE loss

# xtr/ttr are the training inputs/targets from the question's gist
hist = model.fit(xtr, ttr, validation_split=0.1, nb_epoch=15000)
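To reproduce the plot, you can predict over a dense grid and overlay the training data (a matplotlib sketch; xtr and ttr are the arrays from the question's gist):

import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)   # dense evaluation grid
plt.scatter(xtr, ttr, s=8, alpha=0.4, label='noisy data')
plt.plot(xs, model.predict(xs), 'r-', label='MLP fit')
plt.plot(xs, 3 * np.sin(xs) + 1, 'k--', label='true curve')
plt.legend()
plt.show()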