Solved – Neural Networks – Performance vs. Amount of Data

dataset, deep learning, keras, model-evaluation, neural networks

[Slide from Andrew Ng's deep learning course: model performance vs. amount of data, showing performance continuing to improve as data grows]

This is one of the slides from Andrew Ng's course on deep learning. I actually took it from Jason Brownlee's website, which seems to second the idea presented in the picture.

However, my limited experience shows that after some point the line starts to head down. I use Keras with EarlyStopping to prevent overfitting. The additional data I introduce is essentially temperature from extra past hours. Even though temperature is highly correlated with the predicted parameter (Pearson's r ~ 0.9), I still see a decrease in performance (increased MSE).
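For context, here is a minimal sketch of how such lagged-temperature features and their Pearson correlation with the target could be computed; the DataFrame df and the column names temperature and target are hypothetical stand-ins for the questioner's data:

import pandas as pd

# df is a hypothetical hourly DataFrame with 'temperature' and 'target' columns
def add_temperature_lags(df, n_lags):
    out = df.copy()
    for lag in range(1, n_lags + 1):
        # temperature measured `lag` hours in the past
        out['temp_lag_%d' % lag] = out['temperature'].shift(lag)
    return out.dropna()  # drop rows without a full lag history

lagged = add_temperature_lags(df, n_lags=24)
# Pearson r of each lagged feature against the predicted parameter
print(lagged.filter(like='temp_lag_').corrwith(lagged['target']))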

What could cause that?

What's more: I use a two-layer NN and increase its number of neurons (input and hidden layers) by one for each extra parameter added.

My code:

import numpy as np
from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

kf = KFold(n_splits=10, random_state=seed, shuffle=True)
kf.get_n_splits(x_cv)
print(kf)

cvscores = []
for train, test in kf.split(x_cv):
    # create model: two hidden layers of 55 neurons, one per input feature
    model = Sequential()
    model.add(Dense(55, activation="relu", kernel_initializer="normal", input_dim=55))
    # when activation="tanh", rescale inputs to [-1, 1]
    model.add(Dense(55, activation="relu", kernel_initializer="normal"))
    model.add(Dense(1, kernel_initializer="normal"))
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    # early stopping; note this monitors the training loss, not a validation loss
    es = EarlyStopping(monitor='loss', min_delta=0.0, patience=3, verbose=0, mode='min')

    # fit the model, then evaluate it on the held-out fold
    model.fit(x_cv[train], y_cv[train], callbacks=[es], batch_size=100, epochs=1000, verbose=0)
    scores = model.evaluate(x_cv[test], y_cv[test], verbose=0)
    print('mean_squared_error', scores)
    cvscores.append(scores)

Best Answer

The notion of "more data -> better performance" normally refers to the number of samples, not the size of each sample. That is, deep learning can extract more information from a larger number of observations than other methods can. In your example you are adding more information per sample (extra features) rather than more samples.

Things to check:

  • Scale of the temperature - improperly scaled inputs can completely destabilize training (see the scaling sketch after this list).
  • Outliers - if the model relies heavily on temperature to predict the outcome, outliers in that relationship can produce wildly wrong predictions, and since MSE is sensitive to outliers, the reported performance gets worse (see the robust-loss sketch below).
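On the first point, here is a minimal sketch of scaling inside the cross-validation loop, assuming the x_cv, y_cv, and kf objects from the question; the scaler is fit on the training fold only, so test-fold statistics never leak into training:

from sklearn.preprocessing import StandardScaler

for train, test in kf.split(x_cv):
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_cv[train])  # fit on the training fold only
    x_test = scaler.transform(x_cv[test])        # reuse training-fold mean/std
    # ... build, fit, and evaluate the model on x_train / x_test as before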
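On the second point, one quick diagnostic is to retrain the same model with an outlier-robust loss while keeping MSE as a tracked metric; this is only a sketch against the model from the question (mean_absolute_error is a standard Keras loss):

# recompile the questioner's architecture with a robust loss, tracking MSE
model.compile(loss='mean_absolute_error', optimizer='adam',
              metrics=['mean_squared_error'])
model.fit(x_cv[train], y_cv[train], callbacks=[es], batch_size=100,
          epochs=1000, verbose=0)
mae, mse = model.evaluate(x_cv[test], y_cv[test], verbose=0)
# if MAE improves while MSE stays high, a few large errors dominate the score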