This is one of the slides from Andrew Ng's course on deep learning. Actually, I took it from Jason Brownlee's website, which seems to second the idea presented in the picture.
However, my limited experience shows that after some point the line starts to head down. I use Keras with EarlyStopping to prevent overfitting. The additional data that I introduce is basically temperature from extra past hours. Even though the temperature is highly correlated with the predicted parameter (Pearson's R ~0.9), I still get a decrease in performance (increased MSE).
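To make the setup concrete, here is a minimal sketch of the kind of lagged-temperature feature and Pearson's R check described above. The data and function names are purely illustrative, not my actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(7)
temp = np.cumsum(rng.normal(size=300))            # synthetic hourly temperature series
target = temp + rng.normal(scale=0.5, size=300)   # parameter to predict, driven by temperature

def lagged_corr(series, target, k):
    """Pearson's R between the series lagged by k hours and the target."""
    return np.corrcoef(series[:-k], target[k:])[0, 1]

# temperature from one hour earlier is still strongly correlated with the target
r = lagged_corr(temp, target, 1)
print(r)
```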
What could cause that?
What's more: I use a two-layer NN and increase its number of neurons (input and hidden) by one for each extra parameter added.
My code:
import numpy as np
from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
kf = KFold(n_splits=10, random_state=seed, shuffle=True)
kf.get_n_splits(x_cv)
print(kf)

cvscores = []
for train, test in kf.split(x_cv):
    # create model
    model = Sequential()
    model.add(Dense(55, activation="relu", kernel_initializer="normal", input_dim=55))  # when activation=tanh, rescale inputs to [-1, 1]
    model.add(Dense(55, activation="relu", kernel_initializer="normal"))
    model.add(Dense(1, kernel_initializer="normal"))
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    # EarlyStopping:
    es = EarlyStopping(monitor='loss', min_delta=0.0, patience=3, verbose=0, mode='min')
    # fit the model and evaluate on the held-out fold
    model.fit(x_cv[train], y_cv[train], callbacks=[es], batch_size=100, epochs=1000, verbose=0)
    scores = model.evaluate(x_cv[test], y_cv[test], verbose=0)
    print('mean_squared_error', scores)
    cvscores.append(scores)
Best Answer
The notion of "more data -> better performance" normally refers to the number of samples, not the size of each sample. That is, deep learning can extract more information from a larger number of observations than other methods can. In your example you are giving additional information per sample rather than adding more samples.
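The distinction can be sketched with a toy regression: an extra feature that is highly correlated with the target but largely redundant with an existing feature adds little new information, and need not reduce cross-validated MSE the way more samples would. This is a minimal sketch using scikit-learn's Ridge as a stand-in for the network; the synthetic data and variable names are illustrative only:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
y = 2 * x1 + rng.normal(scale=0.5, size=n)

# redundant feature: nearly a copy of x1, so highly correlated with y
x2 = x1 + rng.normal(scale=0.1, size=n)

X_small = x1.reshape(-1, 1)              # one informative feature
X_big = np.column_stack([x1, x2])        # same sample count, extra redundant feature

# 10-fold cross-validated MSE for both feature sets
mse_small = -cross_val_score(Ridge(), X_small, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
mse_big = -cross_val_score(Ridge(), X_big, y, cv=10,
                           scoring="neg_mean_squared_error").mean()
print(mse_small, mse_big)
```

Despite x2 correlating strongly with y, the second model has no extra information to exploit, so the larger feature set buys little and can hurt once the model capacity grows with it.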
Things to check: