What degree of difference between validation and training loss is needed to call a model overfit?

loss-functions · lstm · overfitting · python · recurrent-neural-network

I've trained an LSTM network to predict time series data; however, I'm quite new to LSTMs and am unsure whether the model has overfit.

I know that a validation loss that increases while the training loss decreases is the main way to detect overfitting and underfitting, along with plotting the real values against the predicted values, so that is what I did. But I'm getting what seem like contradictory results. Or am I wrong in thinking so?

Here's my model, built in Python with TensorFlow/Keras:

import tensorflow as tf

# single LSTM layer returning the full sequence, with a one-unit Dense output per time step
lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(units=1)
])
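
For completeness, the model is compiled and fit roughly like this (train_ds and val_ds are stand-ins for my actual windowed datasets, and the optimizer is just a placeholder; the loss and metric match what's reported below):

# sketch of the training setup, not the exact script
lstm_model.compile(
    loss='mse',                          # loss function mentioned above
    optimizer='adam',                    # placeholder optimizer choice
    metrics=['mean_absolute_error']      # metric shown in the epoch logs
)
history = lstm_model.fit(train_ds, validation_data=val_ds, epochs=15)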

And here are the metrics for my last 5 epochs (I trained for a total of 15 epochs; I've tried increasing it to 20 and decreasing it to 10 with similar results). My loss function is MSE.

Epoch 11/15
 - 17s 146ms/step - loss: 4.4813e-04 - mean_absolute_error: 0.0138 - val_loss: 0.0016 - val_mean_absolute_error: 0.0308
Epoch 12/15
 - 17s 149ms/step - loss: 4.1059e-04 - mean_absolute_error: 0.0133 - val_loss: 0.0015 - val_mean_absolute_error: 0.0295
Epoch 13/15
 - 17s 146ms/step - loss: 3.8052e-04 - mean_absolute_error: 0.0128 - val_loss: 0.0014 - val_mean_absolute_error: 0.0288
Epoch 14/15
 - 17s 152ms/step - loss: 3.5690e-04 - mean_absolute_error: 0.0125 - val_loss: 0.0013 - val_mean_absolute_error: 0.0279
Epoch 15/15
 - 18s 156ms/step - loss: 3.3720e-04 - mean_absolute_error: 0.0122 - val_loss: 0.0013 - val_mean_absolute_error: 0.0276
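
For reference, the loss curves I compare come from something like the following sketch, using the history object returned by fit() above:

import matplotlib.pyplot as plt

# plot per-epoch training vs. validation MSE to look for the two curves diverging
plt.plot(history.history['loss'], label='training loss (MSE)')
plt.plot(history.history['val_loss'], label='validation loss (MSE)')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()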

I personally think this isn't badly overfit, because while there is a relatively big difference between val_loss and the training loss (loss), both are still fairly low.

But then when I looked at the plot of real values against predictions, it felt a little too accurate: over short intervals such as 0–20 days the model makes a fair number of errors, but over long intervals such as one year the predictions seem almost too accurate. The plots attached below show the model's predictions over 10 days and 365 days respectively.

red: predicted value

blue: real value

[plot: predicted vs. real values over 10 days]

[plot: predicted vs. real values over 365 days]
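
The plots are produced along these lines; y_pred and y_true are hypothetical stand-ins for the aligned predicted and real series, since my windowing/rescaling code is omitted:

import numpy as np
import matplotlib.pyplot as plt

# flatten the (batch, time, 1) predictions and the matching targets into 1-D series
y_pred = lstm_model.predict(val_ds).reshape(-1)
y_true = np.concatenate([y.numpy().reshape(-1) for _, y in val_ds])

for window in (10, 365):                  # short horizon vs. roughly one year
    plt.figure()
    plt.plot(y_pred[:window], color='red', label='predicted value')
    plt.plot(y_true[:window], color='blue', label='real value')
    plt.title(f'first {window} days')
    plt.legend()
    plt.show()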

So what do I conclude from this? The validation and training losses seem to be okay, but the plot seems too good. I assume I'm wrong in thinking that the val_loss and training loss are stable; if so, how large a gap between them should tell me the model is overfit? If not, is this simply a good model?

P.S. I sincerely apologise if this forum is the wrong place for this question or if I've done something wrong; it's my first time posting to Cross Validated.

Best Answer

Since penalized models (methods that use shrinkage, also known as regularization) incorporate intentional underfitting, it is generally not very fruitful to compare performance on the training sample with performance on the test sample. The test-sample performance can stand alone as an estimate of future model performance. If the estimation process did not use shrinkage, then the change in performance from training to testing can quantify the amount of overfitting; it's just not the whole story.
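
For illustration (not something the question's model currently does), shrinkage in a Keras LSTM might look like dropout plus an L2 penalty, which deliberately underfits the training data a little:

import tensorflow as tf

# hypothetical penalized variant of the questioner's model
regularized_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(
        32,
        return_sequences=True,
        dropout=0.2,                                        # drop inputs to the LSTM cell
        recurrent_dropout=0.2,                              # drop recurrent-state connections
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)   # L2 (ridge-style) shrinkage on input weights
    ),
    tf.keras.layers.Dense(units=1)
])
regularized_model.compile(loss='mse', optimizer='adam', metrics=['mean_absolute_error'])

With a model like that, the train-to-validation gap no longer cleanly measures overfitting. Without shrinkage, a gap such as the roughly 3.4e-04 training loss versus 1.3e-03 validation loss shown above is the quantity to examine.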
