Solved – Keras TimeSeries – Regression with negative values

Tags: convolution, keras, lstm, regression, time-series

I am working on a regression task for time series. My data looks like the sample below: I use a window size of 10, the input features are shown below, and the target is the 5th column. As you can see, the targets include values such as {70, 110, -100, 540, -130, 50}.

My model is as follows:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    Conv1D(filters=filters, kernel_size=kernel_size, activation='relu',
           input_shape=(window_size, nb_series)),
    MaxPooling1D(),
    Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'),
    MaxPooling1D(),
    Flatten(),
    Dense(nb_outputs, activation='linear'),
])
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

My input features are as below:

0.00000000,0.42857143,0.57142857,0.00000000,70.00000000,1.00061741,1.00002238,22.40000000,24.85000000,30.75000000,8.10000000,1.00015876,1.00294701,0.99736059,-44.57995000,1.00166700,0.99966561,-0.00003286,0.00030157,1.00252034,49.18000000,40.96386000,19.74918000,-62.22000000
0.00000000,0.09090909,0.72727273,0.18181818,110.00000000,0.99963650,0.99928427,19.19000000,28.89000000,26.65000000,8.60000000,0.99939526,1.00217111,0.99660950,12.04301000,1.00082978,0.99883018,0.00008147,0.00026953,1.00153663,53.70000000,84.81013000,49.33018000,-42.22000000
0.00000000,0.20000000,0.80000000,0.00000000,-100.00000000,1.00034178,1.00016118,19.04000000,27.35000000,36.43000000,9.00000000,1.00028776,1.00300655,0.99756896,-40.34054000,1.00162433,0.99962294,-0.00000094,0.00019842,1.00235166,48.98000000,73.17073000,64.22563000,-62.22000000
0.00000000,0.07407407,0.92592593,0.00000000,540.00000000,0.99554634,0.99608051,20.92000000,32.90000000,20.02000000,12.60000000,0.99583374,0.99957548,0.99209201,166.35514000,0.99723072,0.99523842,0.00069929,0.00025201,0.99342482,67.12000000,89.24051000,83.36000000,-4.23000000
1.00000000,0.30769231,0.53846154,0.15384615,-130.00000000,0.99639984,0.99731696,21.73000000,29.41000000,17.35000000,12.20000000,0.99672034,1.00037538,0.99306530,119.32773000,0.99799071,0.99599723,0.00083646,0.00027643,0.99429023,64.25000000,86.70213000,86.32629000,-13.89000000
1.00000000,0.20000000,0.20000000,0.60000000,50.00000000,0.99590955,0.99698694,24.48000000,37.15000000,15.04000000,12.90000000,0.99618042,1.00005922,0.99230162,123.46570000,0.99737959,0.99538689,0.00105610,0.00034937,0.99368338,66.72000000,87.79070000,86.43382000,-1.39000000

I get the loss below, no matter how many epochs I run or how I switch between activation functions and optimizers.
I understand that this is because the mean of the target over my dataset is between 122 and 124, which is why I always end up around this value.

297055/297071 [============================>.] - ETA: 0s - loss: 22789.0087 - mean_absolute_error: 123.0670
297071/297071 [==============================] - 144s 486us/step - loss: 22788.9740 - mean_absolute_error: 123.0673 - val_loss: 10519.1722 - val_mean_absolute_error: 79.3461

And when testing the predictions using the code below:

pred = model.predict(X_test)
print('\n\nactual', 'predicted', sep='\t')
for actual, predicted in zip(y_test, pred.squeeze()):
    print(actual.squeeze(), predicted, sep='\t')

I get the output below.

For a linear activation at the output layer:

20.0    -0.059563223
-22.0   -0.059563223
-55.0   -0.059563223

For a ReLU activation at the output layer:

235.0 0.0
-170.0 0.0
154.0 0.0

And for a sigmoid activation:

-54.0   1.4216835e-36
-39.0   0.0
66.0    2.0888916e-37

Is there a way to predict continuous values like those above?

Is it the activation function?

Is it an issue of feature selection?

Is it an architectural issue; would an LSTM be better?

Also, any recommendations regarding the kernel size, filters, loss, activation, and optimizer would be much appreciated.

Update:
I have tried an LSTM using the model below:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam', metrics=['mae'])
# fit network
model.fit(X_train, y_train, epochs=2, batch_size=10,
          validation_data=(X_test, y_test), shuffle=False)

And I got the loss below:

297071/297071 [==============================] - 196s 661us/step - loss: 122.8202 - mean_absolute_error: 122.8202 - val_loss: 78.2440 - val_mean_absolute_error: 78.2440
Epoch 2/2
297071/297071 [==============================] - 196s 661us/step - loss: 122.3811 - mean_absolute_error: 122.3811 - val_loss: 78.4328 - val_mean_absolute_error: 78.4328

And the predicted values below:

-55.0   -45.222805
-105.0  -21.363165
29.0    -18.858946
-125.0  -34.27912
-134.0  20.847342
-108.0  30.286516
113.0   31.09069
-63.0   8.848535

Is it the architecture or the data?

Update 2:
After using MinMaxScaler and comparing the predicted versus actual values, I got the results below. Basically, normalized or not, I get very poor output:

Expected=-51.0, Predicted=1.0
Expected=76.0, Predicted=10.4
Expected=101.0, Predicted=-19.6
Expected=-49.0, Predicted=-33.1
Expected=-56.0, Predicted=-14.4
Expected=-58.0, Predicted=-5.0
Expected=52.0, Predicted=6.1
Expected=66.0, Predicted=-15.6
Expected=-58.0, Predicted=-29.0
Expected=43.0, Predicted=-9.8

Best Answer

From what you have described and posted I see a few things that you could improve. Please see them below.

Are you normalizing your input/output variables? From what you posted it doesn't look like you are; correct me if I am wrong. If you are not, you definitely need to. DNNs have trouble with non-normalized data because of how the weights scale: say you have two input features, one around 100 and the other around 10; with similar weights, the feature with the larger nominal value ends up dominating the model. You can try [0,1] or [-1,1] range normalization, although you can always Z-score normalize too. I prefer range normalization because all features end up with the same min/max values. The calculations for range normalization are below.

$[0,1]\ \text{Norm} = \frac{x - \min(x)}{\max(x) - \min(x)}$

$[-1,1]\ \text{Norm} = 2 \cdot \frac{x - \min(x)}{\max(x) - \min(x)} - 1$
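
Here is a minimal sketch of what that looks like in code, assuming your data is already split into numpy arrays X_train/X_test with shape (samples, window_size, nb_series) and y_train/y_test. scikit-learn's MinMaxScaler (which you mention in Update 2) implements exactly this range normalization; feature_range=(-1, 1) gives the [-1,1] version, (0, 1) gives [0,1]:

from sklearn.preprocessing import MinMaxScaler

# Fit the scalers on the training data only, then reuse them on the test data
x_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaler = MinMaxScaler(feature_range=(-1, 1))

# MinMaxScaler expects 2D input, so flatten the windows before scaling
n_train, window_size, nb_series = X_train.shape
X_train_scaled = x_scaler.fit_transform(
    X_train.reshape(-1, nb_series)).reshape(n_train, window_size, nb_series)
X_test_scaled = x_scaler.transform(
    X_test.reshape(-1, nb_series)).reshape(X_test.shape)

# Scale the target the same way so the network learns in normalized units
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1))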

Furthermore, try adding a dropout layer or a batch normalization layer. These reduce overfitting of the network and are generally a good idea to include in any network.
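
For example, a variant of your Conv1D model with batch normalization and dropout added (just a sketch; the dropout rate is a placeholder to tune):

from keras.models import Sequential
from keras.layers import (Conv1D, MaxPooling1D, Flatten, Dense,
                          Dropout, BatchNormalization)

model = Sequential([
    Conv1D(filters=filters, kernel_size=kernel_size, activation='relu',
           input_shape=(window_size, nb_series)),
    BatchNormalization(),   # normalizes the activations of the previous layer
    MaxPooling1D(),
    Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'),
    BatchNormalization(),
    MaxPooling1D(),
    Flatten(),
    Dropout(0.25),          # randomly drops 25% of units during training
    Dense(nb_outputs, activation='linear'),
])
model.compile(loss='mse', optimizer='adam', metrics=['mae'])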

I noticed you are only using 2 epochs to train the model. That is a really low number; you aren't giving the model enough tries at refining the problem. Try a higher number of epochs. Adding more epochs is also useful for identifying whether the model is underfitting or overfitting, since you can plot the loss metric over the epochs.
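
A sketch of training for longer and plotting the loss curves (the epoch count and batch size are placeholders; EarlyStopping is optional and simply stops training once the validation loss stops improving):

from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

history = model.fit(
    X_train, y_train,
    epochs=100, batch_size=64,
    validation_data=(X_test, y_test),
    callbacks=[EarlyStopping(monitor='val_loss', patience=10)])

# Plot training vs. validation loss to see whether the model under- or overfits
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()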

This is a very surface level analysis. If you need more detail let me know.

Update

I had the very same questions you had. As background, most of my predictive experience is in finance/trading, where direction and magnitude are both important. For most of my models I use [-1,1]; I find it works the best, purely from my experience. To solve the direction/magnitude issue, you must consider the loss function you are using. If the model predicts 1 and the true value is -2, the error will be 3 for that point based off a simple residual. Based off this, you should be able to convince yourself that getting the direction wrong shows up as a larger error magnitude. So using a [0,1] norm shouldn't matter, because the distance from 1 to -2 remains proportionally the same when normalized, even though the units change.

For normalization, you just apply the inverse of the function to "de-normalize" the data. In theory, yes, you could have an issue if you are predicting beyond the bounds of min(x) and max(x), but if the predicted value is within that range there will be no problem. You could also just normalize the input vectors and leave the output vector untouched; the value normalization adds to the model is worth this risk. The important feature of normalizing, as said before, is that all the input vectors become equivalent in units. The fact that some of your input features are already normalized is good, but they all need to have the same normalization for you to gain the benefits.
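
Concretely, with the MinMaxScaler sketch above (y_scaler and X_test_scaled are the names from that sketch), de-normalizing the predictions is just the inverse transform:

# Predict in normalized units, then map back to the original target scale
pred_scaled = model.predict(X_test_scaled)
pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

for actual, predicted in zip(y_test, pred):
    print(actual, predicted, sep='\t')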

Update 2

I'm not too familiar with CNNs; I work more with RNNs (GRU/LSTM). So I'm not sure if there is an error in your architecture. From my research and experience, I have found RNNs very useful for time-series forecasting because of their recurrent structure; their ability to handle temporal data is a huge benefit in my book. CNNs, from what I know, do not have this feature. Here is a link discussing them both:

https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/

As far as a loss function goes, test both and check performance; see the sketch after these links. Here are a few links that compare and contrast the available loss functions:

https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

https://people.duke.edu/~rnau/compare.htm

https://towardsdatascience.com/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0
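
As a rough sketch of "test both": re-compile and re-fit the same architecture with each loss and compare the validation MAE. build_model here is a hypothetical helper that returns a fresh copy of your network:

results = {}
for loss in ['mae', 'mse']:
    m = build_model()   # hypothetical helper that rebuilds your architecture
    m.compile(loss=loss, optimizer='adam', metrics=['mae'])
    h = m.fit(X_train, y_train, epochs=50, batch_size=64,
              validation_data=(X_test, y_test), verbose=0)
    # note: the key is 'val_mae' on newer Keras versions
    results[loss] = min(h.history['val_mean_absolute_error'])
print(results)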

One last thing: not all problems are solvable by machine learning, especially in time series. The data could just be too noisy for the model to work. Try forecasting the SPY time series and you will most likely come up with a subpar model; it is just too noisy. So when in doubt, start with a simple model, and only increase the complexity once it is shown not to be overfitting.