Lag in prediction outputs in a one-step-ahead neural network autoregressive model

Tags: autoregressive, forecasting, neural networks, prediction, regression

I am working on an ARX forecasting problem mostly using feed-forward neural networks in MATLAB. The functional model is of the form
$y(t) = f(y(t-1),\ldots,y(t-n),u(t))$. My data is at half-hourly resolution.

The problem is that forecasted outputs appear to be lagged by one time step when compared to the true outputs.

I have successfully applied my forecasting model to a similar data set. The new data set I am testing has much more sample-to-sample variation, and on it the trained model gives a prediction $\hat{y}(t)$ that is very close to $y(t-1)$. When I predict farther into the future, using $y(t) = f(y(t-j),\ldots,y(t-n-j),u(t))$ with $j>4$, the short-term autoregressive contribution is weaker (which is also observed in the trained model) and I don't observe this offset.

As a sanity check, I trained a simple AR(n) model using OLS, and I observe the same sort of lag on my data. When I use the same procedure on my earlier, smoother data set, no such lag exists. I have also generated data from the model $y(t) = \sin(t) + w(t)$, where $w(t) \sim N(0,\sigma^2)$, sampled every 0.1 s, and trained models on it using both neural networks and OLS. The inputs are $y(t-1),\ldots,y(t-4)$, and the target is $y(t)$. For both, when $\sigma$ is sufficiently small (say $<0.5$), the predicted values for newly generated outputs don't exhibit a lag, but as the variance increases (say $>0.1$), I notice the one-step lag.
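The synthetic experiment above can be sketched roughly as follows. This is a minimal illustration in Python rather than MATLAB, using only the OLS variant, no intercept, and noise levels (0.01 and 0.5) that are assumptions chosen for the sketch, not the values from my runs. The helper names `simulate`, `solve`, `fit_ar`, and `lag_check` are hypothetical. It fits an AR(4) model on one realisation, predicts a fresh one, and reports whether the predictions sit closer to $y(t)$ or to $y(t-1)$.

```python
import math
import random


def simulate(n, sigma, dt=0.1, seed=0):
    """Return y(t) = sin(t) + w(t), w ~ N(0, sigma^2), sampled every dt seconds."""
    rng = random.Random(seed)
    return [math.sin(k * dt) + rng.gauss(0.0, sigma) for k in range(n)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ar(y, order=4):
    """OLS fit of y(t) on y(t-1), ..., y(t-order) via the normal equations.

    Assumption for this sketch: no intercept term.
    """
    rows = [[y[t - i] for i in range(1, order + 1)] for t in range(order, len(y))]
    targets = y[order:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(order)]
    return solve(xtx, xty)


def lag_check(sigma, order=4, n=4000):
    """Fit on one realisation, predict a newly generated one.

    Returns (MSE of predictions against y(t), MSE against y(t-1)).
    """
    coef = fit_ar(simulate(n, sigma, seed=1), order)
    y = simulate(n, sigma, seed=2)
    times = range(order, n)
    preds = [sum(c * y[t - 1 - i] for i, c in enumerate(coef)) for t in times]
    mse_now = sum((p - y[t]) ** 2 for p, t in zip(preds, times)) / len(preds)
    mse_prev = sum((p - y[t - 1]) ** 2 for p, t in zip(preds, times)) / len(preds)
    return mse_now, mse_prev


# Assumed noise levels, chosen only to contrast a smooth and a noisy series.
for sigma in (0.01, 0.5):
    mse_now, mse_prev = lag_check(sigma)
    verdict = "closer to y(t-1)" if mse_prev < mse_now else "closer to y(t)"
    print(f"sigma={sigma}: {verdict}")
```

Comparing the two mean squared errors is just one way to put a number on the visual impression of a lag. Under these assumptions the noisy series should come out closer to $y(t-1)$, because the error against $y(t)$ always contains the new noise term $w(t)$, which no function of past values can predict.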

Could you please let me know what's going on? I'm hoping this is just a beginner's mistake… but I'm quite puzzled as I have tried a number of other modeling approaches and observed similar behavior. Thanks in advance!

Best Answer

Most likely, there is nothing wrong with your code or your data. Your neural network is starting to act like a so-called "naive predictor": because the NN cannot resolve any pattern in the input data, the best answer it can give is the previous value. There are a number of techniques that try to solve this (using deltas, etc.), but I have yet to see pre-processing the input data solve the problem, which does not mean it cannot work. Just my 2 cents...
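One way to check whether a model has collapsed to the naive predictor is to compare its error with that of the forecast $\hat{y}(t) = y(t-1)$. The following is a minimal sketch of that comparison, not part of the original answer. The function names `mse` and `skill_vs_naive` are hypothetical, and it assumes `y_hat[t]` is the one-step forecast of `y[t]`.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def skill_vs_naive(y, y_hat):
    """Ratio of model MSE to naive (previous-value) MSE.

    Assumes y_hat[t] is the forecast of y[t]. The first sample is dropped
    because the naive forecast has no previous value to use there.
    A ratio near 1 means the model is doing no better than y(t-1).
    """
    actual = y[1:]
    naive = y[:-1]          # naive forecast of y[t] is y[t-1]
    model = y_hat[1:]
    naive_mse = mse(naive, actual)
    if naive_mse == 0.0:
        raise ValueError("series is constant; naive forecast is already perfect")
    return mse(model, actual) / naive_mse


# Toy numbers, made up for illustration only.
y = [1.0, 2.0, 4.0, 7.0]
lagged_forecast = [0.0, 1.0, 2.0, 4.0]   # a model that just echoes y(t-1)
print(skill_vs_naive(y, lagged_forecast))
```

A ratio well below 1 suggests the model has found structure beyond persistence. A ratio at or above 1 is consistent with the behaviour described above.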

Related Solutions

R Neural Networks – How to Train and Validate a Neural Network Model in R?

Max Kuhn's caret Manual - Model Building is a great starting point.

I would think of the validation stage as occurring within the caret train() call, since it is choosing your hyperparameters of decay and size via bootstrapping or some other approach that you can specify via the trControl parameter. I call the data set I use for characterizing the error of the final chosen model my test set. Since caret handles selection of hyperparameters for you, you just need a training set and a test set.

You can use the createDataPartition() function in caret to split your data set into training and test sets. I tested this using the Prestige data set from the car package, which has information about income as related to level of education and occupational prestige:

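library(car)    # provides the Prestige data set
library(caret)
# Row indices for a 70% training split; list = FALSE returns a matrix instead of a list
trainIndex <- createDataPartition(Prestige$income, p = 0.7, list = FALSE)
prestige.train <- Prestige[trainIndex, ]
prestige.test <- Prestige[-trainIndex, ]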

The createDataPartition() function seems a little misnamed because it doesn't create the partition for you, but rather provides a vector of indexes that you then can use to construct training and test sets. It's pretty easy to do this yourself in R using sample() but one thing createDataPartition() apparently does do is sample from within factor levels. Moreover, if your outcome is categorical, the distribution is maintained across the data partitions. It's not relevant in this case, however, since your outcome is continuous.

Now you can train your model on the training set:

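# Candidate hyperparameters: weight decay and number of hidden units
my.grid <- expand.grid(.decay = c(0.5, 0.1), .size = c(5, 6, 7))
# linout = TRUE gives a linear output unit, as needed for regression
prestige.fit <- train(income ~ prestige + education, data = prestige.train,
    method = "nnet", maxit = 1000, tuneGrid = my.grid, trace = FALSE, linout = TRUE)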

Aside: I had to add the linout parameter to get nnet to work with a regression (vs. classification) problem. Otherwise I got all 1s as predicted values from the model.

You can then call predict on the fit object using the test data set and calculate RMSE from the results:

prestige.predict <- predict(prestige.fit, newdata = prestige.test)
prestige.rmse <- sqrt(mean((prestige.predict - prestige.test$income)^2))

Time Series Forecasting – Difference Between One-Step Ahead Forecast and Fitted Model

forecast always produces forecasts beyond the end of the data.

So forecast(fit) produces forecasts for observations 401, 402, ... and forecast(refit) produces forecasts for observations 501, 502, ...

fitted produces one-step in-sample (i.e., training data) "forecasts". That is, it gives a forecast of observation t using observations up to time t-1 for each t in the data.

So fitted(fit) gives one-step forecasts of observations 1, 2, ... It is possible to produce a "forecast" for observation 1 as a forecast is simply the expected value of that observation given the model and any preceding history.

fitted(refit) gives one-step forecasts of observations 401, 402, .... So it uses the model estimated on observations 1...400, but it uses the data from time 401...500.

Note that forecast(fit)$mean[1] will not be the same as fitted(refit)[1] due to differences in what they are conditioning on. forecast(fit)$mean[1] conditions on the training data (observations 1...400), while fitted(refit)[1] conditions only on the test data and does not "know" what the training data were. So fitted(refit)[1] is the estimate of observation 401 given the model but no history, while forecast(fit)$mean[1] is the estimate of observation 401 given the model and the data up to time 400.

Update

Note that the model is actually
$$\begin{align} y_t &= \mu + n_t \\ n_t &= \phi n_{t-1} + e_t \end{align}$$
where $\mu$ is the estimated "intercept" and $\phi$ is the AR coefficient. So if you write it in the more usual way,
$$ y_t = (1-\phi)\mu + \phi y_{t-1} + e_t $$
Thus forecasts are given by

> phi <- coef(fit)['ar1']
> mu <- coef(fit)['intercept']
> (by.hand <- phi*test.5 + (1-phi)*mu)
[1]  1.318043  0.010579  0.628453 -0.515169 -2.010278
> 
> (auto <- c(forecast(fit)$mean[1], fitted(refit)[2:5]))
[1]  1.318043  0.010579  0.628453 -0.515169 -2.010278
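
For completeness, the step from the two-equation form of the model to the "more usual" form can be written out as follows. This is a short worked derivation using only the symbols already defined above. Substitute $n_t = y_t - \mu$ and $n_{t-1} = y_{t-1} - \mu$ into the second equation:

$$\begin{align} y_t - \mu &= \phi\,(y_{t-1} - \mu) + e_t \\ y_t &= (1-\phi)\mu + \phi y_{t-1} + e_t \\ \hat{y}_{t|t-1} &= (1-\phi)\mu + \phi y_{t-1} \end{align}$$

The last line takes the expectation given $y_{t-1}$, with $e_t$ having mean zero, and is the expression computed by hand in the R session as phi*test.5 + (1-phi)*mu.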
