Solved – Predictions for random walk in ARIMA model

arima, signal processing, time series

Below I'm showing a small subset of a larger set of measurements of a process, which I in turn use to predict something else. The random-walk component of the process is my signal of interest. I've posted the data in CSV format for those especially interested, but it isn't necessary to look at it to answer my question.

I've fitted an ARIMA(1,1,2) model to my signal (after log-transforming it). It was the best model by AIC/SBC selection, and its prediction is overlaid on the original (log-transformed) series below:

Predicted Overlaid on Actual

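For concreteness, a fit like this might look as follows in Python with statsmodels; the package and the variable name `y` are illustrative assumptions, not necessarily what I actually used:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# y: the raw measurement series (illustrative name); log-transform first
log_y = np.log(y)

# ARIMA(1,1,2) on the log-transformed signal, the winner of the AIC/SBC comparison
fit = ARIMA(log_y, order=(1, 1, 2)).fit()
print(fit.summary())   # AIC and BIC are reported in the summary

# One-step-ahead in-sample predictions (the kind of overlay shown above)
pred = fit.predict()
```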
And the residuals look like white noise to me (though I haven't run a formal test yet):
Residuals

  • In general, how do I get the prediction at each time step for the random walk portion of the ARIMA model?
  • The outlier with a value around -5 is a bad data point and I'd like to exclude it. If it doesn't broaden the question too much, I'd also like to know how to exclude data points that fall outside pre-determined limits during online prediction.
  • I did notice that my residuals show a change in variance over time, so if that violates any ARIMA model assumptions, please let me know.

Best Answer

Thanks to the help on this forum, I was able to put this consolidated form of the original question to one of the professors at my university, who is teaching a time-series class this semester... which I should have taken :-)

I thought his answer was pretty good, and it also echoes some of the other comments posted on this question.

I'm not sure whether I should accept this answer officially, but I at least want to share it here.

Professor's answer:

You have some strange patterns in your data… it looks like there is some type of “structural break” (the form or pattern of the time process changes around time point 3000). It might make sense to break the time series into two pieces and analyze each separately, if everything else is not working so well.
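Assuming Python with statsmodels and a log-transformed series `log_x` (illustrative names), splitting at the apparent break near time point 3000 might look like the sketch below; the orders would of course be re-selected for each piece:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

break_point = 3000  # rough location of the apparent structural break

piece_1 = log_x[:break_point]
piece_2 = log_x[break_point:]

# Fit and diagnose each regime separately (orders shown are placeholders)
fit_1 = ARIMA(piece_1, order=(1, 1, 2)).fit()
fit_2 = ARIMA(piece_2, order=(1, 1, 2)).fit()
```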

If the covariances are indeed changing over time, then you would be violating the assumptions of the Box-Jenkins model.

I would try some intermediate steps. First, just difference your time series: compute Y_t = X_t - X_{t-1} (after the log transformation) and see if it looks like there's a similar pattern throughout. If so, then you could try fitting an ARMA model to the differences. Are you using a software package that computes ARMA models? Can you fit such models on the differences Y_t?
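In statsmodels terms (an assumption about the package, with `log_x` again standing in for the log-transformed series), that intermediate step might look like:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Y_t = X_t - X_{t-1} on the log-transformed series
y_diff = np.diff(log_x)

# An ARMA(p, q) model is just ARIMA with d = 0; the orders here are placeholders
arma_fit = ARIMA(y_diff, order=(1, 0, 2)).fit()
print(arma_fit.summary())
```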

Sometimes software packages will compute the k-step forecasts for the future… if you had those, then you could undo the differencing to get your predictions online. So, for example, you get predictions for Y_{n+1}, …, Y_{n+k}, and you know Y_{n+i} = X_{n+i} - X_{n+i-1}, so you can find X_{n+i} = Y_{n+i} + X_{n+i-1} for i = 1, …, k from the predictions for the differences Y_{n+i} and the observation you have for X_n. Otherwise, you might have to resort to Kalman filtering, which can be OK too.
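Continuing the sketch above (still assuming statsmodels), undoing the differencing on k-step forecasts amounts to a cumulative sum anchored at the last observation:

```python
import numpy as np

k = 10  # forecast horizon (arbitrary)

# Forecasts of the differences Y_{n+1}, ..., Y_{n+k}
y_fore = arma_fit.forecast(steps=k)

# X_{n+i} = Y_{n+i} + X_{n+i-1}: cumulatively add the forecast
# differences onto the last observation X_n
x_fore = log_x[-1] + np.cumsum(y_fore)
```

Fitting ARIMA with d = 1 on `log_x` directly and calling `forecast` should give the same result, since the software undoes the differencing for you; the manual version just makes it clear where the random-walk-style carry-forward of X_n enters the prediction.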

If you have any outliers you could just delete them in fitting the model. Essentially, you would have no information at that time point, but that’s better than misleading information.
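One way to do this, and to handle the original question about pre-determined limits during online prediction, is to mark out-of-range points as missing before fitting; statsmodels' state-space ARIMA implementation accepts NaNs as missing observations (an assumption about the package in use), and the limits below are made up for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

lower, upper = -4.0, 0.0   # made-up pre-determined limits on the log scale

# Replace out-of-limits points with NaN: "no information at that time point"
log_x_clean = np.where((log_x < lower) | (log_x > upper), np.nan, log_x)

# The Kalman filter underlying the state-space ARIMA skips the update
# step at missing observations, so the NaNs are handled automatically
clean_fit = ARIMA(log_x_clean, order=(1, 1, 2)).fit()
```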

I guess that I just wonder (based on your initial graph) how the fit of the time series model might improve if you considered breaking the time series into two pieces. Maybe you would do a better job of capturing the extremes in the second portion of the series.