Solved – time series – Poor prediction using ARIMA model

I am trying to fit and forecast the log returns of a price series using an ARIMA model in R. For reproducibility, the data is provided here.

Steps Followed, Code and Results obtained

Check for outliers (Package: forecast) – No outliers detected.
```
outliers <- tsoutliers(log.rtn)
```
Stationarity Check using ADF test (Package: fUnitRoots) – Series found to be stationary
```
m1 <- ar(log.rtn, method = "mle")  # assumption: m1 is not defined in the post; AR order chosen by AIC
stationary <- adfTest(log.rtn, lags = m1$order, type = "c")
```
Determination of p,d,q using ACF and PACF (Package: astsa) – Based on my understanding, p = 2, d = 0, q = 2
```
acf2(log.rtn, max.lag = 20)
```

Fitting ARIMA (Package: forecast)

```
fit <- auto.arima(log.rtn, stepwise = FALSE, trace = TRUE, approximation = FALSE)
```

Model obtained: ARIMA(2,0,1)

```
Series: log.rtn
ARIMA(2,0,1) with zero mean

Coefficients:
          ar1     ar2     ma1
      -0.5705  0.1557  0.6025
s.e.   0.1549  0.0532  0.1519

sigma^2 estimated as 0.001086:  log likelihood=775.57
AIC=-1543.14   AICc=-1543.04   BIC=-1527.29
```

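Prediction (Package: forecast)

```
# forecast() takes the horizon as h; n.ahead is not one of its arguments and is ignored,
# so the default h = 10 applies (hence the 10 rows of output)
fcast <- forecast(fit, h = 10)
plot(fcast)
```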

```
    Point Forecast       Lo 80      Hi 80       Lo 95      Hi 95
390   1.416920e-03 -0.04080849 0.04364233 -0.06316127 0.06599511
391   8.228924e-04 -0.04142414 0.04306993 -0.06378837 0.06543416
392  -2.488236e-04 -0.04289257 0.04239493 -0.06546681 0.06496917
393   2.700663e-04 -0.04248622 0.04302635 -0.06512003 0.06566016
394  -1.928045e-04 -0.04303250 0.04264690 -0.06571047 0.06532486
395   1.520366e-04 -0.04273465 0.04303872 -0.06543749 0.06574156
396  -1.167506e-04 -0.04303183 0.04279833 -0.06574971 0.06551621
397   9.027370e-05 -0.04284167 0.04302221 -0.06556846 0.06574901
398  -6.967566e-05 -0.04301167 0.04287232 -0.06574379 0.06560444
399   5.380284e-05 -0.04289419 0.04300179 -0.06562948 0.06573708
```

I am confused about why the model predicts so poorly.

Best Answer

For log returns, the usual recommendation is a GARCH model or one of its variants. Log returns are characterised by volatility clustering: periods of high volatility tend to be followed by high volatility, and periods of low volatility by low volatility.
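To make the idea of volatility clustering concrete, here is a minimal base R sketch that simulates a GARCH(1,1) series and runs Ljung-Box tests on the returns and on the squared returns. The parameter values (`omega`, `alpha`, `beta`), the seed, and the sample size are illustrative assumptions, not estimates from the data in the question.

```r
# Simulate a GARCH(1,1) return series with base R only.
# omega, alpha and beta are illustrative values, not fitted to the question's data.
set.seed(42)
n     <- 2000
omega <- 1e-5
alpha <- 0.10
beta  <- 0.85

r      <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha - beta)   # unconditional variance as starting value
r[1]      <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  # Today's variance depends on yesterday's squared return and yesterday's variance
  sigma2[t] <- omega + alpha * r[t - 1]^2 + beta * sigma2[t - 1]
  r[t]      <- sqrt(sigma2[t]) * rnorm(1)
}

# Ljung-Box tests: the returns themselves vs. the squared returns
lb_returns <- Box.test(r,   lag = 10, type = "Ljung-Box")
lb_squared <- Box.test(r^2, lag = 10, type = "Ljung-Box")
print(lb_returns$p.value)
print(lb_squared$p.value)
```

In a series like this the returns typically show little autocorrelation, while the squared returns are strongly autocorrelated. Running the same two tests on `log.rtn` would be one way to check whether the real data behaves similarly.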

GARCH models this time-varying variance directly, whereas ARIMA describes only the conditional mean and assumes a constant innovation variance. Further, I would not treat the data for outliers, as a perceived outlier could carry signal about the start (or end) of a volatility cluster.
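As a side check on what the ARIMA fit in the question is doing, the sketch below rebuilds the point forecasts from the reported coefficients alone. It assumes the printed (rounded) values of `ar1` and `ar2` and the first two point forecasts from the output, so the results agree with the table only up to rounding.

```r
# Coefficients and first two point forecasts copied from the output in the question
ar1 <- -0.5705
ar2 <-  0.1557
fc  <- c(1.416920e-03, 8.228924e-04)

# For a zero-mean ARMA(2,1), the MA term enters only the 1-step forecast and
# step 2 still uses the last observation, so from step 3 on:
#   f[h] = ar1 * f[h-1] + ar2 * f[h-2]
for (h in 3:10) {
  fc[h] <- ar1 * fc[h - 1] + ar2 * fc[h - 2]
}
print(signif(fc, 4))

# Roots of z^2 - ar1*z - ar2 = 0; moduli below 1 mean the forecasts decay to zero
ar_roots <- polyroot(c(-ar2, -ar1, 1))
print(Mod(ar_roots))
```

The recursion gives the same alternating, shrinking values as the forecast table: the model is forecasting the conditional mean of the returns, which reverts to zero within a few steps. Any information about changing volatility has to come from a variance model instead.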

See this post, the R package fGarch, and the function garch from the tseries package.
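A minimal sketch of a GARCH(1,1) fit with fGarch might look like the following. Since the original data is not reproduced here, it uses a series simulated from fGarch's default specification as a stand-in. The seed, the sample size, and the names `x`, `fit_garch` and `vol_fc` are assumptions for illustration only.

```r
# Sketch only: fit a GARCH(1,1) with fGarch on a simulated stand-in series.
# With real data, pass the log returns (log.rtn in the question) as `data`.
if (requireNamespace("fGarch", quietly = TRUE)) {
  library(fGarch)

  # Simulated returns from fGarch's default GARCH(1,1) specification
  # (an assumption, used only so the example runs without the original data)
  x <- as.vector(garchSim(garchSpec(rseed = 1985), n = 1000)[, 1])

  # Constant mean plus GARCH(1,1) variance, normal innovations by default
  fit_garch <- garchFit(~ garch(1, 1), data = x, trace = FALSE)
  print(coef(fit_garch))   # mu, omega, alpha1, beta1

  # Forecast the conditional mean and standard deviation 5 steps ahead
  vol_fc <- predict(fit_garch, n.ahead = 5)
  print(vol_fc)
} else {
  message("Package 'fGarch' is not installed; install.packages('fGarch') to run this sketch.")
}
```

The forecast data frame reports a standard deviation for each step, which is the quantity a GARCH model is meant to predict. If the mean also needs ARMA terms, garchFit accepts a combined formula such as `~ arma(2, 1) + garch(1, 1)`. Whether that is worthwhile for this series is something to check against the data.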
