Solved – time series – Poor prediction using ARIMA model

arima, r, time series

I am trying to fit and forecast the log returns of a price series using an ARIMA model in R. For reproducibility, data is provided here.

Steps Followed, Code and Results obtained

  1. Check for outliers (Package: forecast) – No outliers detected.

    outliers <- tsoutliers(log.rtn)
    
  2. Stationarity Check using ADF test (Package: fUnitRoots) – Series found to be stationary

    # m1 is not defined in this post; it is presumably an AR model fitted earlier
    stationary <- adfTest(log.rtn, lags = m1$order, type = "c")
    
  3. Determination of p,d,q using ACF and PACF (Package: astsa) – Based on my understanding, p = 2, d = 0, q = 2

    acf2(log.rtn, lags = 20)
    
  4. Fitting ARIMA (Package: forecast)

    fit <- auto.arima(log.rtn, stepwise=FALSE, trace=TRUE, approximation=FALSE)
    

    Model obtained : ARIMA(2,0,1)

    Series: log.rtn 
    
      ARIMA(2,0,1) with zero mean     
    
    Coefficients:
              ar1     ar2     ma1
          -0.5705  0.1557  0.6025
    s.e.   0.1549  0.0532  0.1519
    
    sigma^2 estimated as 0.001086:  log likelihood=775.57
    AIC=-1543.14   AICc=-1543.04   BIC=-1527.29
    
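  5. Prediction (Package: forecast)

    # forecast() takes the horizon as h; n.ahead is ignored, so the default h = 10 gave the 10 rows below
    fcast <- forecast(fit, h = 10)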
    plot(fcast)
    
        Point Forecast       Lo 80      Hi 80       Lo 95      Hi 95
    390   1.416920e-03 -0.04080849 0.04364233 -0.06316127 0.06599511
    391   8.228924e-04 -0.04142414 0.04306993 -0.06378837 0.06543416
    392  -2.488236e-04 -0.04289257 0.04239493 -0.06546681 0.06496917
    393   2.700663e-04 -0.04248622 0.04302635 -0.06512003 0.06566016
    394  -1.928045e-04 -0.04303250 0.04264690 -0.06571047 0.06532486
    395   1.520366e-04 -0.04273465 0.04303872 -0.06543749 0.06574156
    396  -1.167506e-04 -0.04303183 0.04279833 -0.06574971 0.06551621
    397   9.027370e-05 -0.04284167 0.04302221 -0.06556846 0.06574901
    398  -6.967566e-05 -0.04301167 0.04287232 -0.06574379 0.06560444
    399   5.380284e-05 -0.04289419 0.04300179 -0.06562948 0.06573708
    

I am quite confused about why the model predicts so badly.

Best Answer

For log returns, the usual recommendation is a GARCH model or one of its variants. Log returns are characterised by volatility clustering: periods of high volatility tend to be followed by high volatility, and periods of low volatility by low volatility.
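One way to look for volatility clustering is to compare the autocorrelation of the returns with that of the squared returns. The sketch below is only an illustration: the data from the question is not available here, so it simulates a GARCH(1,1) series in base R with made-up parameters (`omega`, `alpha`, `beta` are assumptions, not estimates from the question) and runs a Ljung-Box test on both.

```r
# Simulate a GARCH(1,1) series in base R (parameter values are illustrative only).
set.seed(42)
n     <- 2000
omega <- 1e-5
alpha <- 0.10
beta  <- 0.85

z    <- rnorm(n)        # i.i.d. standard normal shocks
sig2 <- numeric(n)      # conditional variance
r    <- numeric(n)      # simulated returns

sig2[1] <- omega / (1 - alpha - beta)   # start at the unconditional variance
r[1]    <- sqrt(sig2[1]) * z[1]
for (t in 2:n) {
  sig2[t] <- omega + alpha * r[t - 1]^2 + beta * sig2[t - 1]
  r[t]    <- sqrt(sig2[t]) * z[t]
}

# Ljung-Box test on the returns and on the squared returns.
# Strong autocorrelation in the squares points to volatility clustering.
lb_ret <- Box.test(r,   lag = 10, type = "Ljung-Box")
lb_sq  <- Box.test(r^2, lag = 10, type = "Ljung-Box")
print(lb_ret$p.value)
print(lb_sq$p.value)
```

On real data, replace `r` with the log return series (`log.rtn` in the question). A small p-value for the squared series suggests that a constant-variance model is missing structure.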

GARCH is designed to model time-varying volatility, which ARIMA does not do: ARIMA assumes a constant innovation variance. Further, I would not treat the data for outliers, since a perceived outlier could carry signal about the start (or end) of a volatility cluster.
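This also fits the forecasts shown in the question. ARIMA models only the conditional mean, and for a stationary zero-mean ARMA model the point forecasts decay toward zero, while the interval width is driven by the single constant sigma^2. As a rough check, the sketch below takes the coefficients and the first two point forecasts printed in the question and applies the AR recursion by hand. From horizon 3 onward, each forecast depends only on the two preceding forecasts.

```r
# Coefficients and first two point forecasts copied from the output in the question.
phi <- c(-0.5705, 0.1557)               # ar1, ar2
f   <- c(1.416920e-03, 8.228924e-04)    # point forecasts for 390 and 391

# From horizon 3 onward: forecast = ar1 * previous + ar2 * the one before that.
for (h in 3:10) {
  f[h] <- phi[1] * f[h - 1] + phi[2] * f[h - 2]
}
print(signif(f, 4))

# Moduli of the roots of the AR polynomial 1 - ar1*z - ar2*z^2.
# If all are greater than 1, the recursion shrinks toward zero.
ar_root_mod <- Mod(polyroot(c(1, -phi)))
print(ar_root_mod)
```

The values agree with the printed forecasts up to the rounding of the coefficients, so the near-zero predictions are what this model is expected to produce rather than a fitting error.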

See this post, the R package fGarch, and the function garch in the package tseries.
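A minimal sketch of a GARCH(1,1) fit with fGarch might look like the following. It assumes the fGarch package is installed, and because the data from the question is not available here it uses a simulated stand-in series `x` (with illustrative parameters) in place of `log.rtn`. The GARCH(1,1) order is a common starting point, not something derived from the question's data.

```r
library(fGarch)   # assumed to be installed

# Stand-in for log.rtn: a simulated GARCH(1,1) series (illustrative parameters).
set.seed(1)
n     <- 1000
omega <- 1e-5
alpha <- 0.10
beta  <- 0.85
z     <- rnorm(n)
sig2  <- numeric(n)
x     <- numeric(n)
sig2[1] <- omega / (1 - alpha - beta)
x[1]    <- sqrt(sig2[1]) * z[1]
for (t in 2:n) {
  sig2[t] <- omega + alpha * x[t - 1]^2 + beta * sig2[t - 1]
  x[t]    <- sqrt(sig2[t]) * z[t]
}

# Fit a GARCH(1,1) model with a constant mean (the garchFit default).
fit <- garchFit(~ garch(1, 1), data = x, trace = FALSE)
print(coef(fit))

# Forecast 5 steps ahead; the result includes a forecast of the conditional
# standard deviation, which is the quantity GARCH adds over ARIMA.
pred <- predict(fit, n.ahead = 5)
print(pred)
```

With the real series, pass `log.rtn` as `data`. The mean forecast will usually still be close to zero; what changes is that the volatility forecast responds to recent data.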
