I am trying to fit and forecast log returns of price data using an ARIMA model in R. For reproducibility, the data is provided here.
Steps Followed, Code and Results obtained
-
Check for outliers (Package: forecast) – No outliers detected.
outliers <- tsoutliers(log.rtn)
-
Stationarity Check using ADF test (Package: fUnitRoots) – Series found to be stationary
stationary <- adfTest(log.rtn, lags = m1$order, type = c("c"))
-
Determination of p,d,q using ACF and PACF (Package: astsa) – Based on my understanding, p = 2, d = 0, q = 2
acf2(log.rtn, max.lag = 20)
-
Fitting ARIMA (Package: forecast)
fit <- auto.arima(log.rtn, stepwise=FALSE, trace=TRUE, approximation=FALSE)
Model obtained : ARIMA(2,0,1)
Series: log.rtn
ARIMA(2,0,1) with zero mean

Coefficients:
          ar1     ar2     ma1
      -0.5705  0.1557  0.6025
s.e.   0.1549  0.0532  0.1519

sigma^2 estimated as 0.001086:  log likelihood=775.57
AIC=-1543.14   AICc=-1543.04   BIC=-1527.29
-
Prediction (Package: forecast)
fcast <- forecast(fit, h = 10)
plot(fcast)

    Point Forecast       Lo 80      Hi 80       Lo 95      Hi 95
390   1.416920e-03 -0.04080849 0.04364233 -0.06316127 0.06599511
391   8.228924e-04 -0.04142414 0.04306993 -0.06378837 0.06543416
392  -2.488236e-04 -0.04289257 0.04239493 -0.06546681 0.06496917
393   2.700663e-04 -0.04248622 0.04302635 -0.06512003 0.06566016
394  -1.928045e-04 -0.04303250 0.04264690 -0.06571047 0.06532486
395   1.520366e-04 -0.04273465 0.04303872 -0.06543749 0.06574156
396  -1.167506e-04 -0.04303183 0.04279833 -0.06574971 0.06551621
397   9.027370e-05 -0.04284167 0.04302221 -0.06556846 0.06574901
398  -6.967566e-05 -0.04301167 0.04287232 -0.06574379 0.06560444
399   5.380284e-05 -0.04289419 0.04300179 -0.06562948 0.06573708
I am quite confused about why the model is predicting so badly.
Best Answer
For log returns, the recommended model is GARCH or one of its variations. Log returns are characterised by volatility clustering: periods of high volatility tend to be followed by high volatility, and periods of low volatility by low volatility.
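A quick way to look for volatility clustering is to compare a Ljung-Box test on the returns with the same test on the squared returns. The sketch below uses base R only. Since the data from the question is not reproduced here, it simulates a stand-in series from a GARCH(1,1) process; the parameters `omega`, `alpha`, `beta` and the variable names are assumptions made for illustration, not estimates from the question's data.

```r
# Simulated stand-in for log.rtn (hypothetical GARCH(1,1) parameters).
set.seed(42)
n     <- 2000
omega <- 1e-5
alpha <- 0.10
beta  <- 0.85

eps  <- numeric(n)   # simulated returns
sig2 <- numeric(n)   # conditional variances
sig2[1] <- omega / (1 - alpha - beta)   # start at the unconditional variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:n) {
  sig2[t] <- omega + alpha * eps[t - 1]^2 + beta * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}

# Ljung-Box test on the returns and on the squared returns.
# A small p-value for the squares points to autocorrelation in the
# variance, i.e. volatility clustering.
lb_level  <- Box.test(eps,   lag = 10, type = "Ljung-Box")
lb_square <- Box.test(eps^2, lag = 10, type = "Ljung-Box")
print(lb_level$p.value)
print(lb_square$p.value)
```

On real data, replace `eps` with `log.rtn`. If the squared series shows strong autocorrelation, that dependence sits in the variance, which an ARIMA model for the mean does not capture.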
GARCH is designed to handle volatility much better than ARIMA. Further, I would not treat the data for outliers, as a perceived outlier could carry a signal about the start (or end) of a volatility cluster.
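It may also help to see what the ARIMA(2,0,1) fit in the question is actually doing. For a zero-mean ARMA model, once the forecast horizon is past the MA order and the last observed values, each point forecast is just the AR recursion applied to the two previous forecasts, so the forecasts shrink towards zero. The sketch below takes the rounded coefficients and the first two point forecasts as printed in the question and rebuilds the rest; the variable names are mine.

```r
# AR coefficients as printed in the question's model output (rounded).
ar1 <- -0.5705
ar2 <-  0.1557

# First two point forecasts as printed in the question (rows 390 and 391).
f <- c(1.416920e-03, 8.228924e-04)

# From the third step on, the forecast depends only on the two previous
# forecasts: f[h] = ar1 * f[h - 1] + ar2 * f[h - 2].
for (h in 3:10) {
  f[h] <- ar1 * f[h - 1] + ar2 * f[h - 2]
}

# Point forecasts as printed in the question (rows 390 to 399).
reported <- c(1.416920e-03, 8.228924e-04, -2.488236e-04, 2.700663e-04,
              -1.928045e-04, 1.520366e-04, -1.167506e-04, 9.027370e-05,
              -6.967566e-05, 5.380284e-05)

print(cbind(recursion = f, reported = reported))
```

The two columns agree up to the rounding of the printed coefficients. So the near-zero point forecasts are what this model is expected to produce rather than a fitting error, and the prediction intervals are close to constant because the model assumes a constant variance.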
Check this post, the R package fGarch, and the function garch from the package tseries.
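As a starting point, here is a minimal sketch of fitting a GARCH(1,1) model with `garchFit` from fGarch and forecasting the conditional standard deviation. It runs on a simulated stand-in series, because the data from the question is not reproduced here; the simulation parameters and variable names are assumptions for illustration, and the GARCH(1,1) order is a common default rather than something chosen from the question's data.

```r
# Simulated stand-in for log.rtn (hypothetical GARCH(1,1) parameters).
set.seed(1)
n     <- 1000
omega <- 1e-5
alpha <- 0.10
beta  <- 0.85

x  <- numeric(n)   # simulated returns
s2 <- numeric(n)   # conditional variances
s2[1] <- omega / (1 - alpha - beta)
x[1]  <- sqrt(s2[1]) * rnorm(1)
for (t in 2:n) {
  s2[t] <- omega + alpha * x[t - 1]^2 + beta * s2[t - 1]
  x[t]  <- sqrt(s2[t]) * rnorm(1)
}

# Fit only if fGarch is installed, so the script does not fail without it.
if (requireNamespace("fGarch", quietly = TRUE)) {
  library(fGarch)

  # Constant mean with GARCH(1,1) conditional variance.
  fit <- garchFit(~ garch(1, 1), data = x, trace = FALSE)
  print(coef(fit))

  # Five-step-ahead forecasts of the mean and the conditional
  # standard deviation.
  fc <- predict(fit, n.ahead = 5)
  print(fc)
}
```

On real data, pass `log.rtn` as `data`. If the mean also shows autocorrelation, `garchFit` accepts a combined specification such as `~ arma(2, 1) + garch(1, 1)`.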