Solved – Unable to get suitable forecast for ARIMA model in R due to outliers– attached code for easy replication

arimartime series

Using the attached data that has been recently updated I am not able to obtain a statistically significant forecast. The data is extremely seasonal. The data is stored here for easy replication:

http://ge.tt/1uihVfA2/v/0?c

# 1. Make a R timeseries out of the rawdata: specify frequency & startdate
gIIP <- ts(Data, frequency=12, start=c(2003,11))
print(gIIP)
plot.ts(gIIP, type="l", col="blue", ylab="MTD Ships", lwd=2,
        main="Full data")
grid()

Using the auto.arima function I don't need to factor a Box-Cox because the auto.arima factors that into selecting the best model.

Upon "selecting the best model" I The best model suggested was Arima(order = c(0, 0, 1),
seasonal = list(order = c(1, 0, 1), period = 12) with non-zero mean

# 5. Perform estimation
library(forecast)
library(zoo)
library(stats)
auto.arima(gIIP, d=NA, D=NA, max.p=12, max.q=12,
           max.P=2, max.Q=2, max.order=12, max.d=2, max.D=2,
           start.p=2, start.q=2, start.P=1, start.Q=1,
           stationary=FALSE, seasonal=TRUE,
           ic=c("aicc","aic", "bic"), stepwise=FALSE, trace=TRUE,
           approximation=FALSE | frequency(gIIP)>12), xreg=NULL,
           test=c("kpss","adf","pp"), seasonal.test=c("ocsb","ch"),
           allowdrift=TRUE, lambda=TRUE, parallel=FALSE, num.cores=4

)

then proceed to conduct accuracy diagnostics but unable to obtain any output.

#Check standard error etc of "fitted" ARIMA
pos.arima <- function(gIIP, order = c(0, 0, 1),
      seasonal = list(order = c(1, 0, 1), period = 12),
      xreg = NULL, include.drift=TRUE, 
      transform.pars = TRUE,
      fixed = NULL, init = NULL,
      method = c("CSS-ML", "ML", "CSS"), 
      optim.method = "BFGS",
      optim.control = list(), kappa = 1e6)

acf(pos.arima) 
pacf(pos.arima)

The following step to conduct an ex ante (out of sample forecast) but also unable to obtain a statistically significant forecast—forecast with lowest standard error rate. I tested this by removing the last 5 observations to test the model.

# 7. Forecast Out-Of-Sample ---this used to work
fit <- Arima(gIIP, order = c(0, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12),
             xreg = TRUE, include.mean = TRUE, transform.pars = TRUE, 
             fixed = NULL, init = NULL, method = c("CSS-ML", "ML", "CSS"), 
             optim.method = "BFGS", optim.control = list(), kappa = 1e6)
plot(forecast(fit,h=9))
print(forecast(fit,h=9))

Used to obtain output here. Can you please help me diagnose why there ARIMA model is not working like it once did for me? Thank you for your time.

Best Answer

The basic problem in your data is that it has outliers, not treating them are at least understanding what/where the outliers are present might lead to poor predictions.

There is a package in R called tsoutliers which implements Chen and Liu that can help you diagnose outliers in the data. Commercial packages such as Autobox which uses Tsay's outlier detection also has excellent outlier detection capabilities. I'll expand my answer in the next few days bear with me.

I used tso function in tsoutliers package to detect outliers

datats <- ts(data,start=c(2003,11),frequency=12)
plot.ts(datats)

c  <- tso(datats, types = c("AO", "LS","SLS"))
plot(c)

Below are the outputs:

Outliers:
  type ind    time  coefhat tstat
1  SLS  18 2005:04 16923128 6.765
2   AO 101 2012:03 36590158 4.763
3  SLS 112 2013:02 21989974 4.096
4  SLS 113 2013:03 25225304 4.699
5   AO 115 2013:05 24259786 3.158

In looking at your data, there is an additive outlier at observation 2012:03 and seasonal level shift around 2013:02. You can practically ignore seasonal level shift at 2005:04. tsoutliers provides nice graphical output that shows some instability in the last few years 2012/2013/2014, that is seasonal variation has changed. If you do not account for it,then you are bound to produce poor forecasts.

enter image description here

Related Question