Using the attached data that has been recently updated I am not able to obtain a statistically significant forecast. The data is extremely seasonal. The data is stored here for easy replication:
# 1. Make a R timeseries out of the rawdata: specify frequency & startdate
gIIP <- ts(Data, frequency=12, start=c(2003,11))
print(gIIP)
plot.ts(gIIP, type="l", col="blue", ylab="MTD Ships", lwd=2,
main="Full data")
grid()
Using the auto.arima function I don't need to factor a Box-Cox because the auto.arima factors that into selecting the best model.
Upon "selecting the best model" I The best model suggested was Arima(order = c(0, 0, 1),
seasonal = list(order = c(1, 0, 1), period = 12) with non-zero mean
# 5. Perform estimation
library(forecast)
library(zoo)
library(stats)
auto.arima(gIIP, d=NA, D=NA, max.p=12, max.q=12,
max.P=2, max.Q=2, max.order=12, max.d=2, max.D=2,
start.p=2, start.q=2, start.P=1, start.Q=1,
stationary=FALSE, seasonal=TRUE,
ic=c("aicc","aic", "bic"), stepwise=FALSE, trace=TRUE,
approximation=FALSE | frequency(gIIP)>12), xreg=NULL,
test=c("kpss","adf","pp"), seasonal.test=c("ocsb","ch"),
allowdrift=TRUE, lambda=TRUE, parallel=FALSE, num.cores=4
)
then proceed to conduct accuracy diagnostics but unable to obtain any output.
#Check standard error etc of "fitted" ARIMA
pos.arima <- function(gIIP, order = c(0, 0, 1),
seasonal = list(order = c(1, 0, 1), period = 12),
xreg = NULL, include.drift=TRUE,
transform.pars = TRUE,
fixed = NULL, init = NULL,
method = c("CSS-ML", "ML", "CSS"),
optim.method = "BFGS",
optim.control = list(), kappa = 1e6)
acf(pos.arima)
pacf(pos.arima)
The following step to conduct an ex ante (out of sample forecast) but also unable to obtain a statistically significant forecast—forecast with lowest standard error rate. I tested this by removing the last 5 observations to test the model.
# 7. Forecast Out-Of-Sample ---this used to work
fit <- Arima(gIIP, order = c(0, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12),
xreg = TRUE, include.mean = TRUE, transform.pars = TRUE,
fixed = NULL, init = NULL, method = c("CSS-ML", "ML", "CSS"),
optim.method = "BFGS", optim.control = list(), kappa = 1e6)
plot(forecast(fit,h=9))
print(forecast(fit,h=9))
Used to obtain output here. Can you please help me diagnose why there ARIMA model is not working like it once did for me? Thank you for your time.
Best Answer
The basic problem in your data is that it has outliers, not treating them are at least understanding what/where the outliers are present might lead to poor predictions.
There is a package in R called tsoutliers which implements Chen and Liu that can help you diagnose outliers in the data. Commercial packages such as Autobox which uses Tsay's outlier detection also has excellent outlier detection capabilities. I'll expand my answer in the next few days bear with me.
I used
tso
function intsoutliers
package to detect outliersBelow are the outputs:
In looking at your data, there is an additive outlier at observation 2012:03 and seasonal level shift around 2013:02. You can practically ignore seasonal level shift at 2005:04. tsoutliers provides nice graphical output that shows some instability in the last few years 2012/2013/2014, that is seasonal variation has changed. If you do not account for it,then you are bound to produce poor forecasts.