I have time series data with two exogenous variables. I am using auto.arima from the forecast package to find the best-fitting model. I would like to know whether I am using auto.arima correctly, since I believe I am getting good forecast results. The figure below shows the sample time series, which has 200 data points and 200 observations of each exogenous variable (Var1, Var2).
I used the first 170 data points to fit the ARIMA model and the remaining 30 data points for forecasting. I use auto.arima to find the best-fitting ARIMA model.
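library(forecast)
model <- auto.arima(raw_data$timeseries[1:170], xreg = as.matrix(raw_data[1:170, 2:3]))
# forecast() dispatches to forecast.Arima, which recent versions of the package no longer export
pred <- forecast(model, h = 30, xreg = as.matrix(raw_data[171:200, 2:3]))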
I get an RMSE of 0.00169 between the actual and forecasted values, and the plot is shown below (I have not shown the 95% CI). The result seems acceptable; however, I have the following questions:
- Is this the correct implementation (assuming I have already done all checks for stationarity, seasonality, correlations, etc.)?
- You can see from the example that ARIMA(0,1,0) is the best-fitting model. However, the output from forecast.Arima is not the differenced series but the undifferenced one. Does forecast.Arima invert the differencing?
Best Answer
The approach you are using assumes that the two input series (the supporting X's) have a purely contemporaneous effect. The general approach to forming an ARMAX model (transfer function) is as follows.
Examine the residuals to identify any additional structure.
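To make the "purely contemporaneous" point concrete, here is a minimal sketch on simulated data, not the data from the question. It assumes, purely for illustration, that Var1 acts with a one-step delay. The column names Var1_lag1 and Var2_lag1 are hypothetical, and adding lagged columns to xreg is only a simple stand-in for full transfer function identification.

```r
library(forecast)

set.seed(42)
n <- 200
var1 <- rnorm(n)
var2 <- rnorm(n)

# Simulated response: var1 enters with a one-step delay (illustrative assumption),
# var2 enters contemporaneously, and the noise is AR(1)
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.2))
y <- 2 + 0.8 * c(0, var1[-n]) + 0.4 * var2 + noise

# Contemporaneous regressors only, as in the question
x0 <- cbind(Var1 = var1, Var2 = var2)

# Same regressors plus one lag of each; row 1 has no lagged value
x1 <- cbind(x0, Var1_lag1 = c(NA, var1[-n]), Var2_lag1 = c(NA, var2[-n]))

train <- 2:170  # skip row 1 so both models are fitted to the same observations
fit0 <- auto.arima(y[train], xreg = x0[train, ])
fit1 <- auto.arima(y[train], xreg = x1[train, ])

# AICc values are only comparable if both fits use the same differencing order
c(contemporaneous = fit0$aicc, with_lags = fit1$aicc)

# Examine the residuals for leftover autocorrelation
res_acf <- Acf(residuals(fit1), plot = FALSE)
Box.test(residuals(fit1), lag = 10, type = "Ljung-Box")
```

If the residual autocorrelations or the Ljung-Box test still indicate structure, that suggests the lag specification or the ARMA orders need revisiting.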
You might also peruse https://stats.stackexchange.com/search?q=user%3A3382+transfer+function for more hints on forming a useful model.
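The question's second point, whether the forecasts come back on the original scale, can be checked empirically. The sketch below uses simulated data with an arbitrary level offset of 100; the series and coefficients are assumptions for illustration only. If the forecasts were differences they would sit near zero, whereas forecasts in levels should sit near the last observed value.

```r
library(forecast)

set.seed(1)
n <- 200
x <- cbind(Var1 = rnorm(n), Var2 = rnorm(n))  # simulated stand-ins for the regressors

# Random walk around a level of 100, plus contemporaneous regressor effects
y <- 100 + cumsum(rnorm(n, sd = 0.1)) + 0.5 * x[, "Var1"] - 0.3 * x[, "Var2"]

# Force the ARIMA(0,1,0) error structure reported in the question
fit <- Arima(y[1:170], order = c(0, 1, 0), xreg = x[1:170, ])
fc <- forecast(fit, xreg = x[171:200, ])  # horizon is taken from the rows of xreg

# Compare the forecast range with the last observed level
range(fc$mean)
y[170]
```

In this sketch the point forecasts in `fc$mean` stay close to the last observed level rather than near zero, which is consistent with the differencing being undone before the forecasts are returned.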