I am currently doing a short-term forecast using an ARIMA model. I have been following the Box–Jenkins method, and to choose the best ARIMA parameters for my forecast I tested various (p, q) combinations and took the one with the lowest AIC (Akaike information criterion). I assigned d as the number of differencings required. My problem is that my forecast converges to a straight line after 2 hours. Did I choose a wrong method? Is it because I have to try higher AR (p) and MA (q) orders? I put a threshold for AR(p) at 4 and for MA(q) at 2. I saw in some articles that they go up to 24 to do their forecasts. Is there a limit for choosing parameters? Can you give me some references please?
Solved – ARMA parameters
arima forecasting python
Related Solutions
The original data plot [figure omitted] shows a time series rich with ARIMA structure and Gaussian violations, which fortunately can be rectified. The underlying model is a (1,0,0)(1,1,0) with a large number of pulse/one-time anomalies and a significant change point in the error variance (an increase) at period 335. The basic methodology is outlined in http://www.unc.edu/~jbhill/tsay.pdf and implemented in AUTOBOX, a piece of software that I have helped develop. Detecting change points in the error variance leads directly to GLM with empirically identified weights. Note that the error variance is not related to the level of the series, so no power transformation is needed. The final equation and an ACF of the error process suggest an adequate model, and the forecast for the next 11 years and the Actual-Fit-Forecast graph were also presented [figures omitted]. The problem you are having with Minitab (and other time series software) is that the time series is more complicated than what their solution allows for. It would be interesting to compare the Minitab model with the AUTOBOX model (shown partially).
There are a couple of issues here. Firstly, don't presume that the simulated ARIMA is truly of the order you specify; you are taking a sample from the specified model and, due to randomness, the best-fitting model for the particular sample drawn may not be the one from which the simulations were drawn.
I mention this because of the second and more important issue: the auto.arima() function can estimate models via a more efficient fitting algorithm, using conditional sums of squares, to avoid excessive computation time for long series or for complex seasonal models. When this estimation process is in use, auto.arima() approximates the information criteria for a model (because the log likelihood of the model has not been computed). If the user does not indicate which approach should be used, a simple heuristic determines whether the conditional-sums-of-squares estimation is active.
The behaviour is controlled via the argument approximation, and the heuristic is (length(x)>100 | frequency(x)>12); hence approximation takes the value TRUE if the length of the series is greater than $n = 100$ or there are more than 12 observations within each year. As you simulated series with $n = 500$ but did not specify a value for the approximation argument, you ran auto.arima() with approximation = TRUE. This explains the apparently erroneous selection of a model with larger AIC, AICc, and BIC than the simpler model you fitted with arima().
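The default heuristic is simple enough to state in a few lines. As a plain-Python sketch (the function name is mine; the condition is the one documented above):

```python
def uses_approximation(n, frequency):
    """Mirror of auto.arima's default heuristic for the `approximation`
    argument: (length(x) > 100 | frequency(x) > 12)."""
    return n > 100 or frequency > 12

# A non-seasonal series of length 500, as in the question:
print(uses_approximation(500, 1))   # True -> criteria are approximated
```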
For your example 1, we should have
> auto.arima(y, approximation = FALSE)
Series: y
ARIMA(0,0,1) with non-zero mean
Coefficients:
ma1 intercept
0.7166 19.9844
s.e. 0.0301 0.0797
sigma^2 estimated as 1.079: log likelihood=-728.94
AIC=1463.87 AICc=1463.92 BIC=1476.52
> qa
Series: y
ARIMA(1,0,1) with non-zero mean
Coefficients:
ar1 ma1 intercept
0.0565 0.6890 19.9846
s.e. 0.0626 0.0456 0.0830
sigma^2 estimated as 1.078: log likelihood=-728.53
AIC=1465.06 AICc=1465.14 BIC=1481.92
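As a sanity check, the printed criteria can be reproduced from the reported log likelihood. For the ARIMA(0,0,1) fit, forecast counts k = 3 parameters (ma1, the intercept, and the error variance) with n = 500, and uses AIC = -2 logL + 2k, AICc = AIC + 2k(k+1)/(n-k-1), BIC = -2 logL + k log n. A quick sketch:

```python
import math

# values reported for the ARIMA(0,0,1) fit above
logL, n, k = -728.94, 500, 3

aic = -2 * logL + 2 * k
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
bic = -2 * logL + k * math.log(n)

# matches the printed 1463.87 / 1463.92 / 1476.52 up to rounding of logL
print(round(aic, 2), round(aicc, 2), round(bic, 2))
```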
Hence auto.arima() has selected a more parsimonious model than the true one: an ARIMA(0, 0, 1) is chosen. But this choice is based on the information criteria, and now they are in accordance; the selected model has lower AIC, AICc, and BIC, although the differences in AIC and AICc are small. At least the selection is now consistent with the norms for choosing models based on information criteria.
The reason the MA(1) is chosen relates, I believe, to the first issue I mentioned: the best-fitting model for a sample drawn from a stated ARIMA(p, d, q) may not be of the same order as the true model, due to random sampling. Taking a longer series or a longer burn-in period may increase the chance that the true model is selected, but don't bank on it.
Regardless, the moral here is that when something looks obviously wrong, like in your question, do read the associated man page or documentation to assure yourself that you understand how the software works.
Best Answer
You cannot say that you chose a wrong model just because the forecasts converge to a straight line: for any stationary ARMA model, the point forecasts revert to the series mean as the horizon grows, so a flat long-run forecast is expected behaviour. It may also be that you did not pick the right order; note that higher orders are much more difficult to estimate.
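The flattening can be seen directly from the forecast recursion. For a stationary AR(1), the h-step-ahead forecast is mu + phi^h (y_T - mu), which decays geometrically to the mean; higher-order ARMA forecasts behave the same way once the MA terms drop out of the horizon. A sketch with made-up values (phi, mu, and the last observation y_T are all assumptions for illustration):

```python
# assumed AR(1) parameters and last observation
phi, mu, y_T = 0.8, 20.0, 25.0

# h-step-ahead point forecasts for h = 1..24
forecasts = [mu + phi ** h * (y_T - mu) for h in range(1, 25)]

# each step moves a fixed fraction of the remaining gap back toward the mean,
# so the forecast path flattens into a horizontal line at mu
print(forecasts[0], forecasts[-1])   # 24.0, then essentially mu by h = 24
```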
The search limits for the AR and MA parts usually come from your own knowledge. In the forecast package in R (which can use criteria such as "aicc", "aic", and "bic" to find the best model), the defaults are max.p=5, max.q=5, max.P=2, max.Q=2, max.order=5, max.d=2, max.D=1, where the capital letters (P, D, Q) refer to seasonal orders. However, as mentioned in R Cookbook by Paul Teetor (p. 384), if you think your model needs more coefficients, you need to expand the search limits. On the other hand, in Introduction to Time Series and Forecasting (2nd ed.) by Peter J. Brockwell and Richard A. Davis (p. 161), the maximum range for both p and q is from 0 to 27; note that they use AICC rather than AIC. Therefore, to my knowledge, there is no universal agreement on these limits.
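For concreteness, here is a sketch of what such an order search looks like, restricted to pure AR(p) models fitted by conditional least squares (MA terms require nonlinear optimization, which arima() and auto.arima() handle for you). The simulated series, the helper name, and the constant-dropped Gaussian AIC are all mine:

```python
import math
import random

def fit_ar_rss(y, p, max_p):
    """Conditional least squares for an AR(p) with intercept.
    Every order conditions on the same first max_p observations so the
    fits are comparable; returns (residual sum of squares, sample size)."""
    n_eff = len(y) - max_p
    # design rows: [1, y[t-1], ..., y[t-p]]; response: y[t]
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(max_p, len(y))]
    z = [y[t] for t in range(max_p, len(y))]
    m = p + 1
    # augmented normal equations (X'X | X'z), solved by Gauss-Jordan elimination
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n_eff)) for c in range(m)]
        + [sum(X[i][r] * z[i] for i in range(n_eff))]
        for r in range(m)
    ]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    b = [A[r][m] / A[r][r] for r in range(m)]
    rss = sum((z[i] - sum(b[j] * X[i][j] for j in range(m))) ** 2
              for i in range(n_eff))
    return rss, n_eff

random.seed(1)
# simulate an AR(2): y_t = 0.5 y_{t-1} - 0.3 y_{t-2} + e_t, then drop burn-in
y = [0.0, 0.0]
for _ in range(500):
    y.append(0.5 * y[-1] - 0.3 * y[-2] + random.gauss(0, 1))
y = y[100:]

max_p = 5
aics = {}
for p in range(max_p + 1):
    rss, n_eff = fit_ar_rss(y, p, max_p)
    k = p + 2  # AR coefficients + intercept + error variance
    # Gaussian AIC up to an additive constant
    aics[p] = n_eff * math.log(rss / n_eff) + 2 * k
best_p = min(aics, key=aics.get)
print(best_p, {p: round(a, 1) for p, a in aics.items()})
```

The same structure generalizes to a double loop over (p, q); the point is that the fitting and the criterion are separate steps, and widening the search limits only changes the ranges looped over.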
One last thing: in practice, you normally end up with several competitive models, not just one model based on one criterion. At the end of the day, you will double-check those models and pick one of them to use.