You must use the standard deviation of the prediction errors rather than the standard deviation of the residuals; the former varies with the forecast horizon.
For example, in an AR(1) model
$$y_t=\phi y_{t-1} + \varepsilon_t \,, \quad \varepsilon_t\sim NID(0, \sigma^2) \,, \quad t=1,\dots,T\,,$$
the variance of the prediction errors can be obtained as follows.
The $h$-step-ahead prediction errors are defined as $e(h) = y_{T+h} - E(y_{T+h})$ (the realized future value minus its expectation, i.e. the point forecast):
$$
\begin{align}
e(1) &= y_{T+1} - E(y_{T+1}) = (\phi y_T+\varepsilon_{T+1}) - \phi y_T = \varepsilon_{T+1} \\
e(2) &= y_{T+2} - E(y_{T+2}) = (\phi \underbrace{y_{T+1}}_{\phi y_T+\varepsilon_{T+1}}+\varepsilon_{T+2}) - E(y_{T+2}) \\
&= \underbrace{(\phi^2y_T+\phi \varepsilon_{T+1}+\varepsilon_{T+2})}_{y_{T+2}} - \phi^2y_T \\
&= \phi \varepsilon_{T+1} + \varepsilon_{T+2} \\
e(3) &= \underbrace{\phi^3y_T+\phi^2\varepsilon_{T+1}+\phi \varepsilon_{T+2} + \varepsilon_{T+3}}_{y_{T+3}} - E(y_{T+3}) \\
&= \phi^3y_T+\phi^2\varepsilon_{T+1}+\phi \varepsilon_{T+2} + \varepsilon_{T+3} - \phi^3y_T = \phi^2\varepsilon_{T+1}+\phi \varepsilon_{T+2} + \varepsilon_{T+3} \\
\cdots & = \cdots \\
e(h) &= \phi^{h-1}\varepsilon_{T+1}+\dots+\phi \varepsilon_{T+h-1} + \varepsilon_{T+h}
\end{align}
$$
The variances are straightforward to obtain (just note that the $\varepsilon_t$ are uncorrelated and that $Var(\phi^k \varepsilon_t)=\phi^{2k}\sigma^2$):
$$
\begin{align}
Var(e(1)) &= \sigma^2 \\
Var(e(2)) &= \sigma^2(1+\phi^2) \\
Var(e(3)) &= \sigma^2(1+\phi^2+\phi^4) \\
\dots &= \dots
\end{align}
$$
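Summing the geometric series gives the general $h$-step variance for the AR(1):
$$Var(e(h)) = \sigma^2\sum_{k=0}^{h-1}\phi^{2k} = \sigma^2\,\frac{1-\phi^{2h}}{1-\phi^2}\,,$$
which, for $|\phi|<1$, converges to the unconditional variance $\sigma^2/(1-\phi^2)$ as $h\to\infty$.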
The expressions of the variances vary with the order of the model.
A common way to compute the variances of the prediction errors in a general ARMA(p,q) model is by means of the Kalman filter. Here is some code for illustration.
These are your data:
x <- ts(c(319.32, 320.36, 320.82, 322.06, 322.17,321.95,321.20,318.81,317.82, 317.37, 318.93, 319.09, 319.94, 320.98, 321.81, 323.03,323.36, 323.11, 321.65, 319.64, 317.86, 317.25, 319.06, 320.26, 321.65, 321.81,322.36, 323.67, 324.17, 323.39, 321.93, 320.29, 318.58, 318.60,319.98, 321.25, 321.88, 322.47, 323.17, 324.23, 324.88, 324.75, 323.47, 321.34,319.56, 319.45, 320.45, 321.92, 323.40, 324.21, 325.33, 326.31,327.01, 326.24, 325.37, 323.12, 321.85, 321.31, 322.31, 323.72, 324.60, 325.57,326.55, 327.80, 327.80, 327.54, 326.28, 324.63, 323.12, 323.11, 323.99, 325.09, 326.12, 326.61, 327.16, 327.92, 329.14, 328.80, 327.52,325.62, 323.61, 323.80, 325.10, 326.25, 326.93, 327.83, 327.95,329.91, 330.22, 329.25, 328.11, 326.39, 324.97, 325.32, 326.54, 327.71, 328.73, 329.69, 330.47, 331.69, 332.65, 332.24, 331.03, 329.36, 327.60, 327.29, 328.28, 328.79, 329.45, 330.89, 331.63, 332.85, 333.28, 332.47, 331.34, 329.53, 327.57, 327.57, 328.53, 329.69, 330.45, 330.97, 331.64, 332.87, 333.61, 333.55, 331.90, 330.05, 328.58, 328.31, 329.41, 330.63, 331.63, 332.46, 333.36, 334.45, 334.82, 334.32, 333.05, 330.87, 329.24, 328.87, 330.18, 331.50, 332.81, 333.23, 334.55, 335.82, 336.44, 335.99, 334.65, 332.41, 331.32, 330.73, 332.05, 333.53, 334.66, 335.07, 336.33, 337.39, 337.65, 337.57, 336.25, 334.39, 332.44, 332.25, 333.59, 334.76, 335.89, 336.44, 337.63, 338.54, 339.06, 338.95, 337.41, 335.71, 333.68, 333.69, 335.05, 336.53, 337.81, 338.16, 339.88, 340.57, 341.19, 340.87, 339.25, 337.19, 335.49, 336.63, 337.74, 338.36), frequency=12, start=c(1965,1))
These are the results from forecast that you want to reproduce:
require("forecast")
fit <- Arima(x, order = c(1,1,0), seasonal = list(order =c(0,1,1), period =12))
forecast(fit, h=4)
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# Jan 1981 339.5735 339.1483 339.9987 338.9232 340.2238
# Feb 1981 340.1759 339.6493 340.7025 339.3706 340.9813
# Mar 1981 341.2141 340.5865 341.8417 340.2543 342.1739
# Apr 1981 342.3020 341.5914 343.0125 341.2153 343.3886
The state space representation of the fitted model is available in fit$model, which is the input required to run the Kalman filter.
kf <- KalmanForecast(n.ahead = 4, fit$model)
pred.vars <- kf$var * fit$sigma2
kf$pred - 1.96 * sqrt(pred.vars)
# [1] 338.9232 339.3706 340.2543 341.2153
kf$pred + 1.96 * sqrt(pred.vars)
# [1] 340.2238 340.9813 342.1740 343.3886
The prediction-error variances returned by the Kalman filter are expressed in units of the innovation variance, so they must be rescaled by the estimated residual variance $\hat\sigma^2$ (fit$sigma2). The prediction intervals are then obtained as you did; you can see that these values match the columns Lo 95 and Hi 95 obtained above.
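As a sanity check on the AR(1) derivation above, here is a small sketch using only base R (the simulated series and the AR coefficient 0.7 are arbitrary choices), showing that the unscaled variances returned by KalmanForecast match the closed-form geometric sums:

```r
# Simulate an AR(1) series and fit it with base R (stats::arima).
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 200)
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
phi <- unname(fit$coef["ar1"])

# Unscaled h-step prediction variances from the Kalman filter...
kf <- KalmanForecast(n.ahead = 3, fit$model)

# ...should equal the geometric sums 1 + phi^2 + ... + phi^(2(h-1)).
closed.form <- sapply(1:3, function(h) sum(phi^(2 * (0:(h - 1)))))
all.equal(as.numeric(kf$var), closed.form)
```

Multiplying kf$var by fit$sigma2 then gives the actual prediction-error variances, exactly as in the code above.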
Edit
This may depart from your question but, as mentioned by @IrishStat, if the assumption of Gaussian innovations is not plausible, it is worth considering some alternative methods. For example, the option bootstrap=TRUE computes bootstrapped prediction intervals. After a quick look at the documentation, I think this option resamples the residuals and generates an ensemble of replicates based on the parameters of the fitted model. Although the distribution of the residuals of your fitted model departs from the Gaussian (due to excess kurtosis, as suggested for example by the histogram), graphically there are no major differences in the prediction bands. Here are the numbers:
set.seed(1)
forecast(fit, h=4, bootstrap=TRUE)
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# Jan 1981 339.5735 339.2259 339.9744 338.9472 340.2538
# Feb 1981 340.1759 339.7288 340.6966 339.4446 341.0239
# Mar 1981 341.2141 340.6837 341.8561 340.3736 342.2230
# Apr 1981 342.3020 341.6879 343.0394 341.3340 343.4285
Side note: I think your question is more about theoretical issues than practical ones, and that you are using this model and data mainly for illustration. However, while preparing this answer I had a closer look at your data, and I would suggest an ARIMA(0,1,1)(0,1,1) model with a temporary change at observation 190 (October 1980). I don't claim it is the best model or that it will perform better for forecasting, but according to the residual diagnostics it is better than the ARIMA(1,1,0)(0,1,1).
Best Answer
You have what is called intermittent demand, that is, a demand time series characterized by "many" zeros. (If your time series is not demand per se, most of what follows will still apply.) So a web search for "forecasting intermittent demand" would already be helpful. Teunter and Duncan (2009, JORS) give an overview of intermittent demand forecasting methods.
The standard method for forecasting intermittent demand is Croston's method, which applies exponential smoothing separately to the inter-demand intervals and to the nonzero demand sizes. The point forecast is then the ratio of the smoothed nonzero demand size to the smoothed inter-demand interval. Syntetos and Boylan (2001, IJPE) note that Croston's method is slightly biased and propose a modification, but this usually doesn't make all that much of a difference in practice.
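As a rough illustration of how Croston's method works, here is a minimal sketch (not the implementation from any package; the function name and the smoothing parameter alpha = 0.1 are arbitrary choices for illustration):

```r
# Minimal sketch of Croston's method. Assumes y is a numeric vector of
# demands with at least two nonzero values; alpha is the SES parameter.
croston.forecast <- function(y, alpha = 0.1) {
  nz <- which(y > 0)              # periods with nonzero demand
  z  <- y[nz]                     # nonzero demand sizes
  p  <- c(nz[1], diff(nz))        # inter-demand intervals
  z.hat <- z[1]
  p.hat <- p[1]
  for (i in seq_along(z)[-1]) {   # SES on sizes and intervals separately
    z.hat <- alpha * z[i] + (1 - alpha) * z.hat
    p.hat <- alpha * p[i] + (1 - alpha) * p.hat
  }
  z.hat / p.hat                   # point forecast of demand per period
}
```

Note that the smoothed estimates are only updated in periods with nonzero demand, which is what distinguishes Croston's method from plain SES on the raw series.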
An alternative is integer autoregressive moving average models (INARMA), which modify the standard ARIMA time series models. Maryam Mohammadipour wrote a thesis on these.
I personally have major doubts about the usefulness of such an expectation point forecast. A time series of 1 demand every other time period has an expectation of 0.5... as does a time series of 2 demands every fourth time period... and so forth - although these are, of course, less and less Poisson-y. I'd argue that it's much more useful to understand the entire future (and predictive) distribution of demands. So I applaud your looking for prediction intervals!
However, the $\alpha(n-2)$ formula you found applies only to single exponential smoothing on continuous data, via the ARIMA(0,1,1) model for which SES is optimal. So it is inapplicable to count data. I'd much rather propose that you take your point prediction $\hat{y}$ and use quantiles of the Poisson distribution with parameter $\lambda=\hat{y}$. This still disregards parameter estimation uncertainty (along with model selection uncertainty, etc.), but it's a simple possibility and likely better than the formula you have.
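For example (a sketch; the point forecast value 0.5 is just a hypothetical number for illustration):

```r
# 95% prediction interval from Poisson quantiles, using a
# hypothetical point forecast y.hat as the Poisson mean.
y.hat <- 0.5
c(lower = qpois(0.025, lambda = y.hat),
  upper = qpois(0.975, lambda = y.hat))
# for lambda = 0.5 this gives the interval [0, 2]
```

Because the Poisson distribution is discrete, the interval endpoints are integers and the achieved coverage is at least, rather than exactly, the nominal level.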
Shenstone and Hyndman (2005, JoF) note that there is no consistent stochastic model for which Croston's method would be optimal - all candidate models are (1) continuous, not discrete, and (2) can yield negative values. However, for those candidate models, Shenstone and Hyndman provide prediction intervals.
Finally, a word of caution: don't use the MAD for assessing the accuracy of count data forecasts, especially not for intermittent demands. The expected MAD is minimized by the median of your future distribution, not its mean, and if you write that 65% of your data are zeros, then the median is zero... implying that you will probably get the lowest MAD by a flat zero forecast, which is badly biased and likely useless. Here is a presentation I gave at last year's International Symposium on Forecasting on this issue. Or look at Morlidge (2015, Foresight).
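A quick numerical illustration of that point (simulated data; the Poisson rate 0.4 is an arbitrary choice, picked to give roughly two-thirds zeros):

```r
# With mostly-zero count data, a flat zero forecast attains a lower MAD
# than the unbiased mean forecast, despite being useless for planning.
set.seed(1)
y <- rpois(1000, lambda = 0.4)          # about two-thirds zeros
mad.zero <- mean(abs(y - 0))            # MAD of a flat zero forecast
mad.mean <- mean(abs(y - mean(y)))      # MAD of the mean forecast
mad.zero < mad.mean                     # TRUE: zero "wins" on MAD
```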
Final piece of shameless self-promotion: I have an article in the IJF (Kolassa, 2016) which looks at forecasting low volume count data (mostly intermittent), at different accuracy measures and different forecasting methods, including various flavors of Poisson models. This may be useful to you.