You must use the standard deviation of the prediction errors rather than the standard deviation of the residuals; the former varies with the forecast horizon.
For example, in an AR(1) model
$$y_t=\phi y_{t-1} + \varepsilon_t \,, \quad \varepsilon_t\sim NID(0, \sigma^2) \,, \quad t=1,\dots,T\,,$$
the variance of the prediction errors can be obtained as follows.
The $h$-step-ahead prediction errors are defined as $e(h) = y_{T+h} - E(y_{T+h})$ (the realized future value minus its expectation, i.e. the point forecast):
$$
\begin{align}
e(1) &= y_{T+1} - E(y_{T+1}) = (\phi y_T+\varepsilon_{T+1}) - \phi y_T = \varepsilon_{T+1} \\
e(2) &= y_{T+2} - E(y_{T+2}) = (\phi \underbrace{y_{T+1}}_{\phi y_T+\varepsilon_{T+1}}+\varepsilon_{T+2}) - E(y_{T+2}) \\
&= \underbrace{(\phi^2y_T+\phi \varepsilon_{T+1}+\varepsilon_{T+2})}_{y_{T+2}} - \phi^2y_T \\
&= \phi \varepsilon_{T+1} + \varepsilon_{T+2} \\
e(3) &= \underbrace{\phi^3y_T+\phi^2\varepsilon_{T+1}+\phi \varepsilon_{T+2} + \varepsilon_{T+3}}_{y_{T+3}} - E(y_{T+3}) \\
&= \phi^3y_T+\phi^2\varepsilon_{T+1}+\phi \varepsilon_{T+2} + \varepsilon_{T+3} - \phi^3y_T = \phi^2\varepsilon_{T+1}+\phi \varepsilon_{T+2} + \varepsilon_{T+3} \\
\cdots & = \cdots \\
e(h) &= \phi^{h-1}\varepsilon_{T+1}+\dots+\phi \varepsilon_{T+h-1} + \varepsilon_{T+h}
\end{align}
$$
The variances are straightforward to obtain (just note that the $\varepsilon_t$ are uncorrelated and that $Var(\phi^k \varepsilon_t)=\phi^{2k}\sigma^2$):
$$
\begin{align}
Var(e(1)) &= \sigma^2 \\
Var(e(2)) &= \sigma^2(1+\phi^2) \\
Var(e(3)) &= \sigma^2(1+\phi^2+\phi^4) \\
\dots &= \dots
\end{align}
$$
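Summing the geometric series gives the general $h$-step variance for the AR(1):
$$Var(e(h)) = \sigma^2\sum_{k=0}^{h-1}\phi^{2k} = \sigma^2\,\frac{1-\phi^{2h}}{1-\phi^2}\,,$$
which, for $|\phi|<1$, converges to the unconditional variance $\sigma^2/(1-\phi^2)$ as $h\to\infty$.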
The expressions of the variances vary with the order of the model.
A common way to compute the variances of the prediction errors in a general ARMA(p,q) model is by means of the Kalman filter. Here is some code for illustration.
These are your data:
x <- ts(c(319.32, 320.36, 320.82, 322.06, 322.17,321.95,321.20,318.81,317.82, 317.37, 318.93, 319.09, 319.94, 320.98, 321.81, 323.03,323.36, 323.11, 321.65, 319.64, 317.86, 317.25, 319.06, 320.26, 321.65, 321.81,322.36, 323.67, 324.17, 323.39, 321.93, 320.29, 318.58, 318.60,319.98, 321.25, 321.88, 322.47, 323.17, 324.23, 324.88, 324.75, 323.47, 321.34,319.56, 319.45, 320.45, 321.92, 323.40, 324.21, 325.33, 326.31,327.01, 326.24, 325.37, 323.12, 321.85, 321.31, 322.31, 323.72, 324.60, 325.57,326.55, 327.80, 327.80, 327.54, 326.28, 324.63, 323.12, 323.11, 323.99, 325.09, 326.12, 326.61, 327.16, 327.92, 329.14, 328.80, 327.52,325.62, 323.61, 323.80, 325.10, 326.25, 326.93, 327.83, 327.95,329.91, 330.22, 329.25, 328.11, 326.39, 324.97, 325.32, 326.54, 327.71, 328.73, 329.69, 330.47, 331.69, 332.65, 332.24, 331.03, 329.36, 327.60, 327.29, 328.28, 328.79, 329.45, 330.89, 331.63, 332.85, 333.28, 332.47, 331.34, 329.53, 327.57, 327.57, 328.53, 329.69, 330.45, 330.97, 331.64, 332.87, 333.61, 333.55, 331.90, 330.05, 328.58, 328.31, 329.41, 330.63, 331.63, 332.46, 333.36, 334.45, 334.82, 334.32, 333.05, 330.87, 329.24, 328.87, 330.18, 331.50, 332.81, 333.23, 334.55, 335.82, 336.44, 335.99, 334.65, 332.41, 331.32, 330.73, 332.05, 333.53, 334.66, 335.07, 336.33, 337.39, 337.65, 337.57, 336.25, 334.39, 332.44, 332.25, 333.59, 334.76, 335.89, 336.44, 337.63, 338.54, 339.06, 338.95, 337.41, 335.71, 333.68, 333.69, 335.05, 336.53, 337.81, 338.16, 339.88, 340.57, 341.19, 340.87, 339.25, 337.19, 335.49, 336.63, 337.74, 338.36), frequency=12, start=c(1965,1))
These are the results from forecast that you want to reproduce:
require("forecast")
fit <- Arima(x, order = c(1,1,0), seasonal = list(order =c(0,1,1), period =12))
forecast(fit, h=4)
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# Jan 1981 339.5735 339.1483 339.9987 338.9232 340.2238
# Feb 1981 340.1759 339.6493 340.7025 339.3706 340.9813
# Mar 1981 341.2141 340.5865 341.8417 340.2543 342.1739
# Apr 1981 342.3020 341.5914 343.0125 341.2153 343.3886
The state space representation of the fitted model is available in fit$model, which is the input required to run the Kalman filter.
kf <- KalmanForecast(n.ahead = 4, fit$model)
pred.vars <- kf$var * fit$sigma2
kf$pred - 1.96 * sqrt(pred.vars)
# [1] 338.9232 339.3706 340.2543 341.2153
kf$pred + 1.96 * sqrt(pred.vars)
# [1] 340.2238 340.9813 342.1740 343.3886
The prediction-error variances returned by the Kalman filter are expressed in units of the innovation variance, so they must be rescaled by the estimated residual variance $\hat\sigma^2$ (fit$sigma2). The prediction intervals are then obtained as you did; you can see that these values match the columns Lo 95 and Hi 95 obtained above.
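As a sanity check on the AR(1) derivation above, here is a small sketch using only base R (the simulated series and the AR coefficient 0.7 are arbitrary choices), showing that the unscaled variances returned by KalmanForecast match the closed-form geometric sums:

```r
# Simulate an AR(1) series and fit it with base R (stats::arima).
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 200)
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
phi <- unname(fit$coef["ar1"])

# Unscaled h-step prediction variances from the Kalman filter...
kf <- KalmanForecast(n.ahead = 3, fit$model)

# ...should equal the geometric sums 1 + phi^2 + ... + phi^(2(h-1)).
closed.form <- sapply(1:3, function(h) sum(phi^(2 * (0:(h - 1)))))
all.equal(as.numeric(kf$var), closed.form)
```

Multiplying kf$var by fit$sigma2 then gives the actual prediction-error variances, exactly as in the code above.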
Edit
This may depart from your question but, as mentioned by @IrishStat, if the assumption of Gaussian innovations is not plausible, it is worth considering some alternative methods. For example, the option bootstrap=TRUE computes bootstrapped prediction intervals. After a quick look at the documentation, I think this option resamples the residuals and generates an ensemble of replicates based on the parameters of the fitted model. Although the distribution of the residuals of your fitted model departs from the Gaussian (due to excess kurtosis, as suggested for example by the histogram), graphically there are no major differences in the prediction bands. Here are the numbers:
set.seed(1)
forecast(fit, h=4, bootstrap=TRUE)
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# Jan 1981 339.5735 339.2259 339.9744 338.9472 340.2538
# Feb 1981 340.1759 339.7288 340.6966 339.4446 341.0239
# Mar 1981 341.2141 340.6837 341.8561 340.3736 342.2230
# Apr 1981 342.3020 341.6879 343.0394 341.3340 343.4285
Side note: I think your question is more about theoretical issues than practical ones, and that you are using this model and data mainly for illustration. However, while preparing this answer I had a closer look at your data, and I would suggest an ARIMA(0,1,1)(0,1,1) model with a temporary change at observation 190 (October 1980). I don't claim it is the best model or that it will perform better for forecasting, but according to the residual diagnostics it is better than the ARIMA(1,1,0)(0,1,1).
Best Answer
You have what is called intermittent demand, that is, a demand time series characterized by "many" zeros. (If your time series is not demand per se, most of what follows will still apply.) So a web search for "forecasting intermittent demand" would already be helpful. Teunter and Duncan (2009, JORS) give an overview of intermittent demand forecasting methods.
The standard method for forecasting intermittent demand is Croston's method, which applies exponential smoothing separately to the inter-demand intervals and to the nonzero demand sizes. The point forecast is then the ratio of the smoothed nonzero demand size to the smoothed inter-demand interval. Syntetos and Boylan (2001, IJPE) note that Croston's method is slightly biased and propose a modification, but this usually doesn't make all that much of a difference in practice.
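As a rough illustration of how Croston's method works, here is a minimal sketch (not the implementation from any package; the function name and the smoothing parameter alpha = 0.1 are arbitrary choices for illustration):

```r
# Minimal sketch of Croston's method. Assumes y is a numeric vector of
# demands with at least two nonzero values; alpha is the SES parameter.
croston.forecast <- function(y, alpha = 0.1) {
  nz <- which(y > 0)              # periods with nonzero demand
  z  <- y[nz]                     # nonzero demand sizes
  p  <- c(nz[1], diff(nz))        # inter-demand intervals
  z.hat <- z[1]
  p.hat <- p[1]
  for (i in seq_along(z)[-1]) {   # SES on sizes and intervals separately
    z.hat <- alpha * z[i] + (1 - alpha) * z.hat
    p.hat <- alpha * p[i] + (1 - alpha) * p.hat
  }
  z.hat / p.hat                   # point forecast of demand per period
}
```

Note that the smoothed estimates are only updated in periods with nonzero demand, which is what distinguishes Croston's method from plain SES on the raw series.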
An alternative is integer autoregressive moving average models (INARMA), which modify the standard ARIMA time series models. Maryam Mohammadipour wrote a thesis on these.
I personally have major doubts about the usefulness of such an expectation point forecast. A time series of 1 demand every other time period has an expectation of 0.5... as does a time series of 2 demands every fourth time period... and so forth - although these are, of course, less and less Poisson-y. I'd argue that it's much more useful to understand the entire future (and predictive) distribution of demands. So I applaud your looking for prediction intervals!
However, the $\alpha(n-2)$ formula you found applies only to single exponential smoothing on continuous data, via the ARIMA(0,1,1) model for which SES is optimal. So it is inapplicable to count data. I'd much rather propose that you take your point prediction $\hat{y}$ and use quantiles of the Poisson distribution with parameter $\lambda=\hat{y}$. This still disregards parameter estimation uncertainty (along with model selection uncertainty, etc.), but it's a simple possibility and likely better than the formula you have.
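For example (a sketch; the point forecast value 0.5 is just a hypothetical number for illustration):

```r
# 95% prediction interval from Poisson quantiles, using a
# hypothetical point forecast y.hat as the Poisson mean.
y.hat <- 0.5
c(lower = qpois(0.025, lambda = y.hat),
  upper = qpois(0.975, lambda = y.hat))
# for lambda = 0.5 this gives the interval [0, 2]
```

Because the Poisson distribution is discrete, the interval endpoints are integers and the achieved coverage is at least, rather than exactly, the nominal level.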
Shenstone and Hyndman (2005, JoF) note that there is no consistent stochastic model for which Croston's method would be optimal - all candidate models are (1) continuous, not discrete, and (2) can yield negative values. However, for those candidate models, Shenstone and Hyndman provide prediction intervals.
Finally, a word of caution: don't use the MAD for assessing the accuracy of count data forecasts, especially not for intermittent demands. The expected MAD is minimized by the median of your future distribution, not its mean, and if you write that 65% of your data are zeros, then the median is zero... implying that you will probably get the lowest MAD by a flat zero forecast, which is badly biased and likely useless. Here is a presentation I gave at last year's International Symposium on Forecasting on this issue. Or look at Morlidge (2015, Foresight).
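A quick numerical illustration of that point (simulated data; the Poisson rate 0.4 is an arbitrary choice, picked to give roughly two-thirds zeros):

```r
# With mostly-zero count data, a flat zero forecast attains a lower MAD
# than the unbiased mean forecast, despite being useless for planning.
set.seed(1)
y <- rpois(1000, lambda = 0.4)          # about two-thirds zeros
mad.zero <- mean(abs(y - 0))            # MAD of a flat zero forecast
mad.mean <- mean(abs(y - mean(y)))      # MAD of the mean forecast
mad.zero < mad.mean                     # TRUE: zero "wins" on MAD
```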
Final piece of shameless self-promotion: I have an article in the IJF (Kolassa, 2016) which looks at forecasting low volume count data (mostly intermittent), at different accuracy measures and different forecasting methods, including various flavors of Poisson models. This may be useful to you.