Solved – Data input uncertainty + Monte Carlo simulation + forecasting

Consider a variable $Y$ (e.g., temperature). Suppose we were able to estimate this variable for each of the past $N$ years using some type of model. This means we have access to annual estimated values of $Y$ (denoted $Y_1, \ldots, Y_N$) and associated standard errors $S_1,\ldots,S_N$. The goal is to produce a point forecast for the value of $Y$ at time $N+1$ that incorporates the uncertainty present in the estimated annual values $Y_1,\ldots,Y_N$.

One way to proceed is to use Monte Carlo simulation to create an ensemble of $B$ = 100,000 (or some other sufficiently large number of) synthetic time series. Each synthetic series is obtained by shifting each original value $Y_t$ by a random z-score (i.e., Gaussian white noise) scaled by the corresponding standard error $S_t$. (This approach assumes that $Y_1,\ldots,Y_N$ are independent.)
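A minimal sketch of this perturbation step, assuming NumPy. The function name `simulate_synthetic_series`, the toy estimates, and the toy standard errors are made up for illustration, and the ensemble size is reduced to 10,000 to keep the example quick.

```python
import numpy as np

def simulate_synthetic_series(y, s, n_sims=10_000, seed=0):
    """Return an (n_sims, N) array of perturbed copies of y.

    Each entry is y[t] + z * s[t], with z drawn independently from N(0, 1).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    z = rng.standard_normal(size=(n_sims, y.size))  # independent z-scores
    return y + z * s  # broadcasting applies s[t] to column t

# Hypothetical annual estimates and standard errors (not real data)
y_hat = np.array([14.1, 14.3, 14.2, 14.6, 14.5, 14.8])
se = np.array([0.20, 0.15, 0.25, 0.10, 0.20, 0.15])

synthetic = simulate_synthetic_series(y_hat, se, n_sims=10_000)
```

Each row of `synthetic` is one synthetic time series. Across rows, the column means should sit close to the original estimates and the column standard deviations close to the supplied standard errors.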

Each synthetic series can then be used to produce (I) a point forecast of $Y$ at time $N+1$ and (II) an interval forecast of $Y$ at time $N+1$.

My question is:

How do we summarize the information conveyed by the simulated point forecasts and interval forecasts to quantify the uncertainty present in $Y_1,\ldots,Y_N$ and its impact on the forecasting output?

For point forecasts, it makes sense to report the simulation distribution of the point forecasts. But what aspect of this distribution captures the uncertainty (e.g., spread)?

For interval forecasts, it is not clear (at least not to me) how to proceed. Is there any way to quantify uncertainty in the forecasting input (i.e., $Y_1,\ldots,Y_N$) when it comes to width and/or coverage of these intervals? (Maybe by using some type of retrospective performance of the prediction procedure?)

Best Answer

You have two sources of uncertainty: the uncertainty in the historical data, and the uncertainty in the forecasts given the historical data. The simulation distribution of point forecasts captures the uncertainty in the historical data only.
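One way to write this down, using notation introduced here for illustration rather than taken from the question: let $\tilde{Y}$ be a synthetic series, let $\hat{\mu}(\tilde{Y})$ and $\hat{\sigma}^2(\tilde{Y})$ be the point forecast and forecast variance computed from it, and let $Y^*_{N+1}$ be a value drawn from the forecast distribution with that mean and variance. The law of total variance then splits the spread of $Y^*_{N+1}$ into the two sources:

$$
\operatorname{Var}\left(Y^*_{N+1}\right)
= \underbrace{\operatorname{E}\left[\hat{\sigma}^2(\tilde{Y})\right]}_{\text{forecast uncertainty}}
+ \underbrace{\operatorname{Var}\left(\hat{\mu}(\tilde{Y})\right)}_{\text{historical-data uncertainty}}
$$

The spread of the simulated point forecasts estimates only the second term.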

To capture the joint uncertainty, I suggest you simulate a future value from the forecast distribution for each of the synthetic time series. That is, for each synthetic time series compute the point forecast and the forecast variance, and then simulate a value from this distribution. These simulated future values then include the uncertainty in both the forecast distribution and in the historical data.
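A rough sketch of this procedure, assuming NumPy. The forecasting model is not specified in the question, so an ordinary least squares linear trend stands in for it here; the function name `linear_trend_forecast`, the toy data, and the use of a Gaussian forecast distribution are all assumptions made for illustration.

```python
import numpy as np

def linear_trend_forecast(series):
    """One-step-ahead OLS linear-trend forecast for each row of `series`.

    Returns the point forecasts and forecast variances at time N+1.
    The linear trend is only a stand-in for whatever model is actually used.
    """
    series = np.atleast_2d(np.asarray(series, dtype=float))
    n = series.shape[1]
    X = np.column_stack([np.ones(n), np.arange(1, n + 1)])  # intercept, time
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = series @ X @ XtX_inv                   # one coefficient pair per row
    resid = series - beta @ X.T
    sigma2 = (resid ** 2).sum(axis=1) / (n - 2)   # residual variance per row
    x_new = np.array([1.0, n + 1.0])              # regressors at time N+1
    point = beta @ x_new
    variance = sigma2 * (1.0 + x_new @ XtX_inv @ x_new)
    return point, variance

rng = np.random.default_rng(1)

# Hypothetical annual estimates and standard errors (not real data)
y_hat = np.array([14.1, 14.3, 14.2, 14.6, 14.5, 14.8])
se = np.array([0.20, 0.15, 0.25, 0.10, 0.20, 0.15])
B = 10_000  # ensemble size, kept small for the example

# Step 1: synthetic series reflecting uncertainty in the historical data
synthetic = y_hat + rng.standard_normal((B, y_hat.size)) * se

# Step 2: point forecast and forecast variance for each synthetic series
point, variance = linear_trend_forecast(synthetic)

# Step 3: one simulated future value per series (Gaussian assumed here)
future = point + rng.standard_normal(B) * np.sqrt(variance)

print("variance of point forecasts (historical data):", point.var())
print("mean forecast variance (forecast uncertainty):", variance.mean())
print("variance of simulated future values (joint):  ", future.var())
```

Up to Monte Carlo error, the last printed quantity should be close to the sum of the first two, which shows how much of the joint spread is attributable to each source.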

You could compute a prediction interval from the percentiles of these simulated future values and compare its width with the widths of the prediction intervals produced for the individual synthetic series.
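The comparison could be sketched as follows, assuming NumPy and Gaussian per-series intervals. The helper name `compare_interval_widths` is hypothetical, and the input arrays below are made-up stand-ins for the per-series point forecasts, forecast variances, and simulated future values that a real forecasting model would supply.

```python
import numpy as np
from statistics import NormalDist

def compare_interval_widths(variance, future, level=0.95):
    """Compare the pooled percentile interval with per-series intervals.

    variance: forecast variance for each synthetic series
    future:   one simulated future value per synthetic series
    """
    alpha = 1.0 - level
    lo, hi = np.quantile(future, [alpha / 2, 1 - alpha / 2])
    z = NormalDist().inv_cdf(1 - alpha / 2)        # Gaussian critical value
    per_series_width = 2 * z * np.sqrt(variance)   # width of each interval
    typical_width = float(np.median(per_series_width))
    return {
        "pooled_interval": (float(lo), float(hi)),
        "pooled_width": float(hi - lo),
        "median_per_series_width": typical_width,
        "width_ratio": float(hi - lo) / typical_width,
    }

# Made-up stand-in inputs: point forecasts that vary across synthetic series
# and a constant forecast variance, purely to exercise the function.
rng = np.random.default_rng(2)
B = 20_000
point = 15.0 + 0.2 * rng.standard_normal(B)
variance = np.full(B, 0.09)
future = point + rng.standard_normal(B) * np.sqrt(variance)

result = compare_interval_widths(variance, future)
print(result)
```

A `width_ratio` above 1 means the pooled interval is wider than a typical per-series interval; the excess is the part of the interval width attributable to uncertainty in the historical data.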
