Solved – How to make forecasts for a time series

forecasting, time series

I am not that familiar with the analysis of time series data. However, I have what I think is a simple prediction task to address.

I have about five years of data from a common generating process. Each year's data follow a monotonically increasing function with a non-linear component. I have counts for each week over a 40-week cycle for each year. Within each year the function starts at zero, increases rather quickly over the first half of the cycle, slows over the second half, and levels off during the last five weeks. The process is consistent across years, with small year-to-year differences in rate of change and volume across the segments.

$$ y_{1}=\{0, N_{t1}, N_{t2}, \dots, N_{t39}, N_{t40}\} $$

$$ \vdots $$

$$ y_{5}=\{0, N_{t1}, N_{t2}, \dots, N_{t39}, N_{t40}\} $$

where $N_{tx}$ is the count at week $x$.

The goal is to take the count $N_{tx}$ at week $x$ (or better, the counts from $t0$ to $tx$, or the slope to that point) and predict $N_{t40}$. For example, if $N_{t10}$ is 5000, what is the expected value of $N_{t40}$ for that year? So the question is: how would you model such data? It's easy enough to summarize and visualize, but I'd like a model that facilitates predictions and incorporates a measure of error.

Best Answer

Probably the simplest approach is, as Andy W suggested, to use a seasonal univariate time series model. If you use R, try either auto.arima() or ets() from the forecast package.

Either should work ok, but a general time series method does not use all the information provided. In particular, it seems that you know the shape of the curve in each year, so it might be better to use that information by modelling each year's data accordingly. What follows is a suggestion that tries to incorporate this information.

It sounds like some kind of sigmoidal curve will do the trick, e.g., a shifted logistic:

$$ f_{t,j} = \frac{r_t e^{a_t(j-b_t)}}{1+e^{a_t(j-b_t)}} $$

for year $t$ and week $j$, where $a_t$, $b_t$ and $r_t$ are parameters to be estimated. $r_t$ is the asymptotic maximum, $a_t$ controls the rate of increase, and $b_t$ is the midpoint, i.e. the week at which $f_{t,j}=r_t/2$. (Another parameter will be needed to allow the asymmetry you describe, whereby the rate of increase up to time $b_t$ is faster than that after $b_t$. The simplest way to do this is to allow $a_t$ to take different values before and after time $b_t$.)
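As a minimal sketch of the curve above (Python, standard library only): `logistic` evaluates the shifted logistic, and `asymmetric_logistic` is one possible reading of the suggestion to let $a_t$ differ on either side of $b_t$. The function names and the parameter values in the example are illustrative assumptions, not estimates from your data.

```python
import math

def logistic(j, r, a, b):
    """Shifted logistic r * e^{a(j-b)} / (1 + e^{a(j-b)}) at week j."""
    z = a * (j - b)
    # Two algebraically equivalent forms, chosen so exp() never overflows.
    if z >= 0:
        return r / (1.0 + math.exp(-z))
    e = math.exp(z)
    return r * e / (1.0 + e)

def asymmetric_logistic(j, r, a_before, a_after, b):
    """Same curve, but with a different rate on each side of the midpoint b.

    Both branches equal r / 2 at j == b, so the curve stays continuous.
    """
    a = a_before if j < b else a_after
    return logistic(j, r, a, b)

# Hypothetical parameters: plateau 20000, midpoint at week 12,
# faster growth before the midpoint than after it.
r, a_before, a_after, b = 20000.0, 0.35, 0.20, 12.0
for week in (0, 10, 12, 20, 40):
    print(week, round(asymmetric_logistic(week, r, a_before, a_after, b)))
```

Note that a logistic is only close to zero at week 0, not exactly zero, so the starting value of the observed series is matched approximately.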

The parameters can be estimated using least squares for each year. Each parameter then forms a time series across years: $\{a_1,\dots,a_n\}$, $\{b_1,\dots,b_n\}$ and $\{r_1,\dots,r_n\}$. These can be forecast using standard time series methods, although with $n=5$ you probably can't do much apart from using the mean of each series for producing forecasts. Then, for year 6, an estimate of the value at week $j$ is simply $\hat{f}_{6,j}$, computed using the forecasts of $a_6$, $b_6$ and $r_6$.
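A rough sketch of that procedure in Python, under loud assumptions: the five "years" are synthetic data generated from made-up parameters, and the fit is a plain grid search over $a$ and $b$ with the least-squares $r$ computed in closed form (the model is linear in $r$ once $a$ and $b$ are fixed). A general nonlinear least-squares routine would normally be used instead; the grid search just keeps the example dependency-free.

```python
import math

def sigmoid(z):
    """Numerically stable e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(weeks, counts, a_grid, b_grid):
    """Least-squares fit of r * sigmoid(a * (j - b)); returns (r, a, b).

    For each (a, b) on the grid the best r is sum(y*g) / sum(g*g);
    the grid point with the smallest sum of squared errors wins.
    """
    best = None
    for a in a_grid:
        for b in b_grid:
            g = [sigmoid(a * (j - b)) for j in weeks]
            r = sum(y * gi for y, gi in zip(counts, g)) / sum(gi * gi for gi in g)
            sse = sum((y - r * gi) ** 2 for y, gi in zip(counts, g))
            if best is None or sse < best[0]:
                best = (sse, r, a, b)
    return best[1:]

# Hypothetical "true" (r, a, b) for five past years; invented for illustration.
TRUE_PARAMS = [
    (20000, 0.25, 14.0),
    (21500, 0.30, 13.5),
    (19000, 0.25, 15.0),
    (22000, 0.20, 14.5),
    (20500, 0.30, 13.0),
]
weeks = list(range(41))  # weeks 0..40
years = [[round(r * sigmoid(a * (j - b))) for j in weeks]
         for r, a, b in TRUE_PARAMS]

a_grid = [0.05 * k for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00
b_grid = [0.5 * k for k in range(10, 61)]  # 5.0, 5.5, ..., 30.0

fits = [fit_logistic(weeks, y, a_grid, b_grid) for y in years]

# With only five years, forecast each parameter by its mean across years.
r6, a6, b6 = (sum(p) / len(p) for p in zip(*fits))
forecast_week40 = r6 * sigmoid(a6 * (40 - b6))
print(fits)
print(round(forecast_week40))
```

The spread of the fitted parameters across years also gives a crude handle on forecast uncertainty, although five points are too few to say much.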

Once data start to be observed for year 6, you will want to update this estimate. As each new observation arrives, fit the sigmoidal curve to the year-6 data alone (you will need at least three observations to start with, as there are three parameters). Then take a weighted average of the forecast based on the data up to year 5 and the forecast based only on the year-6 data, with weights $(40-j)/36$ and $(j-4)/36$ respectively, where $j$ is the latest week observed in year 6. The weights sum to one and shift linearly from the historical forecast at week 4 to the year-6 fit at week 40. That is very ad hoc, and I'm sure it can be made more objective by placing it in the context of a larger stochastic model. Nevertheless, it will probably work ok for your purposes.
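The weighting itself is simple enough to write down directly. A small sketch, where `blend_forecast` and its argument names are hypothetical and the two input forecasts are assumed to have been computed already (one from the earlier years, one from the current year's fit):

```python
def blend_forecast(historical, current_year, week, first_week=4, last_week=40):
    """Weighted average of two forecasts of the week-40 count.

    Weights are (40 - week) / 36 on the historical forecast and
    (week - 4) / 36 on the current-year forecast, as described above.
    """
    if not first_week <= week <= last_week:
        raise ValueError("week must lie between first_week and last_week")
    span = last_week - first_week  # 36 with the default settings
    w_hist = (last_week - week) / span
    w_curr = (week - first_week) / span
    return w_hist * historical + w_curr * current_year

# Illustrative numbers only: past years suggest 20000, this year's fit suggests 26000.
print(blend_forecast(20000, 26000, week=22))  # → 23000.0 (equal weights at week 22)
```

At week 4 the function returns the historical forecast unchanged, and at week 40 it returns the current-year forecast.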
