Regression – When to Apply Bootstrap to Time Series Models

bootstrap, cointegration, regression, stationarity, time series

Under what circumstances can you apply re-sampling techniques to quantify the uncertainty about the parameters of a time series model?

Say that I have a model such as below:

$Y_t = X_t \beta + e_t$

(where $X_t$ may include lags of $Y_t$)

I'd like to use repeated re-sampling ('the bootstrap') to generate distributions for the parameters of the model. I understand that it's generally something to be wary of in the time series case.

My question is: in what circumstances would this be valid?

I was thinking that it very likely makes sense in the case of stationary input variables. But what if I'm satisfied that there's a cointegrating relationship?

Is it valid in that case?

Best Answer

Before getting to my answer, I think I should point out that there is a mismatch between your question title and the body of the question. Bootstrapping time series is in general a very wide topic that must grapple with the nuances of the particular model under consideration. When applied to the specific case of cointegrated time series, there are methods that take exactly this kind of care with the specific relationships between the collection of time series.

First, a quick review of relevant concepts so that we have a common starting point.

Stochastic Processes

The time series under consideration will be discrete-time stochastic processes. Recall that a stochastic process is a collection of random variables, with the discrete-time qualifier describing the cardinality of the index set. So we can write a time series as $\{X_{t}\}_{t\in \mathbb{N}}$, where each $X_{t}$ is a random variable and the index set is $\mathbb{N} = \{0, 1, 2, \dots\}$. A sample from such a time series consists of a sequence of observations $x_{0}, x_{1}, x_{2}, \dots$ such that $x_{i}$ is a realization of the random variable $X_{i}$. This is a minimal, extremely general definition, so usually more structure is assumed to hold in order to bring heavier machinery to bear.

The structure of interest is the joint distribution of the infinite series of random variables, and unless we are dealing with white noise, determining this joint distribution is where the work happens. In practice we will, of course, only have access to a finite-length sample $x_{0}, x_{1}, \dots, x_{n}$, and models typically impose constraints implying that any underlying joint structure (hopefully) can be captured by such a finite sample. As you are likely aware, there are numerous models embodying the various functional forms these structural assumptions take; familiar ones like ARIMA, GARCH, and VAR, and maybe less familiar ones, all proceed by some kind of transformation or model fit to capture the regular structure (assuming the selected model is correctly specified), and whatever residual stochasticity is left between the fitted values and the observations can be modeled in a simple form (typically Gaussian).
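To make the serial dependence concrete, here is a minimal R illustration (base R only; `arima.sim` and `acf` are from the stats package) simulating an AR(1) process. The coefficient and sample size are arbitrary illustrative choices; the autocorrelation plot shows exactly the joint structure that any resampling scheme for time series has to respect.

```r
# Simulate an AR(1) process x_t = 0.7 * x_{t-1} + e_t, with e_t ~ N(0, 1),
# as a simple example of a discrete-time stochastic process with
# serial dependence.
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 200)

# The sample autocorrelation function displays the dependence structure
# that naive iid resampling would destroy.
acf(x)
```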

Bootstrapping

The general idea of the bootstrap is to replace the theoretical distribution with the empirical distribution, and to treat the observed data as if they constituted the theoretical population. Should certain conditions be met, which intuitively correspond to the data being 'representative' of the population, then resampling from the data can approximate sampling from the population.

In a basic formulation of the bootstrap, the data are assumed to be generated by an iid process - each observation is an independent draw from the same distribution. Given a data set $x_{1}, \dots, x_{n}$, we randomly resample with replacement a data set $x^*_{1}, \dots, x^*_{n}$, where each $x^*_{i}$ is an independent draw from the uniform distribution over $x_{1}, \dots, x_{n}$. In other words, each $x^*_{i}$ is an independent realization of the random variable $X^*$ which has a discrete uniform distribution over the observations, with a probability mass of $\frac{1}{n}$ on each data point $x_{i}$. Note how this mirrors the assumed sampling mechanism from the population, where each $x_{i}$ is an independent realization of the random variable $X$ which has the theoretical population distribution of interest. Hopefully laying everything out explicitly makes clear when the bootstrap makes sense: if your original sampling procedure consisted of iid draws from some fixed but unknown distribution, and each sample point is taken to reveal an equal amount of information about this distribution, then uniformly resampling from the data can reasonably replace sampling from the population. With these resamples you can do all the usual things, like estimating the distributions of model parameters and summary statistics, then using those distributions to perform inference.
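As a minimal sketch of that formulation in R: the synthetic normal sample below is just a stand-in for your observed iid data, and the mean is an arbitrary choice of statistic.

```r
# Basic iid bootstrap of the sampling distribution of the mean.
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)  # stand-in for an observed iid sample
B <- 2000                          # number of bootstrap resamples

boot_means <- replicate(B, {
  # Uniform resample with replacement, mirroring the assumed iid sampling.
  x_star <- sample(x, size = length(x), replace = TRUE)
  mean(x_star)
})

# Bootstrap standard error and a simple percentile interval.
sd(boot_means)
quantile(boot_means, c(0.025, 0.975))
```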

Bootstrapping Time Series

Based on the above discussion, it should be clear that applying a basic bootstrap to time series data is in general a bad idea. The basic bootstrap above crucially depends on the initial sample consisting of iid draws from a fixed population distribution - which in general will not hold for various time series models. This issue is further exacerbated by model misspecification, which in practice should always be a consideration - hedge your bets.

Again, depending on the particular model assumed to hold, there are specific modifications to the basic bootstrapping procedure that are model-aware and maybe even robust to misspecification. Which method you utilize will depend on first determining the model and considering the consequences of misspecification. I'll describe a couple of general methods for time series, and point to some sources for specific approaches to the cointegrated case.

One widely applied bootstrapping technique for time series is the block bootstrap. The underlying idea is that since the sequential nature of the sample $x_{0}, x_{1}, \dots, x_{n}$ encodes information of interest, we want our resampling procedure to capture this very sequential information. This idea is in the spirit of the basic bootstrap, as the resampling procedure tries to reflect the original sampling procedure. To perform a block bootstrap, you set some block size $\ell$ and split your data into contiguous blocks $x_{i}, x_{i+1}, \dots, x_{i+\ell-1}$. You then resample the blocks with replacement, using a uniform distribution over all blocks, in order to generate a bootstrapped sample. Here too there are various nuances, depending on whether you allow your initial blocks to overlap or not, how you concatenate them, etc.

One major point to observe about this class of methods is that while the blocks are contiguous, resampling effectively shuffles the order of the blocks. This implies that block bootstrapping retains local sequential dependence (within each block), but global sequential dependence is lost due to this shuffling. This is why block bootstrap methods may be a good choice when working with ARIMA, STL, or local regression models; as long as your block size $\ell$ has been chosen to capture the most important 'length' of the model (assuming it is correctly specified), then the shuffling of the blocks incurred by resampling shouldn't cause too much trouble. But you'll need to weigh the appropriateness based on your model, goal, and data, and you may still need to experiment to determine the appropriate block size - assuming you have a long enough sample to accommodate the appropriate block size a large enough number of times in the first place. See [1] for some specific applications. If you are using R, the tsboot function in the boot package implements several variants of the block bootstrap; a small sketch follows.
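Here is a hedged sketch of the fixed-block variant via boot::tsboot. The AR(1) series, the lag-1 autocorrelation statistic, and the block length of 20 are all illustrative choices, not recommendations; the block length in particular is exactly the tuning decision discussed above.

```r
library(boot)

# Block bootstrap of the lag-1 autocorrelation of a simulated AR(1) series.
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 500)

# Statistic to bootstrap: the lag-1 sample autocorrelation.
lag1_acf <- function(ts) acf(ts, lag.max = 1, plot = FALSE)$acf[2]

# sim = "fixed" resamples contiguous blocks of fixed length l with
# replacement; sim = "geom" would give the stationary bootstrap instead.
b <- tsboot(x, statistic = lag1_acf, R = 1000, l = 20, sim = "fixed")

b$t0     # statistic on the original series
sd(b$t)  # bootstrap standard error of the lag-1 autocorrelation
```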

Another type of bootstrapping applied to time series is the sieve bootstrap. The name comes from sieve estimators. Here again we try to have our resampling procedure emulate the original sampling method, but rather than resampling the data directly, we generate a new data set by fitting an AR model, drawing innovations from the empirical distribution of the fitted residuals, and recursively simulating a synthetic series. The underlying model is assumed to be an AR model of infinite order, while each resampling AR model is of finite order - though the order is allowed to grow at a rate determined by the sample size. This asymptotic increase in the order is the 'sieve' part of the name, as you get closer to the target model with increasing sample size. See [2] and [3] for an overview of the sieve bootstrap. The AR model is how we capture the sequential dependence structure in this case. Because new synthetic data are simulated in a recursive manner, sieve bootstrap methods try to retain the global sequential dependence in the data - contrast this with the local properties of block bootstraps. This method may also be the one you want to apply for cointegrated time series, as there appear to be issues with resampling the data directly in that case [4]. See [5] for a specific application of sieve bootstrapping to cointegrated models. If you're using R, the tseriesEntropy package has a surrogate.AR function which implements a sieve bootstrap; a hand-rolled sketch follows.
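Below is a minimal hand-rolled sketch of the idea, not the full procedure of [2] or [5]: fit an AR model with the order chosen by AIC (so the order grows implicitly with the sample size), center and resample its residuals, and recursively simulate synthetic series. The burn-in length and the bootstrapped statistic (the sample mean) are arbitrary illustrative choices.

```r
# Sieve bootstrap sketch: AR fit + residual resampling + recursive simulation.
set.seed(1)
x <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 500)

fit <- ar(x, aic = TRUE)   # AR order selected by AIC
res <- na.omit(fit$resid)  # drop the initial NAs from the AR fit
res <- res - mean(res)     # center residuals before resampling
p   <- fit$order

sieve_sample <- function() {
  n      <- length(x)
  burn   <- 100  # burn-in to wash out the arbitrary initial values
  e_star <- sample(res, n + burn, replace = TRUE)
  x_star <- numeric(n + burn)
  # Recursively feed resampled residuals through the fitted AR model.
  for (t in (p + 1):(n + burn)) {
    x_star[t] <- fit$x.mean +
      sum(fit$ar * (x_star[(t - 1):(t - p)] - fit$x.mean)) + e_star[t]
  }
  x_star[(burn + 1):(burn + n)]
}

# Bootstrap distribution of the series mean, for illustration.
boot_means <- replicate(1000, mean(sieve_sample()))
sd(boot_means)
```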

There are other bootstrapping methods that can be applied to time series, as well as variations of the general methods mentioned - others to check out are the stationary bootstrap and the wild bootstrap. For a general overview of bootstrapping time series, see [6]. As mlofton mentioned, and I have hopefully illustrated, bootstrapping time series is a complex problem with various solutions designed for particular circumstances. Another informative reference they mention, by Davidson and MacKinnon, can be found at [7].

Sorry that I have avoided explicit mathematical formulations of the techniques, but your question seemed to seek a somewhat intuitive explanation of the considerations that determine appropriate methods for bootstrapping time series, and, as I mentioned, the appropriateness of any particular technique depends on the specifics of your model, goals, and data. Hopefully the references will point you in the right direction.

References

  1. Petropoulos, F., Hyndman, R.J. and Bergmeir, C., 2018. Exploring the sources of uncertainty: Why does bagging for time series forecasting work?. European Journal of Operational Research, 268(2), pp.545-554.

  2. Bühlmann, P., 1997. Sieve bootstrap for time series. Bernoulli, 3(2), pp.123-148.

  3. Andrés, M.A., Peña, D. and Romo, J., 2002. Forecasting time series with sieve bootstrap. Journal of Statistical Planning and Inference, 100(1), pp.1-11.

  4. Li, H. and Maddala, G.S., 1997. Bootstrapping cointegrating regressions. Journal of Econometrics, 80(2), pp.297-318.

  5. Chang, Y., Park, J.Y. and Song, K., 2006. Bootstrapping cointegrating regressions. Journal of Econometrics, 133(2), pp.703-739.

  6. Bühlmann, P., 2002. Bootstraps for time series. Statistical Science, 17(1), pp.52-72.

  7. Davidson, R. and MacKinnon, J.G., 2006. Bootstrap methods in econometrics. In: Palgrave Handbook of Econometrics, Volume 1: Econometric Theory.
