Time-Series – How to Perform Bootstrap Sampling with Size Greater than Original Sample for Volatility Forecasting in Monte Carlo and GARCH

bootstrap, garch, monte-carlo, time-series, volatility-forecasting

I want to predict future returns over a 20-day horizon using an ARMA-GARCH model fitted to my data.
The goal is to estimate different risk measures such as VaR or CVaR.
In particular, say I use an AR(1)-GARCH(1,1) model. The sample I use for estimation has 500 observations of daily log-returns. This is what I usually do:

  1. Estimate AR, ARCH and GARCH coefficients
  2. Calculate the standardized residuals by dividing the residuals by the estimated conditional volatilities (the square roots of the conditional variances)
  3. The standardized residuals constitute my INVARIANTS, that is, the i.i.d. series from which I extract bootstrap samples to generate scenarios.
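
In code it looks roughly like this (a minimal sketch assuming the Python `arch` package, though any AR(1)-GARCH(1,1) estimator would do; `logreturns` is just a placeholder for my 500 daily log-returns):

```python
import numpy as np
from arch import arch_model   # assumed library; any AR(1)-GARCH(1,1) fitter would do

rng = np.random.default_rng(0)
logreturns = rng.standard_normal(500)          # placeholder for the 500 daily log-returns

# Steps 1-2: fit AR(1)-GARCH(1,1), then standardize the residuals by the
# estimated conditional volatilities.
fit = arch_model(logreturns, mean="AR", lags=1, vol="GARCH", p=1, q=1).fit(disp="off")
std_resid = np.asarray(fit.std_resid)
std_resid = std_resid[~np.isnan(std_resid)]    # the AR(1) mean drops the first observation
```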

The bootstrap samples are extracted by drawing a uniform random integer between 1 and the sample size (500 in this case) and then taking the value at that position in the vector of standardized residuals.
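
Mechanically this is just resampling with replacement, e.g. (sizes and names below are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
std_resid = rng.standard_normal(500)   # stand-in for the 500 standardized residuals above

n = len(std_resid)                     # N = 500
horizon = 20                           # 20-day forecast horizon
n_scenarios = 10_000                   # number of simulated scenarios

# Draw positions uniformly on {0, ..., n-1} with replacement (equivalent to a
# uniform integer on 1..N) and look them up in the residual vector.
idx = rng.integers(0, n, size=(n_scenarios, horizon))
boot_shocks = std_resid[idx]           # one row of 20 resampled shocks per scenario
```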

The problem is that I have only 500 standardized residuals, and I think 500 is the maximum size of the bootstrap samples I can extract.

My colleague instead extracts 100000 observations out of the original sample of $N=500$ observations.

I feel that this is somehow conceptually incorrect. Simulating only one step forward would produce exactly the same scenario as the initial one, but with repeated values that add no information.

My colleague claims that if he wants to project over a longer period, e.g. a 20-day horizon, the 100000 extractions from the original sample of $N=500$ observations would produce many different scenarios at the final horizon, providing a smooth CDF.
Actually this is true: although the values are simply repeated in the first step, after that they can combine in many different ways (over 20 days there are $500^{20}$ possible index sequences, so two paths practically never coincide at the horizon).

That being said, I don't feel this is right. I proposed an alternative, which is:

  1. From the standardized residuals, build a smoothed empirical CDF, e.g. with a kernel estimator.
  2. Draw uniforms between 0 and 1 and feed them to the inverse of the smoothed empirical CDF, i.e. inverse-transform sampling.

This way I feel more comfortable saying that I can generate a bootstrap sample of size greater than the original one, but I am still not sure.
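
A minimal sketch of what I have in mind, assuming a Gaussian kernel; I am using the fact that drawing from a Gaussian kernel density estimate is equivalent to resampling with replacement and then adding mean-zero Gaussian noise with standard deviation equal to the bandwidth, so the inverse-transform step does not even have to be carried out explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
std_resid = rng.standard_normal(500)   # stand-in for the 500 standardized residuals

# Silverman's rule-of-thumb bandwidth for a Gaussian kernel.
n = len(std_resid)
h = 1.06 * std_resid.std(ddof=1) * n ** (-1 / 5)

def smoothed_bootstrap(size, resid=std_resid, bandwidth=h, rng=rng):
    """Sample from the kernel-smoothed empirical distribution of the residuals."""
    draws = rng.choice(resid, size=size, replace=True)
    return draws + bandwidth * rng.standard_normal(size)

sample = smoothed_bootstrap(100_000)   # a "bootstrap" sample larger than the original 500
```

One thing I noticed is that the smoothed draws have variance roughly $s^2 + h^2$ instead of $s^2$, so they would probably need rescaling before being used as unit-variance GARCH shocks.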

I am studying bootstrap theory from a book by Efron and Tibshirani,

Efron, Tibshirani – An Introduction to the Bootstrap – Springer US (1993)

but there are many concepts that I don't understand yet.

My questions are:

  1. Would you give your opinion on the problem I described above?
  2. Would you suggest any other valid material for studying the bootstrap besides the book I mentioned?
  3. I think this application of the bootstrap is somewhat different from the one explained in Efron's book, namely evaluating confidence intervals for estimated parameters. What do you think about it?

Any comment would be much appreciated.

I apologize for the length of the post, but I tried to be as concise as I could. Thank you.

Best Answer

The objective of bootstrapping is (usually) to get some idea of the distribution of the parameter estimate(s). Since the parameter estimates were formed on the basis of a sample of size $N$, their distribution is conditional upon that sample size. Resampling to larger or smaller sample sizes will, consequently, give a more distorted view of the distribution of the parameter estimates than resampling with a sample size of $N$.
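
A quick numerical illustration of that point, with the sample mean standing in for the parameter estimate (the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(500)            # pretend this is the original sample of size N = 500

def boot_sd_of_mean(data, m, B=2000):
    """Spread of the bootstrapped sample mean when each resample has size m."""
    means = [rng.choice(data, size=m, replace=True).mean() for _ in range(B)]
    return np.std(means)

print(boot_sd_of_mean(x, 500))      # ~ 1/sqrt(500) ~ 0.045, close to the estimator's true sd
print(boot_sd_of_mean(x, 100_000))  # ~ 1/sqrt(100000) ~ 0.003, wildly overstates the precision
```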

In this case, however, you are not actually performing the Efron bootstrap. You are simply generating simulated values of the sample path based upon the 500 estimated errors. Consequently, the issue with whether or not you can generate more than 500 such sample paths is moot; you can, as Johan points out, generate as many as you want.

Since you are basing all your results on the one set of initial parameter estimates, the sample paths are conditional upon that set being correct. The variability in the end result does not take into account parameter uncertainty, and it is this additional variability that the Efron bootstrap is designed to help with. A process that incorporates the bootstrap might be:

  1. Select a sample (with replacement) of 500 values from the initial set of standardized residuals (this 500 is the "500" that gave you so much trouble in your thinking about the problem and that Efron refers to in the book),
  2. Calculate a simulated version of the original series using those standardized residuals and your initial parameter estimates,
  3. Re-estimate the parameters using the simulated version of the original series,
  4. Use the standardized residuals from the re-estimated parameters and the original data to generate some (smallish) number $M$ of future sample paths,
  5. If you've generated enough overall sample paths, exit, else go to 1.

Steps 1 through 3 are where the Efron bootstrap comes into play. Step 4 is the simulation as it is currently performed. Note that at each iteration you are generating new standardized residuals for use in the simulator; this will lessen the dependence of the results on the initial set of parameter estimates / standardized residuals and take into account, to some extent, the inaccuracy in the parameter estimates themselves.

If you generate $K$ bootstrap estimates in steps 1 and 2, you will have generated $KM$ total sample paths at the end of the exercise. How you should divide those between $K$ and $M$ depends to some extent on the various computational burdens involved but also upon how the contributions to randomness are split between parameter estimation error and sample path variability. As a general rule, the more accurate your parameter estimates are, the smaller $K$ can be; conversely, the less the sample paths vary for a given value of the parameter estimates, the smaller $M$ can be.
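
For concreteness, here is a minimal sketch of steps 1 through 5 in Python, assuming the `arch` package for the AR(1)-GARCH(1,1) fits; the parameter ordering, the helper functions, the placeholder returns and the particular values of $K$, $M$ and the horizon are all illustrative rather than a definitive implementation:

```python
import numpy as np
from arch import arch_model   # assumed library; any AR(1)-GARCH(1,1) fitter would do

rng = np.random.default_rng(123)
SPEC = dict(mean="AR", lags=1, vol="GARCH", p=1, q=1)

def garch_filter(params, r):
    """Run the AR(1)-GARCH(1,1) filter through the observed returns r and return
    the terminal state (last return, last residual, last conditional variance).
    params is assumed to be ordered (const, ar1, omega, alpha, beta), which is
    how the arch package orders them for this specification."""
    const, ar1, omega, alpha, beta = params
    sigma2 = omega / (1.0 - alpha - beta)      # initialize at the unconditional variance
    eps = 0.0
    for t in range(1, len(r)):
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
        eps = r[t] - const - ar1 * r[t - 1]
    return r[-1], eps, sigma2

def simulate_path(params, shocks, r_prev, eps_prev, sigma2):
    """Iterate the AR(1)-GARCH(1,1) recursion forward, driven by the given shocks."""
    const, ar1, omega, alpha, beta = params
    out = np.empty(len(shocks))
    for t, z in enumerate(shocks):
        sigma2 = omega + alpha * eps_prev ** 2 + beta * sigma2
        eps_prev = np.sqrt(sigma2) * z
        out[t] = const + ar1 * r_prev + eps_prev
        r_prev = out[t]
    return out

def bootstrap_scenarios(returns, K=200, M=50, horizon=20):
    """Steps 1-5: K Efron-bootstrap refits, M future sample paths per refit."""
    fit0 = arch_model(returns, **SPEC).fit(disp="off")
    z0 = np.asarray(fit0.std_resid)
    z0 = z0[~np.isnan(z0)]                     # the AR(1) mean drops the first observation
    p0 = np.asarray(fit0.params)               # (const, ar1, omega, alpha, beta)

    paths = []
    for _ in range(K):
        # Step 1: resample N standardized residuals with replacement.
        z_star = rng.choice(z0, size=len(z0), replace=True)
        # Step 2: rebuild a pseudo history from z_star and the initial estimates.
        pseudo = simulate_path(p0, z_star, returns[0], 0.0,
                               p0[2] / (1.0 - p0[3] - p0[4]))
        # Step 3: re-estimate the parameters on the pseudo history.
        fit_b = arch_model(pseudo, **SPEC).fit(disp="off")
        z_b = np.asarray(fit_b.std_resid)
        z_b = z_b[~np.isnan(z_b)]
        p_b = np.asarray(fit_b.params)
        # Step 4: M future paths, started from the terminal state obtained by
        # filtering the ORIGINAL data with the re-estimated parameters.
        state = garch_filter(p_b, returns)
        for _ in range(M):
            shocks = rng.choice(z_b, size=horizon, replace=True)
            paths.append(simulate_path(p_b, shocks, *state))
    return np.asarray(paths)                   # K*M simulated horizon-day return paths

# Illustrative usage on placeholder data (real daily log-returns would go here):
returns = rng.standard_normal(500)
paths = bootstrap_scenarios(returns, K=20, M=50)
cum = paths.sum(axis=1)                        # 20-day cumulative log-returns
var_99 = -np.quantile(cum, 0.01)               # a simple 99% VaR estimate at the horizon
```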
