Comparing Bootstrap vs. Monte Carlo for Error Estimation

Tags: bootstrap, error, monte-carlo

I'm reading the article *Error propagation by the Monte Carlo method in geochemical calculations* (Anderson, 1976), and there's something I don't quite understand.

Consider some measured data $\{A\pm\sigma_A, B\pm\sigma_B, C\pm\sigma_C\}$ and a program that processes it and returns a single output value. In the article, this program is first used to obtain the best value from the means of the data (i.e., $\{A, B, C\}$).

The author then uses a Monte Carlo method to assign an uncertainty to this best value: the input parameters are varied within their uncertainty limits (each drawn from a Gaussian distribution with means $\{A, B, C\}$ and standard deviations $\{\sigma_A, \sigma_B, \sigma_C\}$) before being fed to the program. This is illustrated in the figure below:

[Figure: the Monte Carlo scheme — inputs drawn from $N(A,\sigma_A)$, $N(B,\sigma_B)$, $N(C,\sigma_C)$ are fed repeatedly to the program, producing a distribution of the output $Z$. Copyright: ScienceDirect]

where the uncertainty can be obtained from the final $Z$ distribution.
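For concreteness, here is a minimal sketch of that Monte Carlo scheme in Python. The `program` function and all numerical values are made-up stand-ins for the real calculation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the real program: any function of the measured inputs.
def program(a, b, c):
    return a * b / c

# Measured best values and their assigned uncertainties (made-up numbers).
A, sigma_A = 10.0, 0.5
B, sigma_B = 3.0, 0.2
C, sigma_C = 2.0, 0.1

n_trials = 10_000

# Vary each input within its uncertainty: one Gaussian draw per trial.
a = rng.normal(A, sigma_A, n_trials)
b = rng.normal(B, sigma_B, n_trials)
c = rng.normal(C, sigma_C, n_trials)

Z = program(a, b, c)  # the final Z distribution
print(f"best value : {program(A, B, C):.3f}")
print(f"uncertainty: {Z.std(ddof=1):.3f}  (std of the Z distribution)")
```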

What would happen if, instead of this Monte Carlo method, I applied a bootstrap method? Something like this:

[Figure: the same scheme, but with the inputs resampled with replacement from the measured data before each run of the program.]

That is: instead of varying the data within their uncertainties before feeding them to the program, I sample with replacement from them.
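As a sketch of what I mean, assuming for illustration that the program reduces a vector of repeated measurements to a single value (the real setup may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real program: reduces a set of measurements to one value.
def program(data):
    return data.mean()

# The measured data points themselves (made-up values; note the assigned
# uncertainties sigma_A, sigma_B, ... are not used at all here).
data = np.array([10.1, 9.8, 10.3, 9.9, 10.0])

n_boot = 10_000
Z = np.empty(n_boot)
for i in range(n_boot):
    # Sample with replacement from the data, then re-run the program.
    resample = rng.choice(data, size=data.size, replace=True)
    Z[i] = program(resample)

print(f"best value : {program(data):.3f}")
print(f"uncertainty: {Z.std(ddof=1):.3f}  (std of the bootstrap Z distribution)")
```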

What are the differences between these two methods in this case? What caveats should I be aware of before applying either of them?


I'm aware of this question: Bootstrap, Monte Carlo, but it doesn't quite resolve my doubt, since in this case the data come with assigned uncertainties.

Best Answer

As far as I understand your question, the difference between the "Monte Carlo" approach and the bootstrap approach is essentially the difference between parametric and non-parametric statistics.

In the parametric framework, one knows exactly how the data $x_1,\ldots,x_N$ are generated: given the parameters of the model ($A$, $\sigma_A$, etc. in your description), you can produce new realisations of such datasets, and from them new realisations of your statistical procedure (or "output"). It is thus possible to describe entirely and exactly the probability distribution of the output $Z$, either by mathematical derivation or by a Monte Carlo experiment returning a sample of arbitrary size from this distribution.

In the non-parametric framework, one does not wish to make such assumptions about the data and instead uses the data, and only the data, to estimate their distribution $F$. The bootstrap is such an approach: the unknown distribution is estimated by the empirical distribution $\hat F$, which places a probability weight of $1/N$ on each point of the sample (in the simplest case, when the data are iid). Using this empirical distribution $\hat F$ as a replacement for the true distribution $F$, one can derive by Monte Carlo simulation the estimated distribution of the output $Z$.
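In code, "sampling from $\hat F$" amounts to drawing indices uniformly at random, which is exactly what sampling with replacement does. A short sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([2.3, 1.7, 3.1, 2.8, 2.5])  # observed iid sample (made up)
N = x.size

# A draw from the empirical distribution F-hat puts weight 1/N on each
# observed point, i.e. it is just a uniformly random index into the sample.
idx = rng.integers(0, N, size=N)
bootstrap_sample = x[idx]  # one bootstrap replicate

# Repeating this and recomputing the statistic gives its estimated
# distribution under F-hat; here, the standard error of the sample mean.
means = np.array([x[rng.integers(0, N, size=N)].mean() for _ in range(5000)])
print(f"estimated std. error of the mean: {means.std(ddof=1):.3f}")
```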

Thus, the main difference between the two approaches is whether or not one makes this parametric assumption about the distribution of the data.