Monte Carlo Estimation – How to Estimate Probabilities Using Monte Carlo Simulation

monte carlo, simulation

I would appreciate some advice on how to use Monte Carlo for estimating probabilities.

Generally speaking, the problem involves running an experiment whose output depends on random inputs, counting how often each output occurs, and using the relative frequency to estimate the probability of an output. However, the starting conditions are also randomly generated.

So I am repeatedly doing this:

  1. randomly generate starting conditions according to set rules,
  2. randomly generate input,
  3. record output.
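In Python-ish form, what I am doing looks something like the loop below (`generate_start` and `run_experiment` are just stand-ins for my actual rules, not real code from my project):

```python
import random

random.seed(0)  # for reproducibility of this sketch

def generate_start():
    # Placeholder: draw starting conditions according to set rules.
    return random.uniform(0, 1)

def run_experiment(start, x):
    # Placeholder: the experiment's output given starting
    # conditions and a random input; here a toy yes/no outcome.
    return x + start > 1.0

n_trials = 100_000
positives = 0
for _ in range(n_trials):
    start = generate_start()               # step 1: random starting conditions
    x = random.uniform(0, 1)               # step 2: random input
    positives += run_experiment(start, x)  # step 3: record output

p_hat = positives / n_trials  # relative-frequency estimate of the probability
```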

My questions are:

  1. Is this a reasonable method?
  2. How can I estimate errors for the probability estimate?

Any help appreciated; I am new to this type of thing!

Best Answer

I think it is a reasonable method. The difficulty is usually how to select the number of trials (which affects simulation time).

If you conduct $n$ trials, of which $N \leq n$ give a "positive" result, you can estimate the probability $p$ of positive result as its relative frequency,

$$\hat p = N/n.$$

This estimator is unbiased and has mean-square error, or variance,

$$\mathrm E[(\hat p-p)^2] = \mathrm E[(N/n-p)^2] = \frac{p(1-p)}{n},$$

as follows from noting that $N$ is a binomial random variable with parameters $n$, $p$. The root-mean-square (RMS) error, or standard deviation, is just the square root of this.
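As a quick numerical check of the formula, with illustrative values $p = 0.3$ and $n = 10\,000$ (both chosen arbitrarily here):

```python
import math

p = 0.3      # assumed true probability (illustrative only)
n = 10_000   # number of trials

variance = p * (1 - p) / n  # E[(p_hat - p)^2] for the relative-frequency estimator
rms = math.sqrt(variance)   # RMS error (standard deviation) of p_hat
```

With these numbers the RMS error comes out on the order of a few thousandths, which is small relative to $p = 0.3$.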

In order to assess whether an RMS error value is acceptable or not, that value should be compared with the true $p$. For example, an RMS error of $0.01$ may be acceptable when estimating $p=0.1$, but not when estimating $p=0.001$ (the error would be ten times the true value). The usual approach is to normalize the error by dividing it by $p$. So the normalized RMS error is

$$\frac{\sqrt{\mathrm E[(\hat p-p)^2]}}{p} = \sqrt\frac{1-p}{np}.$$

This can be approximated as $1/\sqrt{np}$ for $p$ small. As you can see, to maintain a certain normalized error you need to simulate a number of times $n$ inversely proportional to $p$. But $p$ is unknown, so it's difficult to know which $n$ you need.
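To make the scaling concrete, the small-$p$ approximation $1/\sqrt{np}$ gives the number of trials needed for a target normalized error roughly as follows (the 10% target and the two $p$ values are illustrative, and in practice $p$ is exactly what you don't know):

```python
import math

def trials_needed(p, target):
    # Small-p approximation: normalized RMS error ~ 1/sqrt(n*p),
    # so n ~ 1 / (p * target**2).
    return math.ceil(1.0 / (p * target ** 2))

# For a 10% normalized error, n is inversely proportional to p:
n1 = trials_needed(0.1, 0.10)    # roughly a thousand trials
n2 = trials_needed(0.001, 0.10)  # roughly a hundred thousand trials
```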

A solution is to use sequential estimation, in which $n$ is not fixed in advance, but is adaptively selected according to a certain stopping rule to guarantee that the estimation error does not exceed a predefined level. A standard sequential method is inverse binomial sampling (also called negative-binomial Monte Carlo), which consists of the following: continue simulating until a target number $N$ of positive results is achieved. So now $N$ is fixed, and the number of trials, $n$, becomes the random variable.

The nice aspect of this approach is that by selecting the target $N$ you can control the normalized error level that will be achieved, irrespective of the unknown $p$. That is, to each $N$ corresponds a certain guaranteed value of normalized error. This works whether you consider error defined as mean-square-error, mean-absolute-error, or in terms of a confidence interval. A description of the procedure in each case is given in the following papers (please bear with my self-referencing):
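A minimal sketch of inverse binomial sampling, assuming a simple Bernoulli experiment with an illustrative true $p = 0.05$. Note that in this sampling scheme $(N-1)/(n-1)$, rather than $N/n$, is the unbiased estimator of $p$:

```python
import random

def inverse_binomial_estimate(simulate, N):
    # Keep simulating until N positive results have occurred;
    # the trial count n is now the random quantity.
    n = 0
    positives = 0
    while positives < N:
        n += 1
        positives += simulate()
    # (N - 1) / (n - 1) is unbiased under inverse binomial
    # sampling; N / n would be slightly biased here.
    return (N - 1) / (n - 1)

random.seed(0)
# Toy experiment with (illustrative) true p = 0.05; the target
# N controls the normalized error regardless of p.
p_hat = inverse_binomial_estimate(lambda: random.random() < 0.05, N=200)
```

With $N = 200$ the normalized RMS error is roughly $1/\sqrt{N} \approx 7\%$, whatever the unknown $p$ turns out to be.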
