Monte Carlo – How to Calculate Certainty in Monte Carlo Simulation


(Hi, sorry, this is probably a very entry level question for this site. Let me know if something is not OK.)

Let's say that we use the Monte Carlo method to estimate the area of an object, in the exact same way you'd use it to estimate the value of π.

Now, let's say we want to quantify the certainty of our simulation result. We've drawn n samples (from a uniform distribution over the sampled area), m of which landed inside the object, so the area of the object is approximately m/n of the total sampled area. We would like to make a statement such as:

"We are 99% certain that the area of the object is between a1 and a2."

How can we calculate a1 and a2 above (given n, m, total area, and the desired certainty)?

I wrote a program which attempts to estimate this bound numerically. Here the samples are points in [0,1), and the object is the segment [0.25,0.75). It prints a1 and a2 for 50%, 90%, and 99%, for a range of sample counts:

import std.algorithm;
import std.random;
import std.range;
import std.stdio;

void main()
{
    foreach (numSamples; iota(100, 1000 + 1, 100))
    {
        auto samples = new double[numSamples];
        enum objectStart = 0.25;
        enum objectEnd   = 0.75;

        enum numTotalSamples = 10_000_000;
        auto numSizes = numTotalSamples / numSamples;
        auto sizes = new double[numSizes];
        foreach (ref size; sizes)
        {
            size_t numHits;
            foreach (i; 0 .. numSamples)
            {
                auto sample = uniform01!double;
                if (sample >= objectStart && sample < objectEnd)
                    numHits++;
            }

            size = 1.0 / numSamples * numHits;
        }

        sizes.sort();
        writef("%d samples:", numSamples);
        foreach (certainty; [50, 90, 99])
        {
            auto centerDist = numSizes * certainty / 100 / 2;
            auto startPos = numSizes / 2 - centerDist;
            auto endPos   = numSizes / 2 + centerDist;
            writef("\t%.5f..%.5f", sizes[startPos], sizes[endPos]);
        }
        writeln;
    }
}

It outputs:

//                     50%                 90%                 99%
100 samples:    0.47000..0.53000    0.42000..0.58000    0.37000..0.63000
200 samples:    0.47500..0.52500    0.44500..0.56000    0.41000..0.59000
300 samples:    0.48000..0.52000    0.45333..0.54667    0.42667..0.57333
400 samples:    0.48250..0.51750    0.46000..0.54250    0.43500..0.56500
500 samples:    0.48600..0.51600    0.46400..0.53800    0.44200..0.55800
600 samples:    0.48667..0.51333    0.46667..0.53333    0.44833..0.55167
700 samples:    0.48714..0.51286    0.46857..0.53143    0.45000..0.54857
800 samples:    0.48750..0.51250    0.47125..0.53000    0.45375..0.54625
900 samples:    0.48889..0.51111    0.47222..0.52667    0.45778..0.54111
1000 samples:   0.48900..0.51000    0.47400..0.52500    0.45800..0.53900

Is it possible to calculate these numbers directly instead?

(Context: I'd like to add something like "±X.Y GB with 99% certainty" to btdu)

Best Answer

Consider the following Monte Carlo approximation in R of $P(.25 \le U < .75) = 0.5,$ for $U\sim\mathsf{Unif}(0,1).$

set.seed(2021)                 # for reproducibility 
u = runif(10^6)                # 10^6 - vector of std unif values
event = (u >= .25)&(u < .75)   # logical 10^6 - vector
mean(event)                    # proportion of TRUEs
[1] 0.500772
1.96*sd(event)/10^3            # approx 95% margin of simulation error
[1] 0.0009799993

The Law of Large Numbers guarantees that the approximation converges to the exact value $1/2$ as the number of iterations increases. In our particular case, we can use the well-known Wald asymptotic 95% confidence interval to find the approximate margin of simulation error. Specifically, for the $B = 10^6$ iterations shown, the margin of simulation error is about $0.00098,$ so we can say with 95% confidence that the desired probability is $0.5008 \pm 0.0010.$
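The same Wald computation can be done directly from the hit count and sample count, with no vector of samples in memory. A minimal Python sketch (the helper name `wald_interval` is mine; the hit count 500772 is taken from the R run above):

```python
import math

def wald_interval(hits, n, z=1.96):
    """Wald (normal-approximation) CI for a binomial proportion.

    hits: number of samples that landed inside the object
    n:    total number of samples
    z:    standard-normal quantile for the desired confidence
          (1.96 for 95%, 2.576 for 99%)
    """
    p_hat = hits / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 500772 hits out of 10^6 samples, as in the R run above
lo, hi = wald_interval(500_772, 10**6)
print(f"{lo:.6f}..{hi:.6f}")
```

The half-width of the returned interval matches the `1.96*sd(event)/10^3` margin printed by R up to the negligible $n$ vs. $n-1$ factor in `sd`.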

Here is a plot of the estimated proportions p.hat (black) and the corresponding Wald 95% CIs after each of the first 5000 of the million iterations. (CIs for $n < 1000$ should be taken as rough approximations.)

n = 1:5000
p.hat = cumsum(event[1:5000])/n   # running proportion of hits
plot(n, p.hat, type="l")
abline(h=.5, col="blue")
Up = p.hat + 1.96*sqrt(p.hat*(1-p.hat)/n)
Lw = p.hat - 1.96*sqrt(p.hat*(1-p.hat)/n)
lines(n, Up, type="l", col="red")
lines(n, Lw, type="l", col="red")

[Figure: running estimate p.hat (black) with Wald 95% confidence bands (red), converging to the true value 0.5 (blue).]

Addendum (per @whuber's comments below): For large $n,$ say $n \ge 1000,$ the Wald intervals (illustrated in the figure above) show that the estimate $\hat p = X/n$ is near $p = 1/2.$ So without simulation, one would have the 95% CI $\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}$ for $p = 1/2.$ [These are the intervals for $n=1,2,\dots,5000$ shown in red in the figure.] For smaller $n,$ a more accurate 95% Agresti-Coull CI uses the point estimate $\check p = \frac{X+2}{n+4}$ to make the interval $\check p \pm 1.96\sqrt{\frac{\check p(1-\check p)}{n+4}}$ (not shown in the figure).
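Putting this together as a direct answer to the original question: given $n$ samples, $m$ hits, the total sampled area, and a confidence level, the interval $[a_1, a_2]$ is just the binomial CI scaled by the total area. A Python sketch (the function name `area_interval` is mine; for levels other than 95% I use the general Agresti-Coull adjustment, which adds $z^2/2$ pseudo-successes and $z^2/2$ pseudo-failures rather than the fixed $+2/+4$ of the 95% special case):

```python
import math
from statistics import NormalDist

def area_interval(hits, n, total_area, confidence=0.99):
    """Agresti-Coull interval for the object's area.

    z is the standard-normal quantile for the two-sided confidence level;
    n_adj and p_check are the Agresti-Coull adjusted count and estimate.
    """
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    n_adj = n + z * z                       # n + z^2 pseudo-trials
    p_check = (hits + z * z / 2) / n_adj    # adjusted point estimate
    margin = z * math.sqrt(p_check * (1 - p_check) / n_adj)
    return total_area * (p_check - margin), total_area * (p_check + margin)

# e.g. 500 hits out of 1000 samples over a unit sampling area, at 99%
a1, a2 = area_interval(500, 1000, total_area=1.0, confidence=0.99)
print(f"{a1:.5f}..{a2:.5f}")
```

For the 1000-sample, 99% case this lands close to the empirically bracketed 0.45800..0.53900 row in the question's table.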

Notes:

(1) We assume that R's runif gives values that cannot, for practical purposes, be distinguished from IID standard uniform observations.

(2) Computer code should be commented.

(3) For reproducibility, the seed should be shown for a simulation.

(4) event is a logical vector of one million TRUEs and FALSEs; its 'mean' is the proportion of its TRUEs. [TRUE is taken as 1, and FALSE as 0; similarly for sd.]

(5) The Wald 95% asymptotic CI for a binomial proportion is $\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}},$ where $X$ successes are observed among $n$ trials and $\hat p = X/n.$
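As a sanity check (my arithmetic, not part of the answer above), the formula in note (5), with the appropriate $z$ for each confidence level, reproduces the question's simulated table quite closely; for $n = 1000$ at 99% it gives roughly 0.459..0.541 versus the simulated 0.45800..0.53900:

```python
import math
from statistics import NormalDist

# Reproduce the question's table directly from the Wald formula,
# assuming p_hat = 0.5 (the true proportion in that experiment).
p_hat = 0.5
for n in range(100, 1001, 100):
    cols = []
    for conf in (0.50, 0.90, 0.99):
        z = NormalDist().inv_cdf((1 + conf) / 2)
        m = z * math.sqrt(p_hat * (1 - p_hat) / n)
        cols.append(f"{p_hat - m:.5f}..{p_hat + m:.5f}")
    print(f"{n} samples:\t" + "\t".join(cols))
```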
