I am doing Monte Carlo simulations and I want to investigate the convergence. Two versions come to my mind:
1) Doing every trial independently:
For each trial, I generate new data independent from any previous data. Each trial however, we use more simulations, hence decreasing the confidence interval.
import numpy as np
import matplotlib.pyplot as plt
N = 50
sims = np.logspace(1, 3, N).astype(int)
means = np.zeros(N)
sstds = np.zeros(N)
for i, sim in enumerate(sims):
x = np.random.geometric(0.01, size=sim)
means[i] = np.mean(x)
sstds[i] = np.std(x, ddof=1)
cis = 1.96 * sstds / np.sqrt(sims)
plt.figure(figsize=(15,5))
plt.plot(sims, means)
plt.fill_between(sims, means+cis, means-cis, alpha=0.4)
plt.xscale("log")
plt.hlines(y=100, xmin=sims[0], xmax=sims[-1], color="red")
Now the output is a bit more spiky.
2) Reusing the previous trial in the next trial
Instead of doing a new indepndent simulation, I no progressively add simulations, so the mean of this single simulation I do is converging.
N = 50
sims = np.logspace(1, 3, N).astype(int)
means = np.zeros(N)
sstds = np.zeros(N)
x = np.random.geometric(0.01, size=10**3)
for i, sim in enumerate(sims):
means[i] = np.mean(x[:sim])
sstds[i] = np.std(x[:sim], ddof=1)
cis = 1.96 * sstds / np.sqrt(sims)
plt.figure(figsize=(15,5))
plt.plot(sims, means)
plt.fill_between(sims, means+cis, means-cis, alpha=0.4)
plt.xscale("log")
plt.hlines(y=100, xmin=sims[0], xmax=sims[-1], color="red")
Now the plot looks different, like it looks more correct but I am not sure if it is actually better
Question
Is any of these two methods better? Or is it just a matter of preference? Which one should I use?
Update
I accepted an answer now that gave sound reasoning on why to chose the second one. However if anyone reading this feels like there are other arguments in favor of any of the two visualisation / convergence methods, I would be very happy to hear them as answer/comment!
Best Answer
The second method. It shows the incremental improvement as you increase the sample size during a single simulation.
Take your routines above, delete the confidence interval shading, and put them in a loop, looping 10 times to get 10 lines. You should see the second one looks like those law of large number graphs you see, such as this one:
https://alphaarchitect.com/wp-content/uploads/2014/01/lawoflargenumbers.png
Whereas the first method won’t look like that.