Solved – Logarithmic sampling in Monte Carlo instead of linear

monte carlosamplingsimulation

The common case for a Monte Carlo simulation is, if we want to run our simulation for $N$ steps, we define a delta $\Delta,$ such that $N/\Delta = n$ tells us the frequency with which we measure/evaluate quantities of interest during the MC run.
For instance, say $N=1000,$ and we want to measure the average of an observable $f,$ after each $n=100$ steps (so here $\Delta=10$). The average is obtained by dividing $f$ by the number of times $f$ has been measured.
This is linearly sampling the time or MC steps and the simulation looks schematically like (writing in python style):

interval = N/delta
f = 0
countmeasurement = 0
for mcstep in range(1,N):
  interval -= 1
  .
  .
  .
  if interval == 0:
    f += computef()
    .
    .
    interval = N/delta
    countmeasurement += 1
  #save or print
  print "steps ", mcstep, "average f ", f/countmeasurement 

Question:

With this scheme as sketched above, if later we plot $f$ as a function of MC-steps, on a log-log scaled plot, we will not have sampled the same amount of datapoints for each time scale as we sampled the total steps only linearly. How do we alter our sampling scheme, thus redefining interval, such that we gather the same number of datapoints for $f$ for each part (time-scale) of the log-log plot? Is it common to sample the MC-steps logarithmically in Monte Carlo simulations?

Best Answer

If I've understood the question correctly -

What you have to do is construct the step values of your simulation to be evenly distributed on a log scale, then exponentiate the step values for use in the relevant function. The differences between the step values, when plotted on a log-log graph, will be equal.

For example, consider plotting the cumulative density function of a Pareto (power law) variate as a highly simplified version of this. We set the mean and scale equal to 5, and plot the CDF for values of $x = 1, \dots, 30$:

ub <- 30
x <- 1:ub
y <- ppareto(x, m=5, s=5)
plot(y~x, log="xy")

which gives us the following graph:

enter image description here

This shows the unequal scaling referred to in the OP, admittedly in a different context, and how we lose a lot of information about the shape of the curve at values of the CDF below about 0.7 (i.e., most of it.)

Altering the stepsizes of $x$ to be equal on the log scale fixes the problem:

log_ub <- log(30)
x <- exp(seq(from=0, to=log_ub, length.out=30))
y <- ppareto(x, m=5, s=5)
plot(y~x, log="xy")

The $x$ values cover the same range in the two cases - $(1,30)$ - but are distributed more informatively in the second case:

enter image description here

In certain contexts, the "step sizes" referred to above are the number of iterations some sub-task is performed at each master step of the simulation. The procedure in those cases would have to be adjusted so as to generate integer values for the number of iterations, but for any significant number of iterations, just rounding off would be sufficient.

For example, sampling from a Pareto distribution with different sample sizes ranging from 1 to 10,000:

log_ub <- log(10000)
y <- rep(0,30)
x <- round(exp(seq(from=0, to=log_ub, length.out=30)))
for (i in 1:30) y[i] <- mean(rpareto(x[i], 1, 5))
plot(y~x, log="xy", xlab="# observations", ylab="Sample mean")

yields the following plot:

enter image description here

Related Question