Monte Carlo Simulation – Using Exponential Distributions in Probability Analysis

monte-carlo, probability, simulation, stochastic-processes

I'm trying to simulate a stochastic version of deterministic exponential population growth, $dN/dt = rN$, where $N$ is the population size, $r$ the growth rate, and $t$ time. I'm assuming there's no carrying capacity. This page (http://cnr.lwlss.net/DiscreteStochasticLogistic/) suggests this algorithm for simulating growth on an interval $[0, t_{end}]$:

  1. start at $t = 0$ with initial population size
  2. draw the waiting time to the next birth event, $\delta t \sim \mathrm{Exponential}(rN(1 - N/K))$ ($K$ is the carrying capacity)
  3. increment population size, $N = N + 1$
  4. set $t = t + \delta t$
  5. if $t > t_{end}$ then quit, otherwise go to step 2.
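The steps above can be sketched directly (my own sketch, not the linked page's code; function and variable names are mine). Note that NumPy's exponential sampler takes the scale $1/\lambda$ rather than the rate:

```python
import numpy as np

def discrete_logistic(r, K, n0, t_end, seed=None):
    """One realization of the discrete stochastic logistic model (steps 1-5)."""
    rng = np.random.default_rng(seed)
    t, N = 0.0, n0                      # step 1
    times, sizes = [t], [N]
    while N < K:                        # the birth rate r*N*(1 - N/K) hits zero at N = K
        t += rng.exponential(scale=1.0 / (r * N * (1 - N / K)))  # step 2 (+ step 4)
        if t > t_end:                   # step 5
            break
        N += 1                          # step 3
        times.append(t)
        sizes.append(N)
    return times, sizes
```

Each call returns one sample path; repeating it gives the ensemble of growth curves.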

Since I don't have a carrying capacity $K$, I assume it's infinite, so the next birth time is $\delta t \sim \mathrm{Exponential}(rN)$. Is that correct?
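For concreteness, here is how that step-2 draw looks in NumPy (a sketch; the parameter values are mine). One easy thing to trip over: `np.random.exponential` is parameterized by the *scale* $1/\lambda$, not the rate, so a rate of $rN$ means a scale of $1/(rN)$:

```python
import numpy as np

rng = np.random.default_rng(0)
r, N = 1.0, 20

# the rate of the next birth event is r*N, so the scale is 1/(r*N)
dt = rng.exponential(scale=1.0 / (r * N))

# sanity check: the sample mean should be close to 1/(r*N) = 0.05
samples = rng.exponential(scale=1.0 / (r * N), size=200_000)
```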

When I run the simulation this way, it doesn't give results at all similar to $N(t) = P_0 e^{rt}$ (where $P_0$ is the initial population size). Even averaging over many iterations, it seems to give different growth curves.

Below is my code and the result of the simulation. The red curve is deterministic exponential growth; the black curves are simulation runs using the exponential distribution. They clearly don't match.

import numpy as np
import matplotlib.pyplot as plt

def sim(rate, start, end, init):
    N = 200  # number of independent runs
    finalsizes = []
    for n in range(N):
        size = init
        curr_t = 0
        times = [curr_t]
        sizes = [init]
        while curr_t <= end:
            # simulate the next birth time: the total birth rate is
            # rate * size, so the scale parameter (its inverse) shrinks
            # as the population grows
            scale = 1 / float(rate * size)
            div_time = np.random.exponential(scale=scale)
            # advance time
            curr_t += div_time
            if curr_t > end:
                # if we exceed the time interval, quit
                break
            times.append(curr_t)
            # increase population size
            size += 1
            sizes.append(size)
        finalsizes.append([times, sizes])
    return finalsizes

# run simulation and plot results
init = 20
start = 0
end = 20
rate = 1
finalsizes = sim(rate, start, end, init)
plt.figure()
allsizes = []
for f in finalsizes:
    allsizes.append(f[1][-1])
    plt.plot(f[0], f[1], color="k", alpha=0.5)
times = np.arange(0, end + 1)
# deterministic exponential growth, N(t) = P0 * exp(r*t)
plt.plot(times, init * np.exp(rate * times), color="r")
plt.xlabel("time")
plt.ylabel("size of population")
print("mean final size:", np.mean(allsizes))
plt.show()

[Figure: simulated growth curves (black) against the deterministic curve (red)]

Response to whuber's excellent answer: I don't understand why I have to specify the final population size in advance. My simulation is meant to ask: in a given amount of time, what will be the variability in population size, assuming exponential growth? (Not: how long would it take, on average, to reach a population of size $N$, which is what whuber's simulation seems to be doing.)

Also, I don't think it's correct that I "haven't yet done enough simulations to appreciate what they are telling you". I updated my simulation to plot 1000 runs. As you can see, with my time-based stopping condition, the results consistently underestimate the deterministic exponential growth population size.

My reasoning was that if I simulate $N$ runs for a duration $t$, then as $N \rightarrow \infty$, the average population size from my simulation should be an unbiased estimate of the population size predicted by the deterministic exponential growth model at time $t$, i.e. $2^{rt}$. For example, if I simulate 3 time steps (starting with a single individual), I would expect the population size to sometimes be greater than 8 and sometimes less than 8, with the average converging to 8. It seems like this should be true even if I start with a very small population, as long as I simulate enough runs. Is this incorrect? What is wrong with this reasoning? The simulations don't support it, although I expected it to be true, so there has to be a flaw in my reasoning and/or my simulation.
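One way to probe this reasoning numerically (my own sketch, separate from the code above): simulate many pure-birth runs out to a fixed time and compare the average final size against both candidate baselines. For the continuous-time pure-birth (Yule) process with per-capita rate $r$, the expected size is $N_0 e^{rt}$, which with $r = 1$, $t = 1$ is noticeably larger than $N_0\,2^{rt}$:

```python
import numpy as np

def yule_size_at(r, n0, t_end, rng):
    """Final size of a pure-birth (Yule) process run out to time t_end."""
    t, N = 0.0, n0
    while True:
        t += rng.exponential(scale=1.0 / (r * N))  # next birth at total rate r*N
        if t > t_end:
            return N
        N += 1

rng = np.random.default_rng(1)
r, n0, t_end, runs = 1.0, 20, 1.0, 4000
mean_size = np.mean([yule_size_at(r, n0, t_end, rng) for _ in range(runs)])
# compare mean_size with n0 * exp(r*t) (about 54.4) and n0 * 2**(r*t) (40)
```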

Update 2: I fixed the simulation so that the scale parameter of the exponential distribution decreases with population size (it is inversely proportional to it), with an initial population size of 10. It still grossly underestimates exponential growth.

Best Answer

The whole point of simulation is to show you that such variation is realistic. In fact, there appears to be nothing wrong with your results--except that you haven't yet done enough simulations to appreciate what they are telling you. In this case, the results are especially erratic because the starting population is so small.

Let's run your scenario 500 times up to a population of 300 (rather than a half dozen times to a population of 3):

Figure 1

It looks more stable when you start with a larger population:

Figure 2

Just for fun, here's a similar simulation for a population that grows from one individual to its carrying capacity:

Figure 3

I used R for these simulations and plots, because it does one very interesting thing. Since you know in advance that the population will progress in whole steps from the initial population to the final one, you can easily generate the sequence of population values in advance of the simulation. Thus, all that remains is to generate a set of exponentially distributed variates with rates determined by that sequence and accumulate them to simulate the birth times. R performs that with a single command (the line that creates `simulation` below). It takes about one second. Everything else is just parameter specification and plotting.

(I can get away with such a simple algorithm because I am running these simulations out to a given population target rather than out to a given time endpoint. Obviously the model is the same; all that differs is how I control the length of the simulation.)

rate <- 1
pop.0 <- 1
time.0 <- 0
k <-   0        # Carrying capacity (use 0 or negative when not applicable)
n.final <- 300  # Must not exceed the capacity!
#
# Pre-calculation: populations and the associated rates.
#
n <- pop.0:(n.final-1)
if (k <= 0) r <- rate * n else r <- rate * n * (1 - n/k)
#
# The simulation.
# Each iteration is stored as a column of the result.
#
simulation <- replicate(500, cumsum(c(time.0, rexp(length(n), r))))
#
# Plot the results:
# Set it up, show the overlaid growth curves, then plot a reference curve.
#
plot(range(simulation), c(pop.0, n.final), type="n", ylab="Population", xlab="Time")
apply(simulation, 2, function(x) lines(x, c(n, n.final), col="#00000020"))
if (k <= 0) {curve(pop.0 * exp((x - time.0)*rate), add=TRUE, col="Red", lwd=2)} else
    curve(k*(1 - 1/(1+(pop.0/(k-pop.0))*exp(rate*(x-time.0)))), add=TRUE, col="Red", lwd=2)
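For comparison, the same pre-computation trick ports directly to NumPy (a sketch with the same parameters; the variable names are mine, and there is no carrying capacity in this version):

```python
import numpy as np

rate, pop0, n_final, n_runs = 1.0, 1, 300, 500
rng = np.random.default_rng(0)

# populations visited in order, and the total birth rate at each
n = np.arange(pop0, n_final)
r = rate * n

# one row per run: exponential waits at each rate, accumulated into birth times
waits = rng.exponential(scale=1.0 / r, size=(n_runs, len(n)))
simulation = np.cumsum(waits, axis=1)
```

Each row of `simulation` can then be plotted against `n + 1` (the population reached at each birth), just as the R code plots each column.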