[GIS] Simulation envelopes and significance levels

Tags: r, simulation, spatial-statistics

Many spatial analysis packages provide Monte Carlo techniques to simulate upper and lower “envelopes” for a summary statistic as a function of distance (e.g. the K-function, the nearest neighbor distance function, etc.). I’ve sometimes seen the significance of a deviation from the envelope expressed as:

p = m / (n + 1)

where n is the number of simulations and m is the rank, counted in from the extremes, of the simulated values used to draw the envelopes at each distance r. For example, if m = 1, the largest and smallest values from the simulation sample are used to plot the upper and lower simulation envelopes; if m = 2, the second largest and second smallest values are used.
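
For concreteness, here is a minimal numeric sketch (base R) of what that formula implies for a few choices of m and n, reading p = m / (n + 1) as a per-tail (one-sided) level; envelope_p is just a throwaway name for illustration, not a function from any package.

    # Nominal per-tail significance level implied by the formula above.
    envelope_p <- function(m, n) m / (n + 1)

    envelope_p(m = 1, n = 39)    # 0.025 per tail, i.e. about 0.05 two-sided
    envelope_p(m = 2, n = 39)    # 0.05  per tail
    envelope_p(m = 1, n = 9999)  # 1e-04 per tail
    envelope_p(m = 2, n = 9999)  # 2e-04 per tail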

My questions are:

1/ Where does the +1 in the denominator originate? For instance, if we run 39 simulations, why divide by 40?

2/ Regardless of the number of simulations performed (i.e. 39 or 9999), m seems to have a disproportionate influence on the computed significance level. It would seem that taking the 2nd highest and 2nd lowest values from a sample of 9999 simulations would have less of an impact on the width of the simulation envelope than taking them from 39 simulations. I’m sure there is a sound theoretical basis for this, but its logic escapes me. Is there an analogy in inferential (non-spatial) statistics that can help make this a bit more intuitive?

3/ How should one present the results of a simulation envelope? I sometimes see a p value reported for a simulation, but how is one to know whether the reported p corresponds to m = 1 or to a larger m such as m = 3?

Edit: Per whuber's comment, here is some clarification:

1/ The type of simulation pertains to testing a null hypothesis such as complete spatial randomness (CSR).

2/ An example of a package that computes simulation envelopes is spatstat (R). The function is called envelope, and nrank is the parameter m in the equation above.
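
To make that concrete, here is a minimal sketch of what such a call might look like, using spatstat's built-in cells point pattern as a stand-in for real data; check the arguments against the spatstat documentation for your version.

    library(spatstat)

    # Pointwise simulation envelopes for the K-function under CSR.
    # With nsim = 39 and nrank = 1 (i.e. m = 1), the nominal level is
    # 1 / (39 + 1) = 0.025 per tail, or roughly 0.05 two-sided.
    E <- envelope(cells, fun = Kest, nsim = 39, nrank = 1)
    plot(E, main = "K-function with pointwise CSR simulation envelopes")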

Best Answer

  1. The +1 is a convention. It's all about converting the ranks to percentiles. Consider 99 iterations. The rank will run from 1 through 99 (in whole steps). You can convert the rank to a percentile by dividing by 99 and multiplying by 100. That would produce percentiles from 100/99 = 1.01% to 99*100/99 = 100%. That lacks a desirable symmetry: you're saying the lowest value is 1.01% from the bottom but the highest value is right smack at the top of the range. To restore that symmetry, you should shift the ranks down by 0.5: the lowest value gets assigned a percentile of (1-0.5)*100/99 = 0.50% and the highest value gets a percentile of (99-0.5)*100/99 = 99.50% = 100 - 0.50%. Everything becomes nice and symmetric. You can avoid adjusting the ranks simply by adding 1 to the denominator. Now the percentiles in the example range from 1*100/100 = 1% up to 99% in even, symmetrical, 1% steps. (I illustrate this, and discuss it more generally, at http://www.quantdec.com/envstats/notes/class_02/characterizing_distributions.htm : look towards the bottom of that page.)

  2. The purpose of the simulation is to compute what is likely to happen by chance alone (the "null distribution"). For instance, we can ask whether Joe Paterno's football bowl record at Penn State of 24 wins in 37 attempts is just the result of blind luck, no better (or worse) than if each game had been decided by a fair coin flip. To do so, we might actually flip a coin 37 times, count the heads (to represent wins), and repeat this for a total of 99 iterations (a short sketch of this simulation appears after this list). We would find that very few of those 99 attempts achieved 24 or more heads--most likely 5 or fewer. That's pretty good evidence that over time his teams were better than the bowl competition, indicating they tended to be under-rated. (99 is too small a number to use here, really: I actually used 100,000, in which I observed 4972 occasions of 24 or more heads.)

    The one aspect of this example that you control is the weight of evidence: is a 5/100 chance enough to convince you the result was not due to randomness (or luck)? Depending on the circumstance, some people need stronger evidence, others weaker. That's the role of m. When you draw the envelopes at the most extreme values out of (say) 99 iterations, you are estimating the lowest 1% and the highest 1%. You would guess that random fluctuations would place the curve between these envelopes 100 - 1 - 1 = 98% of the time. This corresponds (very roughly, because 99 iterations is still too few) to a "two-sided p-value of 0.02." If you don't need such strong evidence, you might choose (say) m = 3. Now the lower envelope represents the bottom 3/99 = 3.03% of the chance distribution and the upper envelope represents the top 3.03% of that distribution. Your two-sided p-value is around 6%. (Because 99 is so small, the true p-value could be as large as 15% or so, so beware! Do many more Monte Carlo iterations if you possibly can.)

  3. In some sense you're trying to depict a distribution of envelopes (curves). That's a complicated thing. One way is to choose some percentiles in advance, such as 1% (and therefore, by symmetry, 99%), 5% (and 95%), 10% (and 90%), 25% (and 75%). Out of 99 simulated curves these would correspond roughly to the lowest (and highest), the fifth lowest (and fifth highest), etc. (Comparing curves that can intersect each other in this way is problematic, though: when they cross, which is more extreme than the other?) Plotting those selected curves at least gives one a visual idea of the spread that is produced by chance mechanisms alone. (A rough sketch of a pointwise version of this idea follows the list.)
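
To make the coin-flip illustration in point 2 concrete, here is a minimal sketch in base R; the exact counts will vary from run to run, and set.seed is used only to make the sketch reproducible.

    # Null distribution for the bowl record: how often does a fair coin
    # produce 24 or more heads ("wins") in 37 flips ("games")?
    set.seed(1)
    n_iter <- 100000
    wins <- rbinom(n_iter, size = 37, prob = 0.5)  # simulated win totals
    sum(wins >= 24)    # count of chance "seasons" with 24+ wins
    mean(wins >= 24)   # proportion, roughly 0.05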
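
And to make point 3 concrete, here is a rough sketch of a pointwise variant of that idea: rather than ranking whole curves (which, as noted, is ambiguous when they cross), it takes selected quantiles across the simulations separately at each distance. The toy sim_curves matrix is a made-up stand-in; in practice you would substitute the curves from your own Monte Carlo runs.

    # Toy stand-in: 99 simulated summary curves evaluated at 50 distances
    # (one row per simulation, one column per distance).
    set.seed(2)
    n_sim <- 99
    r <- seq(0, 1, length.out = 50)
    sim_curves <- t(replicate(n_sim, cumsum(abs(rnorm(length(r), sd = 0.1)))))

    # Pointwise quantiles across the simulations at each distance.
    probs <- c(0.01, 0.05, 0.10, 0.25, 0.75, 0.90, 0.95, 0.99)
    q <- apply(sim_curves, 2, quantile, probs = probs)

    # One line per selected quantile, plotted against distance.
    matplot(r, t(q), type = "l", lty = 1,
            xlab = "distance r", ylab = "summary statistic",
            main = "Pointwise quantiles of 99 simulated curves")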

I hope you get the sense that this approach of generating a small bunch of curves (under the null hypothesis) and drawing a selected few is exploratory, approximate, and somewhat crude, because it is all of those. But it's far better than just guessing whether or not your data could have arisen by chance.
