Monte Carlo Simulation – Understanding Simulation and Extrapolation for Rare Events

extreme-value, monte-carlo, rare-events

I am reviewing some work, and the proposed solution does not seem reliable to me. However, I cannot find any references, nor can I rigorously formulate why I think the approach does not work.

Assume you want to model the risk of two objects colliding in a given space, where the probability of a collision is expected to be small, say on the order of $10^{-6}$. A collision is defined as the event in which the radii of the two objects overlap in space. To model this, the approach runs about $10^5$ simulations and stores the minimum distance between the two objects for each run (i.e., the closest the two objects came to one another in that run). Once the simulations are done, a number of candidate probability distributions are fitted to the minimum distances and the best one is kept (based on AIC). So far, the approach seems reasonable (except maybe for the choice of AIC as a measure of goodness of fit).
The proposed approach then uses the fitted probability density function to extrapolate to the region where the distance between the two objects is so small that they collide. I tried to sketch the histogram in the figure below. Note that it is not to scale and I haven't even seen the data, so the actual shape of the distribution doesn't really matter. I am interested in knowing whether this approach works in general, not in a specific case.

[Figure: sketch of the histogram of minimum distances, not to scale]

Intuitively, this approach does not make sense to me. It assumes that the probability distribution which fits the bulk of the data also holds in the far tails (where we wouldn't expect many, if any, observations from the simulations). It also seems counterintuitive that you could get a reliable estimate of the probability of an event on the order of $10^{-6}$ from only $10^5$ simulation runs.
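To put rough numbers on that last point, here is a minimal sketch of the arithmetic for a crude Monte Carlo estimator (one that simply counts collisions). The variable names are illustrative, and $10^{-6}$ is only the assumed order of magnitude from above, not a known value.

```python
import math

n_runs = 10**5        # number of simulation runs in the proposed approach
p_collision = 1e-6    # ASSUMED order of magnitude of the collision probability

# Expected number of collisions observed across all runs.
expected_hits = n_runs * p_collision

# Probability that not a single run contains a collision.
p_no_hits = (1.0 - p_collision) ** n_runs

# If zero collisions are observed, the exact one-sided 95% upper bound on p
# solves (1 - p)^n = 0.05 (this is the "rule of three", roughly 3 / n).
upper_95 = 1.0 - 0.05 ** (1.0 / n_runs)

# Runs needed for crude Monte Carlo to reach a 10% relative standard error:
# rel_se = sqrt((1 - p) / (n * p))  =>  n = (1 - p) / (p * rel_se^2).
target_rel_se = 0.1
runs_needed = (1.0 - p_collision) / (p_collision * target_rel_se**2)

print(f"expected collisions:        {expected_hits:.3f}")
print(f"P(no collision observed):   {p_no_hits:.4f}")
print(f"95% upper bound if 0 seen:  {upper_95:.2e}")
print(f"runs for 10% relative s.e.: {runs_needed:.2e}")
```

Under these assumptions, roughly nine out of ten batches of $10^5$ runs would contain no collision at all, so any estimate near $10^{-6}$ has to come from the extrapolation rather than from observed collisions.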

Does anyone have a more rigorous argument for why this shouldn't (or perhaps should) work, or can anyone point me to some references?

Best Answer

There are indeed approaches that are better suited to estimating such extreme outcomes. The key term is extreme value theory. A standard reference is Modelling Extremal Events by Embrechts, Klüppelberg and Mikosch.

The problem with the approach is exactly as you describe: why should one be able to extrapolate into regions where no data has been observed (yet)? Fitting "some" parametric model to the body of the distribution and then hoping that it also describes the tails may or may not work in practice. Extreme value theory, in contrast, provides mathematical reasons why the tails and extreme quantiles can be modelled with certain quite specific parametric distributions.

Judging just from what you wrote, a "peaks over threshold" approach may be a good starting point. In this method you fit an extreme value distribution called the generalised Pareto distribution (whose parametric form can be justified by mathematical arguments) to the "interesting" tail of your simulated data.
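As a rough illustration of the mechanics, here is a minimal peaks-over-threshold sketch in Python. Everything in it is an assumption made for the example: the simulator is replaced by Weibull-distributed stand-in data (chosen so that the true collision probability is known), the threshold is the empirical 1% quantile, the collision distance of 0.1 is invented, and the generalised Pareto parameters are fitted with a simple method-of-moments estimator rather than the maximum likelihood fit that is more common in practice.

```python
import math
import random
import statistics

random.seed(0)

# HYPOTHETICAL stand-in for the simulator: minimum distances from a Weibull
# law, chosen so that the true collision probability is about 1e-6.
N_RUNS = 100_000
SCALE, SHAPE = 10.0, 3.0
COLLISION_DIST = 0.1  # ASSUMED distance below which the objects collide
distances = [random.weibullvariate(SCALE, SHAPE) for _ in range(N_RUNS)]
true_p = -math.expm1(-((COLLISION_DIST / SCALE) ** SHAPE))

# Threshold u: empirical 1% quantile of the minimum distance (lower tail).
N_TAIL = N_RUNS // 100
ordered = sorted(distances)
u = ordered[N_TAIL]

# "Exceedances" for a lower tail: how far below the threshold each run got.
excesses = [u - d for d in ordered[:N_TAIL]]

# Method-of-moments fit of the generalised Pareto distribution (GPD):
# mean = beta / (1 - xi), variance = beta^2 / ((1 - xi)^2 * (1 - 2 * xi)).
m = statistics.fmean(excesses)
v = statistics.variance(excesses)
ratio = m * m / v
xi = 0.5 * (1.0 - ratio)        # shape parameter
beta = 0.5 * m * (1.0 + ratio)  # scale parameter


def gpd_survival(y, xi, beta):
    """P(Y > y) for a GPD with shape xi and scale beta, for y >= 0."""
    if abs(xi) < 1e-12:
        return math.exp(-y / beta)  # exponential limit as xi -> 0
    base = 1.0 + xi * y / beta
    if base <= 0.0:
        return 0.0  # y lies beyond the fitted finite endpoint (xi < 0)
    return base ** (-1.0 / xi)


# P(d < r) = P(d < u) * P(u - d > u - r | d < u)
p_hat = (N_TAIL / N_RUNS) * gpd_survival(u - COLLISION_DIST, xi, beta)

print(f"threshold u = {u:.3f}, xi = {xi:.3f}, beta = {beta:.3f}")
print(f"POT estimate = {p_hat:.2e}, true value = {true_p:.2e}")
```

Rerunning this with different seeds or thresholds gives a feel for how strongly the extrapolated probability reacts to small changes in the fitted shape parameter, which is why threshold diagnostics and confidence intervals matter in practice.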

Have a look at the reference above, where all this is explained in detail. For a more applied and readable treatment, see the book Quantitative Risk Management, Chapter 7.2 "Threshold Exceedances", and there specifically Section 7.2.4, "The Hill Method".
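For orientation, the Hill estimator itself fits in a few lines. It assumes a heavy, Pareto-type upper tail, so this sketch uses synthetic Pareto data rather than anything resembling the collision distances; the sample size, the number of order statistics `K`, and the evaluation level `x` are arbitrary choices for illustration.

```python
import math
import random

random.seed(1)

ALPHA = 2.0   # true Pareto tail index of the SYNTHETIC data (assumption)
N = 100_000   # sample size
K = 1_000     # number of upper order statistics used by the estimator

sample = sorted(random.paretovariate(ALPHA) for _ in range(N))
threshold = sample[N - K - 1]  # the (K+1)-th largest observation
top_k = sample[N - K:]         # the K largest observations

# Hill estimator of xi = 1 / alpha: the mean log-excess of the K largest
# observations over the threshold.
xi_hat = sum(math.log(x / threshold) for x in top_k) / K


def hill_tail_prob(x):
    """Estimated P(X > x) for x above the threshold, based on the Hill fit."""
    return (K / N) * (x / threshold) ** (-1.0 / xi_hat)


x = 1000.0
print(f"xi_hat = {xi_hat:.3f} (true value {1.0 / ALPHA:.3f})")
print(f"estimated P(X > {x:g}) = {hill_tail_prob(x):.2e}")
print(f"true      P(X > {x:g}) = {x ** -ALPHA:.2e}")
```

In practice the estimate is usually plotted against `K` (a Hill plot) to check how stable it is before any value is trusted.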
