Solved – Monte Carlo simulation of low probability events with impact

binomial distributionexcelmonte carlorare-events

I wonder if I'm lacking the terminology to phrase this question: I would like to model, in Excel, the risk of a fairly rare event, E, happening over time, where time is divided into chunks called periods, each with n trials. So the event has a low probability (p < 0.1%) of occurring but a high variable impact, hence the client's interest.

If the event impact were constant I would use the inverse binomial function to model the occurance of events in each period as BINOM.INV(n, p, RAND()). A possible approach is then to layer the impact distribution (assume Normal) on top of this: NORM.INV( RAND(), $\bar{x}$, $s$ ). By repeating this many times I can build up a monte-carlo simulation model (into which I need to feed other parameters).

It feels contrived and unrobust, probably inevitably: I don't have much data on how often E occurs, or its impact impact, so clearly I can't be confident about when these events will occur in the future.

Which doesn't stop people asking.

Here's some data: In a sample of 7000 "trials", the event happened 5 times with impacts 6.3, 6.3, 6.7, 6.8 and 7.6. There are 100 trials per period.

My specific questions, which may not be the right ones, are:

  1. Given the need to model this event, how appropriate is this combined simulation model?
  2. What better ways are there to model these kinds of events – that have both a probability of occurring and an variable impact?

Thanks in advance for your patience and ideas.

Best Answer

To expand on my comment - this is a very common insurance problem, combining modelling of the rare event often with something like a gamma distribution of the impact (in insurance impacts are rarely normally distributed). Search terms like "compound distribution insurance" might get you something useful.

The distribution you choose for the size-of-impact random variable will be crucial to your results. Even on only five points of data you at least have a snippet of evidence that the variance of the impacts is not huge (I'm assuming that was real data) and no evidence against your normality assumption. However it is plausibly from any number of distributions including a difficult-to-model mixed distribution (eg 90% of results from N(7,.5) and 10% from N(20,3)). So the most worrying part of your approach would be the assumption of normality there.

As @TooTone says, if the number of events isn't bounded you may want to consider an alternative to the binomial distribution for the first random variable. In insurance, binomial distributions can be used in circumstances when the number of events really is bounded eg life insurance; whereas other unbounded methods are more common where the number of events may be effectively infinite (eg car insurance). For your purposes I doubt this will have anywhere near as much impact as your choice of parameters and of the distribution of the size-of-impact variable.

Overall, your approach seems fine and probably better than laboriously working out theoretical characteristics of the compound distribution, but to make it more plausible you should make your parameters random variables too. That is, the proportion of possibilities that result in an event; and the moments of the distribution of the size of each event. This would be much more efficient in a tool other than Excel, but there is no reason you couldn't do it in Excel if you really want to.

Related Question