Solved – the best way to report the results and uncertainty from a Monte Carlo simulation

error-propagationmonte carlouncertainty

I am fitting data to a model that has ~30 input parameters, each with their own uncertainty levels, and which can interact with each other in the model. I therefore decided the best way to fit the data (2 parameter fit) and bound the fit uncertainty is to run a Monte Carlo simulation. I now have a collection of ~5k fit results and it is nicely converged. For each one I randomly perturbed each input parameter according to its level of uncertainty. My question is: what is the most "correct" way to report from this what are the true fitted values and associated uncertainties? Thoughts:

  1. Report the mean $\pm$ standard error of the mean (i.e. $avg(y_{i}) \pm \frac{stdev(y_{i})}{\sqrt{N}}$, where $y_{i}$ are my fit results, and i runs from 0 to N = ~5k).
  2. Report the median along with the 90th and 10th percentile values. Note the distribution for one of the fitted parameters is a bit skewed (makes me less confident in approach #1), so the mean does not equal the median, and the 90th and 10th percentiles are asymmetric bounds on the median.
  3. I noticed that if I plot all 5k fit results as a function of their corresponding Durbin-Watson coefficients there is a weak but clear trend. My instinct tells me I should therefore report the value corresponding to thigh highest DW (or trending that way), as this likely represents the fitted values for when my model input parameters were (randomly) most accurate (i.e. best captured the real physics). Is this ever done? If so, how would I bound my uncertainty?
  4. Other ideas?…

Thanks!

Best Answer

I think what you're after is the most likely value for each parameter. For integer values, that would be the mode of all simulated values, for which the mean and median are approximations. Of course, for continuous variables you can't really calculate a mean, in which case you can use histograms to estimate the most likely value. Either use the densest histogram bin, or fit the histogram with some skewed distribution. Or, if the distributions aren't too skewed, you can use the mean or median. You can take the uncertainties as the 90th - 10th percentile values around these most likely values.

Something to remember is that these values and their uncertainties don't contain any cross- or auto- correlation information, so you might need to think whether you need to report on that as well.

Related Question