The right way to calculate the expected value of a function by Monte Carlo simulation is to take the sample average of the function's value over all n replications (one million in your write-up). You should also look at the sample standard deviation divided by sqrt(n), i.e. the standard error, to understand the uncertainty in your estimate (with n that large, it makes no practical difference whether the standard deviation uses n or n-1).
You need to generate the values of X and Y correctly, accounting for any dependency between them, i.e., draw from their joint distribution. If they are independent, this amounts to drawing separately from their marginal distributions. You have specified Uniform([0,10]) distributions for X and Y, and, judging from your Monte Carlo simulation, you have assumed they are independent.
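A minimal sketch of that calculation, assuming independent Uniform(0, 10) marginals and using numpy (the seed and n are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Under independence, drawing from the joint distribution reduces to
# drawing separately from each marginal.
x = rng.uniform(0, 10, size=n)
y = rng.uniform(0, 10, size=n)

values = np.abs(x - y)

mean = values.mean()                       # Monte Carlo estimate of E|X - Y|
std_err = values.std(ddof=1) / np.sqrt(n)  # uncertainty of that estimate

print(f"E|X - Y| ~= {mean:.4f} +/- {std_err:.4f}")
```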
Assuming independence of X and Y, your averaging is indeed the correct way to get the expected value of abs(X-Y). Your analytical calculation of 10 - sqrt(50) corresponds to the median of abs(X-Y), not its expected value.
Plot the sorted values of abs(X-Y) from the simulation and you'll see that the median is not equal to the expected value. You may be too used to symmetric distributions, the Normal in particular, where the median equals the expected value; that is not the case for abs(X-Y) here.
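Continuing the sketch above (the closed-form reference values follow from the Uniform(0, 10) assumption), you can check this directly:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 1_000_000
values = np.abs(rng.uniform(0, 10, n) - rng.uniform(0, 10, n))

print("sample mean:  ", values.mean())      # ~ 10/3 = 3.33 (expected value)
print("sample median:", np.median(values))  # ~ 10 - sqrt(50) = 2.93

# The sorted values trace the empirical quantile function; its asymmetry
# is why the median and the mean differ here.
plt.plot(np.sort(values))
plt.axhline(values.mean(), color="red", label="mean")
plt.axhline(np.median(values), color="green", label="median")
plt.legend()
plt.show()
```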
To be clear, I will outline the steps. First, examine a problem for which you know all the parameter values precisely.
You generate random estimates from an expression with the specified parameters plus an error term. This can be accomplished by inversion, that is, solving the distribution function for the error term (inverse-transform sampling). If you assume the error term comes from a uniform distribution, no more work is needed. If it is exponentially distributed, for example, apply the appropriate transform (a log, in this case) to turn a uniform deviate into a draw from that distribution.
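As a hedged illustration of that inverse transform, assuming an Exponential error term (the rate value is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 2.0  # illustrative rate parameter

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# -log(U) / rate has an Exponential(rate) distribution.
u = rng.uniform(0, 1, size=100_000)
e = -np.log(u) / rate

print(e.mean())  # should be close to 1/rate = 0.5
```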
Then generate the simulated observations by inserting all values, parameters and error terms alike, into the derived formula that specifies your function of interest.
Use all the data to test your method for recovering (reverse-estimating) the true parameter values. Since you know the right answer, you can construct an error estimate, such as bias or root-mean-square error over replications, to identify the best estimation path.
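Putting the steps together, here is a minimal sketch of such a simulation study; the exponential model, the true rate, and the sample sizes are all assumptions for illustration (for an Exponential sample, the MLE of the rate is the reciprocal of the sample mean):

```python
import numpy as np

rng = np.random.default_rng(2)
true_rate = 2.0           # known ground truth, chosen for illustration
n_obs, n_reps = 200, 1_000

estimates = np.empty(n_reps)
for r in range(n_reps):
    # Steps 1-2: simulate data from the known model via the inverse transform.
    u = rng.uniform(0, 1, size=n_obs)
    data = -np.log(u) / true_rate
    # Step 3: reverse-estimate the parameter (MLE = 1 / sample mean).
    estimates[r] = 1.0 / data.mean()

# Step 4: since the truth is known, summarize the estimation error.
bias = estimates.mean() - true_rate
rmse = np.sqrt(np.mean((estimates - true_rate) ** 2))
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```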
For a more advanced approach, use selectively chosen samples; here is some advice courtesy of Wikipedia, to quote:
While naive Monte Carlo works for simple examples, an improvement over deterministic algorithms can only be accomplished with algorithms that use problem-specific sampling distributions. With an appropriate sample distribution it is possible to exploit the fact that almost all higher-dimensional integrands are very localized and only a small subspace notably contributes to the integral. A large part of the Monte Carlo literature is dedicated to developing strategies to improve the error estimates. In particular, stratified sampling—dividing the region into sub-domains—and importance sampling—sampling from non-uniform distributions—are two examples of such techniques.
Note: the usual naive Monte Carlo approach is to sample points uniformly over the sample space, starting from N uniform random deviates. This works, in general, by the law of large numbers, but it does not necessarily imply good efficiency, especially for non-linear functions of interest.
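To make the efficiency point concrete, here is a sketch with a hypothetical, sharply localized one-dimensional integrand; the Beta(50, 50) proposal is just one plausible choice of a density concentrated near the integrand's peak:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 10_000

# A sharply peaked integrand on [0, 1]; its integral is ~ 0.1772.
f = lambda x: np.exp(-100 * (x - 0.5) ** 2)

# Naive Monte Carlo: uniform samples, most of which land where f ~ 0.
naive = f(rng.uniform(0, 1, n)).mean()

# Importance sampling: draw from a proposal concentrated near the peak
# and reweight each draw by the proposal density.
q = stats.beta(50, 50)
x = q.rvs(size=n, random_state=rng)
importance = np.mean(f(x) / q.pdf(x))

print(naive, importance)  # both estimate the same integral; the
                          # importance-sampling estimate has lower variance
```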
So, to answer your question ("should I use the original MLE estimate, or should I calculate the MLE for each sample first?"): since you know the right answer, you can determine which works best over repeated simulations!
Best Answer
The approximation can be poor (overly conservative) when $p$ is close to zero or one, but when $p = 1/2$ it holds exactly.
The idea here is that we want to estimate the probability of an event using a sample proportion across many Monte Carlo trials, and we want to know how accurate an estimate of the true probability that proportion is. The standard deviation $\sigma$ of a sample proportion is, as the authors note, $\sqrt{p (1 - p) / s}$ (where $s$ is the number of Monte Carlo simulations), but the problem is that we don't know $p$. However, we can maximize $\sigma$ with respect to $p$ and get a conservative "estimate" of this standard error that will hold no matter what $p$ happens to be. This may cause us to run more simulations than we need, but that won't matter as long as each iteration is computationally cheap.
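Concretely, since $p(1 - p)$ attains its maximum value of $1/4$ at $p = 1/2$,
$$\sigma = \sqrt{\frac{p(1 - p)}{s}} \le \sqrt{\frac{1/4}{s}} = \frac{1}{2\sqrt{s}},$$
so, for example, $s = 10{,}000$ simulations guarantee $\sigma \le 0.005$ regardless of the true value of $p$.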