I am doing a numerical experiment that consists of sampling a lognormal distribution $X\sim\mathcal{LN}(\mu, \sigma)$ and trying to estimate the moments $\mathbb{E}[X^n]$ by two methods:
- Taking the sample mean of the $X^n$ values directly.
- Estimating $\mu$ and $\sigma^2$ from the sample means of $\log(X)$ and $\log^2(X)$, and then using the fact that for a lognormal distribution $\mathbb{E}[X^n]=\exp(n \mu + (n \sigma)^2/2)$; both estimators are written out below.
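Written out for $N$ i.i.d. samples $X_1,\dots,X_N$, the two estimators are
$$\widehat m_n^{(1)}=\frac{1}{N}\sum_{i=1}^N X_i^n, \qquad \widehat m_n^{(2)}=\exp\!\left(n\hat\mu+\frac{n^2\hat\sigma^2}{2}\right),\quad \hat\mu=\frac{1}{N}\sum_{i=1}^N\log X_i,\quad \hat\sigma^2=\frac{1}{N}\sum_{i=1}^N\log^2 X_i-\hat\mu^2.$$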
The question is:
I find experimentally that the second method performs much better than the first one when I keep the number of samples fixed and scale $\mu$ and $\sigma^2$ by some factor $T$. Is there some simple explanation for this fact?
I'm attaching a figure in which the x-axis is $T$ and the y-axis shows $\mathbb{E}[X^2]$, comparing the true value $\mathbb{E}[X^2] = \exp(2 \mu + 2 \sigma^2)$ (orange line) to the estimated values: method 1 as blue dots, method 2 as green dots. The y-axis is on a log scale.
EDIT:
Below is minimal Mathematica code that produces the results for a single $T$, together with its output:
ClearAll[n, numIterations, sigma, mu, totalTime, data, rmomentTheory, rmomentSample, muNumerical, sigmaSqrNumerical, rmomentFromMuSigma]
(* Define variables *)
n=2; numIterations = 10^4; sigma = 0.5; mu=0.1; totalTime = 200;
(* Create log normal data*)
data=RandomVariate[LogNormalDistribution[mu*totalTime,sigma*Sqrt[totalTime]],numIterations];
(* the moment by theory:*)
rmomentTheory = Exp[(n*mu+(n*sigma)^2/2)*totalTime];
(*Calculate directly: *)
rmomentSample = Mean[data^n];
(*Calculate through estimated mu and sigma *)
muNumerical = Mean[Log[data]]; (*numerical \[Mu] (gaussian mean) *)
sigmaSqrNumerical = Mean[Log[data]^2]-(muNumerical)^2; (* numerical gaussian variance *)
rmomentFromMuSigma = Exp[n*muNumerical + (n^2*sigmaSqrNumerical)/2];
(*output*)
Log@{rmomentTheory, rmomentSample,rmomentFromMuSigma}
Output:
(*Log of {analytic, sample mean of r^2, using mu and sigma} *)
{140., 91.8953, 137.519}
Above, the second result is the sample mean of $r^2$ (method 1), which lies well below the other two results.
Best Answer
There is something puzzling in those results, but it is due to the problem itself and not to the numerical computations: I repeated the experiment in R and got the following picture, with the same colour code and the same sequence of $\mu_T$'s and $\sigma_T$'s, which represents each estimator divided by the true expectation:
Here is the corresponding R code:
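(A minimal sketch of such an experiment, not the original listing: the grid of $T$ values, the seed, and the plotting details are illustrative assumptions, while the sample size and the base $\mu$, $\sigma$ follow the Mathematica example.)

# Compare the two estimators of E[X^2] across a range of T,
# reporting each estimate divided by the true moment.
set.seed(1)
n     <- 2        # moment order
N     <- 1e4      # samples per experiment
mu    <- 0.1      # base parameters, as in the Mathematica example
sigma <- 0.5
Ts    <- seq(10, 200, by = 10)   # illustrative grid of T values

ratio1 <- ratio2 <- numeric(length(Ts))
for (i in seq_along(Ts)) {
  muT    <- mu * Ts[i]
  sigmaT <- sigma * sqrt(Ts[i])
  x      <- rlnorm(N, meanlog = muT, sdlog = sigmaT)
  truth  <- exp(n * muT + (n * sigmaT)^2 / 2)   # true E[X^n]
  m1     <- mean(x^n)                           # method 1: direct sample mean
  muHat     <- mean(log(x))                     # method 2: estimate mu and sigma^2
  sigma2Hat <- mean(log(x)^2) - muHat^2
  m2     <- exp(n * muHat + n^2 * sigma2Hat / 2)
  ratio1[i] <- m1 / truth
  ratio2[i] <- m2 / truth
}

plot(Ts, ratio1, col = "blue", log = "y", ylim = range(c(ratio1, ratio2)),
     xlab = "T", ylab = "estimate / true moment")
points(Ts, ratio2, col = "green")
abline(h = 1, col = "orange")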
Hence there is indeed a collapse of the second empirical moment as $\mu$ and $\sigma$ increase, which I would attribute to the enormous increase in the variance of that empirical moment.
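One way to quantify this: for $X\sim\mathcal{LN}(\mu,\sigma)$, the direct (method 1) estimator $\widehat m_n^{(1)}$ based on $N$ i.i.d. samples has relative variance
$$\frac{\operatorname{Var}\!\big(\widehat m_n^{(1)}\big)}{\mathbb{E}[X^n]^2}=\frac{1}{N}\,\frac{\mathbb{E}[X^{2n}]-\mathbb{E}[X^n]^2}{\mathbb{E}[X^n]^2}=\frac{e^{n^2\sigma^2}-1}{N},$$
so with the settings of the Mathematica example ($n=2$, $\sigma_T^2=0.5^2\cdot 200=50$, $N=10^4$) the relative standard deviation is of order $e^{100}/100\approx 10^{41}$: the sample mean is dominated by extremely rare, huge observations and almost surely underestimates the true moment. The method 2 estimator only requires $\hat\mu$ and $\hat\sigma^2$, whose sampling errors are of order $1/\sqrt{N}$ relative to $\mu_T$ and $\sigma_T^2$ themselves.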