Solved – get the parameters of a lognormal distribution from the sample mean & median

estimation, lognormal distribution, mean, median, parameterization

I have the mean and median values for a sample drawn from a lognormal distribution. Note that this is not the mean and median of the logs of the variable, though I can of course calculate the logs of the mean and median. Is there a closed form solution for μ and σ from this information? If there is only a numeric solution, could you tell me how to find it, ideally with R?

I note that this question has been answered for deriving μ and σ from the sample mean and the sample variance, here:
How do I estimate the parameters of a log-normal distribution from the sample mean and sample variance
However, I do not have the sample variance, only the mean and median.

If there is no closed-form or straightforward numeric solution, I'd like to know if using the logs of the sample mean and median, or some transform of them, will provide a reasonable answer for a large sample (in the hundreds of millions).

Best Answer

It rather depends on what you mean by "get". In general you can't obtain population quantities from sample information. However, you can often obtain estimates, though in this case the estimates may not be very good.

If you have them, you can readily calculate the parameters from the population mean and median; if $\tilde{m}=\exp(\mu)$ is the population median and $m=\exp(\mu+\frac12\sigma^2)$ is the population mean then $\mu=\log(\tilde{m})$ and $\sigma^2=2\log(\frac{m}{\tilde{m}})=2(\log(m)-\log(\tilde{m}))$.
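The population-level conversion is just two lines of code. Here is a minimal sketch in Python (the function name is my own; the same arithmetic is a one-liner in R with `log` and `sqrt`):

```python
import math

def lognormal_params_from_mean_median(mean, median):
    """Recover (mu, sigma) of a lognormal from its population mean and median.

    Uses median = exp(mu) and mean = exp(mu + sigma^2 / 2), so
    mu = log(median) and sigma^2 = 2 * log(mean / median).
    A lognormal population always has mean > median, so we check that.
    """
    if mean <= median:
        raise ValueError("a lognormal population has mean > median")
    mu = math.log(median)
    sigma = math.sqrt(2.0 * math.log(mean / median))
    return mu, sigma

# Example: mu = 1, sigma = 0.5 gives median e^1 and mean e^{1.125}
mu, sigma = lognormal_params_from_mean_median(math.exp(1.125), math.exp(1.0))
```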

You could similarly attempt to use the sample mean and sample median in some kind of estimator of the population quantities.

If the only things you have are the sample mean and median from a lognormal ($\bar{x}$ and $\tilde{x}$ respectively) then you could at least use the obvious strategy of replacing population quantities by sample ones*, combining method of moments and method of quantiles ... $\hat{\mu}=\log(\tilde{x})$ and $\hat{\sigma}^2=2\log(\frac{\bar{x}}{\tilde{x}})=2(\log(\bar{x})-\log(\tilde{x}))$.
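As a quick sanity check of these plug-in estimators, one can simulate a large lognormal sample with known parameters and compare the estimates to the truth. A sketch in Python (R users would do the same with `rlnorm`, `mean` and `median`); the true parameters, sample size and seed here are arbitrary choices of mine:

```python
import math
import random
import statistics

# Simulate a large lognormal sample with known parameters, then apply the
# plug-in estimators mu_hat = log(median), sigma2_hat = 2*log(mean/median).
random.seed(1)
mu_true, sigma_true = 0.3, 0.8
sample = [random.lognormvariate(mu_true, sigma_true) for _ in range(200_000)]

xbar = statistics.fmean(sample)   # sample mean
xmed = statistics.median(sample)  # sample median

mu_hat = math.log(xmed)                   # should be close to 0.3
sigma2_hat = 2.0 * math.log(xbar / xmed)  # should be close to 0.8^2 = 0.64
```

At this sample size both estimates land within a few hundredths of the true values, consistent with the estimators being consistent but not exact.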

I believe these estimators will be consistent. In small samples, however, they are certain to be biased and may not be very efficient; still, without considerable further analysis you may not have much choice.

Of course, in reality, you don't really know your data are drawn from a lognormal distribution - that's pretty much a guess. However, in practice it might be a quite serviceable assumption.

Ideally one would work out the joint distribution of the sample mean and median from a lognormal, and then maximize the likelihood over the parameters of that bivariate distribution; that should do about as well as possible. But that's more a decent research problem (well worth a paper, if it hasn't been done before) than something that fits in a few paragraphs of an answer.

One could conduct some simulation investigations into the properties of the joint distribution of sample mean and median. For example, consider that the distribution of the ratio of mean to median should be scale-free -- a function of $\sigma$ only. Even if we can't compute it algebraically, we can look at how the ratio (for example) behaves as $\sigma$ changes. One might then be able to choose the $\sigma$ that approximately maximizes the chance of getting the ratio you observed ($\mu$ could be estimated in a variety of ways, but the obvious one - the log of the median, as mentioned earlier - would not be terrible).
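One crude version of that simulation idea, sketched in Python with hypothetical helper names of my own: for each $\sigma$ on a grid, simulate the typical value of $\log(\bar{x}/\tilde{x})$ at that $\sigma$ (taking $\mu=0$, which loses nothing since the ratio is scale-free) and pick the grid value whose simulated log-ratio is closest to the observed one:

```python
import math
import random
import statistics

def simulated_log_ratio(sigma, n, reps, rng):
    """Average of log(sample mean / sample median) over `reps` simulated
    lognormal(0, sigma) samples of size n.  mu = 0 WLOG: the ratio is
    scale-free, a function of sigma only."""
    vals = []
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        vals.append(math.log(statistics.fmean(x) / statistics.median(x)))
    return statistics.fmean(vals)

def sigma_from_ratio(observed_log_ratio, n, sigmas, reps=200, seed=0):
    """Pick the grid sigma whose typical simulated log-ratio is closest to
    the observed one -- a crude simulation-based matching estimator."""
    rng = random.Random(seed)
    return min(sigmas,
               key=lambda s: abs(simulated_log_ratio(s, n, reps, rng)
                                 - observed_log_ratio))
```

For large $n$ the simulated log-ratio concentrates near $\sigma^2/2$, so this essentially reproduces the closed-form estimator above; its interest is for smaller samples, where one could match quantiles of the simulated ratio rather than its mean.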


* Warning: it's perfectly possible for the sample median to exceed the sample mean. In that case the simple estimator suggested above is no help, since it relies on the mean being above the median: it would return a negative estimate for $\hat{\sigma}^2$, which must be positive.
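A defensive version of the plug-in estimator might handle that case explicitly. Clamping $\hat{\sigma}^2$ at zero, as below, is purely my own assumption (the answer doesn't prescribe a repair); it amounts to treating the data as nearly symmetric on the log scale:

```python
import math

def lognormal_plugin_estimates(sample_mean, sample_median):
    """Plug-in estimates mu_hat = log(median), sigma2_hat = 2*log(mean/median).

    If the sample median exceeds the sample mean, the raw sigma^2 estimate
    comes out negative; one crude repair (an assumption made here) is to
    clamp it at zero rather than return an impossible negative variance.
    """
    mu_hat = math.log(sample_median)
    sigma2_hat = 2.0 * math.log(sample_mean / sample_median)
    return mu_hat, max(sigma2_hat, 0.0)
```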