Solved – When to use sample median as an estimator for the median of a lognormal distribution

lognormal-distribution, median, unbiased-estimator

I myself would always use the geometric mean to estimate a lognormal median. However, in industry practice, using the sample median sometimes gives better results. The question thus is: is there a cutoff sample size beyond which the sample median can be used reliably as an estimator of the population median?

Also, the sample geometric mean is the MLE for the median, but it is not unbiased. An unbiased estimator would be $\hat{\beta}_{\mbox{CGM0}}=\exp(\hat{\mu}-\sigma^2/(2N))$ if $\sigma$ were known. In practice, a bias-corrected estimator $\hat{\beta}_{\mbox{CGM}}$ (see below) is used, since $\sigma$ is always unknown. There are papers arguing that this bias-corrected geometric-mean estimator is better because of its smaller MSE and unbiasedness. However, in reality, when we only have a sample size of 4 to 6, can I argue that the bias correction makes no sense, since

  1. Unbiasedness means the estimator is centered around the true population parameter, neither under- nor over-estimating it. For a positively skewed distribution, the center is the median, not the mean.
  2. Invariance to transformation is an important property in my current area (transformation between DT50 and the degradation rate $k$, $k=\log(2)/\mathrm{DT50}$). With a mean-unbiased estimator you get different results depending on whether you work on the original or the transformed data.
  3. For a limited sample size, mean-unbiasedness is potentially misleading. Bias is not error; an unbiased estimator can still give larger errors. From a Bayesian point of view, the data are known and fixed, and the MLE maximizes the probability of observing those data, while the bias correction is derived by averaging over samples at fixed parameters.

The sample geometric mean estimator is the MLE, is median-unbiased, and is invariant to transformations. I think it should be preferred to the bias-corrected geometric-mean estimator. Am I right?
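The invariance point (item 2 above) can be checked numerically. The sketch below uses hypothetical DT50 values drawn from an arbitrary lognormal; `gm` and `cgm` implement the geometric-mean and bias-corrected estimators as defined in the formulas below:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
dt50 = rng.lognormal(mean=np.log(30.0), sigma=0.5, size=N)  # hypothetical DT50 sample
k = np.log(2) / dt50                                        # degradation rate

def gm(x):
    """Sample geometric mean: exp(mean(log x))."""
    return np.exp(np.mean(np.log(x)))

def cgm(x):
    """Bias-corrected geometric mean: exp(mu_hat - sigma_hat^2 / (2N))."""
    lx = np.log(x)
    return np.exp(lx.mean() - lx.var() / (2 * len(lx)))  # MLE variance (ddof=0)

# The geometric mean commutes with the transformation k = log(2)/DT50 ...
print(gm(k), np.log(2) / gm(dt50))    # identical
# ... but the bias-corrected estimator does not:
print(cgm(k), np.log(2) / cgm(dt50))  # differ by a factor exp(sigma_hat^2 / N)
```

So estimating the median of $k$ directly and transforming the estimated median of DT50 give the same answer for the geometric mean but not for the bias-corrected version.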

Assuming $X_1,X_2,\ldots,X_N \sim \mbox{LN}(\mu,\sigma^2)$,

$\beta = \exp(\mu)$

$\hat{\beta}_{\mbox{GM}}= \exp(\hat{\mu})= \exp{(\sum\frac{\log(X_i)}{N})} \sim \mbox{LN}(\mu,\sigma^2/N)$

$\hat{\beta}_{\mbox{SM}}= \mbox{median}(X_1,X_2,\ldots,X_N) $

$\hat{\beta}_{\mbox{CGM}}= \exp(\hat{\mu}-\hat\sigma^2/(2N))$

where $\mu$ and $\sigma$ are the log-mean and log-sd, and $\hat\mu$ and $\hat\sigma$ are the MLEs of $\mu$ and $\sigma$.
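To see how the three estimators compare at the sample sizes in question, here is a small Monte Carlo sketch; the parameter values $\mu=0$, $\sigma=1$, $N=5$ and the replication count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, N, reps = 0.0, 1.0, 5, 200_000   # hypothetical parameter values
beta = np.exp(mu)                           # true median

x = rng.lognormal(mu, sigma, size=(reps, N))
lx = np.log(x)
mu_hat = lx.mean(axis=1)
sig2_hat = lx.var(axis=1)                   # MLE variance (ddof=0)

est = {
    "GM":  np.exp(mu_hat),                       # sample geometric mean
    "CGM": np.exp(mu_hat - sig2_hat / (2 * N)),  # bias-corrected geometric mean
    "SM":  np.median(x, axis=1),                 # sample median
}
for name, e in est.items():
    print(f"{name}: bias {e.mean() - beta:+.4f}, MSE {np.mean((e - beta) ** 2):.4f}")
```

On runs like this, GM overestimates the median by roughly $e^{\sigma^2/2N}-1\approx 10\%$ in expectation, CGM has much smaller mean bias, and the MSE ranking that the cited papers discuss can be read off directly; none of this settles the median-unbiasedness and invariance arguments above, which are about a different criterion.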

A related question: for the variance of the sample median there is the asymptotic approximation $\frac{1}{4Nf(m)^2}$, where $f$ is the density and $m$ the true median. How big a sample is needed before this formula can be used?
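One way to gauge "big enough" is to simulate: compare the empirical variance of the sample median against $1/(4Nf(m)^2)$ across sample sizes. Again the parameters $\mu=0$, $\sigma=1$ and the sample sizes are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, reps = 0.0, 1.0, 50_000            # hypothetical parameter values
m = np.exp(mu)                                # true median of LN(mu, sigma^2)
f_m = 1.0 / (m * sigma * np.sqrt(2 * np.pi))  # lognormal density at the median

results = {}
for N in (5, 25, 101):  # odd N, so the median is a single order statistic
    sm = np.median(rng.lognormal(mu, sigma, size=(reps, N)), axis=1)
    approx = 1.0 / (4 * N * f_m ** 2)
    results[N] = (sm.var(), approx)
    print(f"N={N:>3}: simulated var {sm.var():.4f}, asymptotic {approx:.4f}")
```

On runs like this the approximation is noticeably off at $N=5$ and much closer by $N\approx 100$; how fast it converges depends on the skewness, i.e. on $\sigma$, so this is only a rough guide rather than a universal cutoff.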

Best Answer

Apparently the concept of unbiasedness was already discussed a long time ago. I feel it is a topic worth discussing, since mean-unbiasedness is a standard requirement for a good estimator, yet in small samples it does not mean as much as it does in large-sample estimation.

I post these two references as an answer to the second question in my post.

Brown, George W. "On Small-Sample Estimation." The Annals of Mathematical Statistics, vol. 18, no. 4 (Dec., 1947), pp. 582–585. JSTOR 2236236.

Lehmann, E. L. "A General Concept of Unbiasedness." The Annals of Mathematical Statistics, vol. 22, no. 4 (Dec., 1951), pp. 587–592. JSTOR 2236928.
