Estimating parameters of a normal distribution: median instead of mean

Tags: estimation, normal distribution, outliers, robust, unbiased-estimator

The common approach for estimating the parameters of a normal distribution is to use the mean and the sample standard deviation / variance.

However, if there are some outliers, the median and the median deviation from the median should be much more robust, right?

On some data sets I tried, the normal distribution estimated by $\mathcal{N}(\text{median}(x), \text{median}|x - \text{median}(x)|)$ seems to produce a much better fit than the classic $\mathcal{N}(\hat\mu, \hat\sigma)$ using the mean and RMS deviation.

Is there any reason not to use the median if you assume there are some outliers in the data set? Do you know of a reference for this approach? A quick Google search did not turn up useful results discussing the benefits of using medians here (but obviously, "normal distribution parameter estimation median" is not a very specific set of search terms).

Is the median deviation biased? Should I multiply it by $\frac{n-1}{n}$ to reduce the bias?

Do you know of similar robust parameter estimation approaches for other distributions, such as the Gamma distribution or the exponentially modified Gaussian distribution (whose parameter estimation relies on the skewness, a value that outliers really mess up)?

Best Answer

The observation that, for data drawn from a contaminated Gaussian distribution, you get better estimates of the parameters describing the bulk of the data by using the $\text{mad}$ instead of the unscaled $\text{med}|x-\text{med}(x)|$ was originally made by Gauss (Walker, H. (1931)). Here $\text{mad}(x)$ is:

$$\text{mad}=1.4826\times\text{med}|x-\text{med}(x)|$$

where $(\Phi^{-1}(0.75))^{-1}=1.4826$ is a consistency factor designed to ensure that, asymptotically,

$$\text{E}(\text{mad}(x)^2)=\text{Var}(x)$$

when $x$ is uncontaminated.
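To make the role of the consistency factor concrete, here is a minimal sketch using only the Python standard library. The helper names (`raw_median_deviation`, `mad`, `contaminated_sample`) and the contamination setup (5% of points placed far from the bulk) are assumptions made for illustration and are not part of the question. It shows that the unscaled median deviation targets roughly $0.6745\,\sigma$ rather than $\sigma$, which is why the correction is the factor $1.4826$ and not $\frac{n-1}{n}$.

```python
import random
from statistics import NormalDist, mean, median, stdev

# Consistency factor 1 / Phi^{-1}(0.75), approximately 1.4826.
MAD_SCALE = 1.0 / NormalDist().inv_cdf(0.75)


def raw_median_deviation(xs):
    """med|x - med(x)|, without any consistency factor."""
    m = median(xs)
    return median([abs(x - m) for x in xs])


def mad(xs):
    """Median deviation rescaled to estimate sigma for Gaussian data."""
    return MAD_SCALE * raw_median_deviation(xs)


def contaminated_sample(n=2000, frac=0.05, seed=0):
    """Hypothetical test data: N(10, 2) bulk plus a cluster of far outliers."""
    rng = random.Random(seed)
    bulk = [rng.gauss(10.0, 2.0) for _ in range(n)]
    outliers = [rng.gauss(60.0, 1.0) for _ in range(int(frac * n))]
    return bulk + outliers


xs = contaminated_sample()
print("mean, stdev        :", mean(xs), stdev(xs))
print("median, mad        :", median(xs), mad(xs))
print("unscaled deviation :", raw_median_deviation(xs))
```

Under these assumptions the median and the scaled mad should stay close to the bulk parameters (10 and 2), while the mean and standard deviation are pulled toward the outliers.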

I cannot think of any reason not to use the $\text{med}$ instead of the sample mean in this case. The lower efficiency (at the Gaussian!) of the $\text{mad}$ can be a reason not to use the $\text{mad}$ in your example. However, there exist equally robust and highly efficient alternatives to the $\text{mad}$. One of them is the $Q_n$. This estimator has many other advantages besides. It is also very insensitive to outliers (in fact nearly as insensitive as the mad). Unlike the mad, it is not built around an estimate of location and does not assume that the distribution of the uncontaminated part of the data is symmetric. Like the mad, it is based on order statistics, so it is always well defined even when the underlying distribution of your sample has no moments. Like the mad, it has a simple explicit form. Even more than for the mad, I see no reason to use the sample standard deviation instead of the $Q_n$ in the example you describe (see Rousseeuw and Croux 1993 for more information about the $Q_n$).
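As an illustration of the "simple explicit form", here is a naive sketch of the $Q_n$ as defined in Rousseeuw and Croux (1993): the $k$-th smallest of the pairwise distances $|x_i-x_j|$, $i<j$, with $k=\binom{h}{2}$ and $h=\lfloor n/2\rfloor+1$, times a Gaussian consistency factor. The function name `qn` and the test data are assumptions for illustration. The constant is computed here as $1/(\sqrt{2}\,\Phi^{-1}(5/8))$, the finite-sample correction factors are omitted, and this version takes $O(n^2)$ time and memory, so it is only meant for small samples (faster algorithms exist).

```python
import random
from math import comb, sqrt
from statistics import NormalDist

# Gaussian consistency factor 1 / (sqrt(2) * Phi^{-1}(5/8)), roughly 2.219.
QN_SCALE = 1.0 / (sqrt(2.0) * NormalDist().inv_cdf(5.0 / 8.0))


def qn(xs):
    """Naive O(n^2) Q_n scale estimate, without finite-sample correction."""
    n = len(xs)
    if n < 2:
        raise ValueError("qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)  # 1-based rank of the order statistic to pick
    diffs = sorted(
        abs(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n)
    )
    return QN_SCALE * diffs[k - 1]


# Hypothetical data: N(0, 3) bulk with about 5% gross outliers near 100.
rng = random.Random(42)
sample = [rng.gauss(0.0, 3.0) for _ in range(500)]
sample += [rng.gauss(100.0, 1.0) for _ in range(25)]
print("Q_n estimate of sigma:", qn(sample))
```

Note that no location estimate appears anywhere in `qn`: only pairwise differences are used, which is the property the answer points to.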

As for your last question: in the specific case where $x\sim\Gamma(\nu,\lambda)$ (shape $\nu$, scale $\lambda$),

$$\text{med}(x)\approx\lambda(\nu-1/3)$$

and

$$\text{mad}(x)\approx\lambda\sqrt{\nu}$$

(in both cases the approximations become good when $\nu>1.5$) so that

$$\hat{\nu}=\left(\frac{\text{med}(x)}{\text{mad}(x)}\right)^2$$

and

$$\hat{\lambda}=\frac{\text{mad}(x)^2}{\text{med}(x)}$$

See Chen and Rubin (1986) for a complete derivation.
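The two Gamma formulas above can be tried out with a short sketch like the following. It assumes $\nu$ is the shape and $\lambda$ the scale (which matches Python's `random.gammavariate(alpha, beta)`), and the function name `gamma_from_med_mad`, the true parameter values and the 1% contamination are all made up for illustration. Since the formulas rest on approximations, the estimates should only be expected to land near the true values, not on them.

```python
import random
from statistics import NormalDist, median

# Same consistency factor as for the Gaussian mad, approximately 1.4826.
MAD_SCALE = 1.0 / NormalDist().inv_cdf(0.75)


def mad(xs):
    """Scaled median absolute deviation from the median."""
    m = median(xs)
    return MAD_SCALE * median([abs(x - m) for x in xs])


def gamma_from_med_mad(xs):
    """Approximate (shape, scale) estimates from the median and the mad."""
    med = median(xs)
    s = mad(xs)
    nu_hat = (med / s) ** 2
    lam_hat = s ** 2 / med
    return nu_hat, lam_hat


# Hypothetical data: Gamma(shape=20, scale=2) plus 1% gross outliers.
rng = random.Random(1)
true_nu, true_lam = 20.0, 2.0
xs = [rng.gammavariate(true_nu, true_lam) for _ in range(20000)]
xs += [500.0] * 200

nu_hat, lam_hat = gamma_from_med_mad(xs)
print("nu_hat, lambda_hat:", nu_hat, lam_hat)
```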

  • Chen, J. and Rubin, H. (1986). Bounds for the difference between median and mean of Gamma and Poisson distributions. Statist. Probab. Lett., 4, 281–283.
  • Rousseeuw, P. J. and Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, Vol. 88, No. 424, pp. 1273–1283.
  • Walker, H. (1931). Studies in the History of the Statistical Method. Baltimore, MD: Williams & Wilkins Co. pp. 24–25.