Why are you looking at non-parametric tests? Are the assumptions of the t-test violated, namely ordinal or non-normal data, or unequal variances? Of course, if your sample is large enough you can justify the parametric t-test, with its greater power, despite the lack of normality in the sample. Likewise, if your concern is unequal variances, there are corrections to the parametric test that yield accurate p-values (the Welch correction).
Otherwise, comparing your results to the t-test is not a good way to go about this, because the t-test results are biased when the assumptions are not met. The Mann-Whitney U is an appropriate non-parametric alternative, if that's what you really need. You only lose power if you are using the non-parametric test when you could justifiably use the t-test (because the assumptions are met).
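As a sketch of the two options mentioned above, assuming SciPy (the sample data here is made up purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=40)   # group 1
b = rng.normal(0.5, 2.0, size=55)   # group 2, with unequal variance

# Welch correction: equal_var=False drops the equal-variance assumption
t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)

# Non-parametric alternative: Mann-Whitney U
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")
```

If the t-test's assumptions actually hold, the Welch test loses essentially nothing, while the Mann-Whitney U trades a little power for robustness.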
And, just for some more background, go here: Student's t Test for Independent Samples.
The observation that, in an example involving data drawn from a contaminated Gaussian distribution, you get better estimates of the parameters describing the bulk of the data by using the $\text{mad}$ instead of the sample standard deviation, where
$$\text{mad}(x)=1.4826\times\text{med}|x-\text{med}(x)|$$
and $1.4826=(\Phi^{-1}(0.75))^{-1}$ is a consistency factor designed to ensure that
$$\text{E}(\text{mad}(x)^2)=\text{Var}(x)$$
when $x$ is uncontaminated, was originally made by Gauss (Walker, 1931).
I cannot think of any reason not to use the $\text{med}$ instead of the sample mean in this case. The lower efficiency (at the Gaussian!) of the $\text{mad}$ can be a reason not to use the $\text{mad}$ in your example. However, there exist equally robust and highly efficient alternatives to the $\text{mad}$. One of them is the $Q_n$. This estimator has many other advantages besides: it is nearly as insensitive to outliers as the $\text{mad}$; unlike the $\text{mad}$, it is not built around an estimate of location and does not assume that the distribution of the uncontaminated part of the data is symmetric; like the $\text{mad}$, it is based on order statistics, so it is always well defined even when the underlying distribution of your sample has no moments; and, like the $\text{mad}$, it has a simple explicit form. Even more than for the $\text{mad}$, I see no reason to use the sample standard deviation instead of the $Q_n$ in the example you describe (see Rousseeuw and Croux, 1993, for more on the $Q_n$).
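Since the $Q_n$ is less widely available in standard libraries than the $\text{mad}$, here is a naive $O(n^2)$ sketch of its definition from Rousseeuw and Croux: the $k$-th smallest pairwise distance, with $k=\binom{h}{2}$, $h=\lfloor n/2\rfloor+1$, scaled by the asymptotic Gaussian consistency factor $\approx 2.2219$. (An efficient algorithm exists; this brute-force version is only for illustration.)

```python
import numpy as np

def qn_scale(x):
    """Naive O(n^2) Qn scale estimator (Rousseeuw & Croux, 1993)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2                 # k-th smallest pairwise distance
    i, j = np.triu_indices(n, k=1)       # all pairs i < j
    diffs = np.sort(np.abs(x[i] - x[j]))
    return 2.2219 * diffs[k - 1]         # asymptotic Gaussian consistency factor

# On clean Gaussian data, Qn should be close to the true sigma = 1
rng = np.random.default_rng(1)
z = rng.normal(0.0, 1.0, size=2000)
q = qn_scale(z)
```

Note that no centering step appears anywhere: the estimator works directly on pairwise differences, which is why it needs no location estimate and no symmetry assumption.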
As for your last question, in the specific case where $x\sim\Gamma(\nu,\lambda)$, we have
$$\text{med}(x)\approx\lambda(\nu-1/3)$$
and
$$\text{mad}(x)\approx\lambda\sqrt{\nu}$$
(in both cases the approximations become good when $\nu>1.5$) so that
$$\hat{\nu}=\left(\frac{\text{med}(x)}{\text{mad}(x)}\right)^2$$
and
$$\hat{\lambda}=\frac{\text{mad}(x)^2}{\text{med}(x)}$$
See Chen and Rubin (1986) for a complete derivation.
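These closed-form estimators are easy to check numerically. A minimal sketch, assuming NumPy (whose gamma sampler takes shape and scale arguments, matching $\nu$ and $\lambda$ here); since the formulas above are approximations, the recovered values will be close to, but not exactly, $(\nu,\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)
nu, lam = 5.0, 2.0                        # true shape nu and scale lam
x = rng.gamma(shape=nu, scale=lam, size=100_000)

med = np.median(x)
mad = 1.4826 * np.median(np.abs(x - med))  # mad with consistency factor

nu_hat = (med / mad) ** 2                  # shape estimate
lam_hat = mad ** 2 / med                   # scale estimate
```

Because both estimators are built from medians, contaminating a small fraction of `x` with gross outliers leaves `nu_hat` and `lam_hat` essentially unchanged.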
- J. Chen and H. Rubin (1986). Bounds for the difference between median and mean of Gamma and Poisson distributions. Statist. Probab. Lett., 4, 281–283.
- P. J. Rousseeuw and C. Croux (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, Vol. 88, No. 424, pp. 1273–1283.
- H. Walker (1931). Studies in the History of the Statistical Method. Baltimore, MD: Williams & Wilkins Co., pp. 24–25.
If it seems like most of the outliers are to the far right, you could decide on a threshold that includes most of the data points to the left and censor all values to the right of that threshold. This would be akin to trimming, but without introducing a bias. I don't know how you would run such an analysis in a classical statistics framework, but it is "pretty easy" using Bayesian statistics.
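As a sketch of that Bayesian route, under assumptions not in the original answer (a normal model with known spread, a flat prior on the mean, a made-up censoring threshold, and a hand-rolled Metropolis sampler): censored points are kept, but contribute only the probability of exceeding the threshold to the likelihood, rather than their (outlier-contaminated) values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mu, sigma, c = 5.0, 2.0, 7.0
raw = rng.normal(true_mu, sigma, size=500)
censored = raw > c                  # flag values beyond the chosen threshold
x = np.where(censored, c, raw)      # censor (not drop) them

def log_post(mu):
    # flat prior on mu; censored points contribute the survival function
    ll = stats.norm.logpdf(x[~censored], mu, sigma).sum()
    ll += stats.norm.logsf(c, mu, sigma) * censored.sum()
    return ll

# random-walk Metropolis on mu
mu, chain = x.mean(), []
for _ in range(5000):
    prop = mu + rng.normal(0.0, 0.2)
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop
    chain.append(mu)

post_mean = np.mean(chain[1000:])   # posterior mean after burn-in
```

Unlike trimming, the censored observations still count: the likelihood knows they exceeded `c`, which is why the estimate stays unbiased.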