Apparently the concept of unbiasedness was discussed a long time ago. I feel it is a topic worthy of discussion, as mean-unbiasedness is a standard requirement for a good estimator, but in small samples it does not mean as much as it does in large-sample estimation.

I post these two references as an answer to my second question in the post.

Brown, George W. "On Small-Sample Estimation." The Annals of Mathematical Statistics, vol. 18, no. 4 (Dec., 1947), pp. 582–585. JSTOR 2236236.

Lehmann, E. L. "A General Concept of Unbiasedness." The Annals of Mathematical Statistics, vol. 22, no. 4 (Dec., 1951), pp. 587–592. JSTOR 2236928.

I am somewhat pessimistic about such a non-parametric method, at least without introducing some sort of constraint on the underlying distribution.

My reasoning for this is that there will always be **a** distribution that breaks the true coverage probability for any finite $n$ (although as $n \rightarrow \infty$, this distribution will become more and more pathological), *or* the confidence interval will have to be arbitrarily large.

To illustrate, imagine a distribution that looks like a normal up to some value $\alpha$, but beyond $\alpha$ becomes extremely right skewed. That tail can have unbounded influence on the distribution's mean, and as you push $\alpha$ farther out, it has arbitrarily small probability of appearing in your sample. So for any $n$, you can pick an $\alpha$ so large that every point in your sample has extremely high probability of looking as if it came from a normal distribution with mean = 0, sd = 1, while the true mean can be *anything*.
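A minimal sketch of this pathology, using a contamination mixture of my own choosing (a point mass far in the right tail rather than a smooth skewed tail, purely for illustration):

```python
import random

random.seed(0)  # make the illustration reproducible

def contaminated(eps=1e-6, big=1e6):
    """With tiny probability eps return a huge value; otherwise draw N(0, 1)."""
    return big if random.random() < eps else random.gauss(0, 1)

# The true mean of this mixture is eps * big = 1, not 0 ...
true_mean = 1e-6 * 1e6

# ... yet a sample of n = 100 contains the huge value with probability only
# about 1 - (1 - 1e-6)**100, roughly 1e-4, so it almost surely looks N(0, 1).
sample = [contaminated() for _ in range(100)]
print(true_mean, max(sample))
```

Shrinking `eps` while growing `big` keeps the true mean wherever you like while making the contamination invisible at any fixed $n$, which is exactly the problem for finite-sample coverage.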

So if you're looking for proper *asymptotic* coverage, this can of course be achieved via the CLT. However, your question implies that you are (quite reasonably) interested in finite-sample coverage. As my example shows, there will always be a pathological case that ruins any finite-length CI.

Now, you could still have a non-parametric CI that achieves good finite-sample coverage by adding constraints on your distribution. For example, log-concavity is a non-parametric constraint. However, it seems inadequate for your problem, as the log-normal is not log-concave.

Perhaps to help illustrate how difficult your problem could be: I've done unpublished work on a different constraint, inverse convexity (if you click on my profile, there is a link to a personal page with a preprint). This constraint includes *most*, but not all, log-normals. You can also see that under this constraint the tails can be "arbitrarily heavy": for any inverse-convex distribution up to some $\alpha$, the tails can be heavy enough that the mean is as large as you like.

## Best Answer

This is a misleading oversimplification common to many introductory texts. It depends on what you want to find out (that is, on what "typical" means for your particular needs). It can be perfectly reasonable to be interested in the behaviour of the mean with a skewed distribution, and where the mean isn't quite what you need, it may easily be that the median is less useful still!

As an example, consider how long I wait at the lights at a particular intersection while travelling to work. This waiting time is quite right skew. I am interested in the average wait (for example, when working out my average travel time, or how much time I waste at that intersection in a year). On the other hand, if I am worried about being late to work, neither the mean nor the median is of much direct use; I'm much more interested in the chance that the wait might exceed, say, 10 minutes (in heavy traffic it does sometimes happen).

If you're interested in an interval for the population median for a continuous variate, one can be produced without even assuming a distributional form (a nonparametric/distribution-free estimate), based on sample quantiles (or more precisely, on order statistics).

This calculation is discussed in various places, including other questions on this site; see Wikipedia, for example. These small-sample intervals are "exact" in the sense that their coverage is known exactly (it comes from the binomial distribution), but the achievable values of the coverage ('confidence level') are discrete: you can't form an exact 95% interval, but at n=30, for example, you can choose symmetric intervals with coverages 99.5%, 98.4%, 95.7% and 90.1%.

This sort of interval is related to sign tests and proportions tests.

As an explicit example, let's say we have n=15 values randomly drawn from a continuously distributed population; one of the available levels is 96.5%, corresponding to the interval from the 4th to the 12th sorted observations. That is to say, under repeated sampling, about 96.5% of intervals from the 4th largest to the 12th largest observation would include the population median.
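These coverage figures can be checked directly from the binomial. A minimal standard-library sketch (the function name is my own):

```python
from math import comb

def median_ci_coverage(n, j, k):
    """Coverage of the interval (X_(j), X_(k)), the j-th and k-th order
    statistics of an i.i.d. sample of size n from a continuous distribution,
    for the population median: P(j <= B <= k - 1) with B ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(j, k)) / 2 ** n

# n = 15, interval from the 4th to the 12th sorted observation:
print(round(median_ci_coverage(15, 4, 12), 4))   # about 0.9648, i.e. ~96.5%

# the symmetric levels quoted for n = 30 (intervals (X_(j), X_(31-j))):
for j in (8, 9, 10, 11):
    print(j, round(100 * median_ci_coverage(30, j, 31 - j), 1))
```

The event "the interval covers the median" is the event that between $j$ and $k-1$ observations fall below the median, and each observation independently falls below the median with probability 1/2, hence the binomial.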

If you're interested in a standard error for the sample median, there's an asymptotic approximation that relies on knowing the height of the density at the median (in large samples from a continuous distribution the sample median is asymptotically normal, so this can potentially be used to give large-sample intervals):

$\text{s.e.}(\tilde{x})\approx\frac{1}{2\sqrt{n}\,f(\tilde{\mu})}$

If your sample is also large enough to form a reliable estimate of scale (so that, if you know $f$ up to location and scale, you can in turn estimate the density at the median), this may be useful.
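As a quick sanity check of that approximation, here is a sketch for the standard normal, where the density at the median is $f(0) = 1/\sqrt{2\pi}$ (the sample size and replication count are arbitrary choices of mine):

```python
import math
import random
import statistics

random.seed(2)
n, reps = 400, 2000

# asymptotic s.e. of the median for N(0, 1): 1 / (2 * sqrt(n) * f(0))
f_at_median = 1 / math.sqrt(2 * math.pi)
se_theory = 1 / (2 * math.sqrt(n) * f_at_median)

# empirical s.e.: the standard deviation of the median over many samples
medians = [statistics.median(random.gauss(0, 1) for _ in range(n))
           for _ in range(reps)]
se_sim = statistics.stdev(medians)

print(se_theory, se_sim)  # both should be close to 0.063
```

Note this requires plugging in the density height at the median; with real data you'd have to estimate it, which is where the scale estimate mentioned above comes in.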

What's "best" for your purposes? What, if anything is known about the distribution?