Calculate median without access to raw data

descriptive statistics, mean

I'm working in a piece of software designed for satellite image classification based on various features of objects in the image. The software provides various built-in features, such as the mean, maximum, and minimum of the values in the object. However, I want to use the median of the values.

I don't have access to the raw values in the object. All I have is the information below:

  • Mean
  • Max
  • Min
  • Standard Deviation

I can also do arithmetic on those values using standard operators (+, -, /, *, ^, etc.).

Is there a way to calculate the median (or something closely approximating it) from just this information?

Best Answer

The question can be construed as requesting a nonparametric estimator of the median of a sample in the form f(min, mean, max, sd). In this circumstance, by contemplating extreme (two-point) distributions, we can trivially establish that

$$ 2\ \text{mean} - \text{max} \le \text{median} \le 2\ \text{mean} - \text{min}.$$
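
As a quick sanity check, these bounds can be computed directly from the reported statistics. The sketch below is illustrative only: the function name `median_bounds` is hypothetical, and it additionally clips the interval to [min, max], which always contains the median.

```python
def median_bounds(mean, minimum, maximum):
    """Return (lower, upper) bounds on the median implied by min, mean, max."""
    # At least half the values are <= median and none exceeds max,
    # so mean <= (median + max) / 2, i.e. median >= 2*mean - max.
    lower = max(minimum, 2 * mean - maximum)
    # Mirror-image argument using the minimum.
    upper = min(maximum, 2 * mean - minimum)
    return lower, upper


# Example: an object in an 8-bit image with mean 100, min 20, max 250.
print(median_bounds(100, 20, 250))
```

When the object's values span most of the available range, as in this example, the interval is wide, which is why further constraints or assumptions are needed.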

There might be an improvement available by considering the constraint imposed by the known SD. To make any more progress, additional assumptions are needed. Typically, some measure of skewness is essential. (In fact, skewness can be estimated from the deviation between the mean and the median relative to the SD, so one should be able to reverse the process.)
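
One standard constraint of this kind, given here as a general result rather than something stated in the answer, is that the median never differs from the mean by more than one standard deviation. The skewness relationship mentioned above is usually written as Pearson's second skewness coefficient, skew = 3 (mean - median) / SD, which can be solved for the median if a skewness estimate happens to be available. A minimal sketch under those assumptions (both function names are hypothetical):

```python
def median_bounds_with_sd(mean, minimum, maximum, sd):
    """Bounds on the median from min, mean, max and the standard deviation."""
    # |mean - median| <= sd holds for the population SD (divide by n);
    # a sample SD (divide by n - 1) is larger, so the bound is still valid.
    lower = max(minimum, 2 * mean - maximum, mean - sd)
    upper = min(maximum, 2 * mean - minimum, mean + sd)
    return lower, upper


def median_from_pearson_skew(mean, sd, skew):
    """Invert Pearson's second skewness coefficient for the median.

    Assumes skew was estimated as 3 * (mean - median) / sd, which is an
    assumption of this sketch; it is not one of the four given statistics.
    """
    return mean - skew * sd / 3


print(median_bounds_with_sd(100, 20, 250, 30))
print(median_from_pearson_skew(100, 30, 0.5))
```

Without a skewness estimate, only the interval is available; any single number chosen from it (for example, the mean itself) is a guess rather than an estimate with known accuracy.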

One could, in a pinch, use these four statistics to obtain a maximum-entropy solution and use its median as the estimator. The observed min and max probably make poor constraints for this purpose, but a satellite image has fixed upper and lower bounds (e.g., 0 and 255 for an eight-bit image); these would constrain the maximum-entropy solution nicely.
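
To make the maximum-entropy idea concrete, here is a rough sketch rather than a definitive implementation. It assumes integer pixel values on a fixed range (0 to 255 by default) and uses the standard result that, with the mean and variance constrained on a fixed support, the maximum-entropy probabilities are proportional to exp(a x + b x^2). The names `maxent_fit` and `maxent_median`, the damped Newton solver, and the tolerances are all choices made for this illustration. The inputs must be feasible (lo < mean < hi, with SD not too close to zero and SD^2 < (mean - lo)(hi - mean)).

```python
import math


def maxent_fit(mean, sd, lo=0, hi=255, max_iter=200, tol=1e-10):
    """Max-entropy probabilities on the integers lo..hi matching mean and sd."""
    xs = list(range(lo, hi + 1))
    span = hi - lo
    ts = [(x - lo) / span for x in xs]   # rescale to [0, 1] for stability
    m1 = (mean - lo) / span              # target E[t]
    m2 = m1 * m1 + (sd / span) ** 2      # target E[t^2]

    def evaluate(a, b):
        # Probabilities p(t) proportional to exp(a*t + b*t^2), plus the
        # convex dual objective that the Newton iteration minimises.
        expo = [a * t + b * t * t for t in ts]
        top = max(expo)                  # subtract the max to avoid overflow
        w = [math.exp(e - top) for e in expo]
        z = sum(w)
        return [v / z for v in w], top + math.log(z) - a * m1 - b * m2

    a = b = 0.0
    p, dual = evaluate(a, b)
    for _ in range(max_iter):
        e1, e2, e3, e4 = (sum(pi * t ** k for pi, t in zip(p, ts))
                          for k in (1, 2, 3, 4))
        g1, g2 = e1 - m1, e2 - m2        # gradient: fitted minus target moments
        if abs(g1) + abs(g2) < tol:
            break
        # Hessian of the dual = covariance matrix of (t, t^2) under p.
        h11, h12, h22 = e2 - e1 * e1, e3 - e1 * e2, e4 - e2 * e2
        det = h11 * h22 - h12 * h12
        da = -(h22 * g1 - h12 * g2) / det
        db = -(h11 * g2 - h12 * g1) / det
        # Halve the Newton step until the dual does not increase
        # (the tiny slack absorbs floating-point rounding).
        step = 1.0
        while True:
            p_new, dual_new = evaluate(a + step * da, b + step * db)
            if dual_new <= dual + 1e-12 or step < 1e-12:
                break
            step /= 2
        a, b, p, dual = a + step * da, b + step * db, p_new, dual_new
    return xs, p


def maxent_median(mean, sd, lo=0, hi=255):
    """Smallest value whose cumulative max-entropy probability reaches 0.5."""
    xs, p = maxent_fit(mean, sd, lo, hi)
    cum = 0.0
    for x, pi in zip(xs, p):
        cum += pi
        if cum >= 0.5:
            return x
    return xs[-1]


# Example: an 8-bit object with mean 60 and SD 40.
print(maxent_median(60, 40))
```

The result is only the median of the least-informative distribution consistent with the mean, the SD, and the fixed bounds; how close it comes to the true median of a given object depends on how well that distribution resembles the actual pixel values.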

It's worth remarking that general-purpose image processing software is capable of producing far more information than this, so it could be worthwhile looking at other software solutions. Alternatively, often one can trick the software into supplying additional information. For example, if you could divide each apparent "object" into two pieces you would have statistics for the two halves. That would provide useful information for estimating a median.