[Math] Is it possible to calculate the mean and standard deviation from a median and quartiles?

probability-distributions, standard-deviation, statistical-inference, statistics

Any advice would be helpful.

I understand that reporting the median and quartiles for a small sample is an indication of skewed data. If that is correct, is it useless to try to work out the mean and standard deviation from the data below?

Sample N=104;
median number per subject (25–75 quartiles): 1.4 (0.0–2.0)

I thought of using the following formula:
$\sigma = \sqrt{n} \, \frac{\text{upper limit} - \text{lower limit}}{\text{number of standard errors between upper and lower limits}}$

  1. Can I assume a normal distribution given that the data is based on quartiles?

  2. Can I assume that the 25th and 75th percentiles (the lower and upper quartiles) are equivalent to the limits of a 50% confidence interval (CI)?

  3. Once I get the equivalent CIs, I could obtain the number of standard errors in a 50% CI based on a z-score for the normal distribution:
    $se = 0.674$ on one tail and $1.348$ across both tails

  4. So, replacing the values in the formula:
    $\sigma = \sqrt{104} \, \frac{0.0 - 2.0}{1.348} = -15.13$

Is my work correct?

  5. How could I now obtain the mean?

Best Answer

It's mathematically impossible to deduce mean or standard deviation from median/quartiles, because medians and quartiles discard most of the data on which the mean and standard deviation are based.

Example:

data   frequency  
   0       50      
 1.4        4     
   2       50    

That has a mean of 1.0 and a standard deviation of about 1.0. (I'm rounding to one decimal place so I don't have to go into population versus sample standard deviation.)

data     frequency    
   0       30        
 1.4       44        
   2       30        

That data also has the median and quartiles the same as in your example, but now the mean is 1.2 and the standard deviation is 0.8.
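The two tables can be checked numerically. The following is a minimal sketch using only Python's standard `statistics` module; the helper names `expand` and `summarize` are made up for this illustration, the standard deviation is the population version (`pstdev`), and the quartiles use the module's default "exclusive" method, so other quartile conventions could in principle give slightly different cut points on other data.

```python
from statistics import mean, median, pstdev, quantiles


def expand(table):
    """Turn a {value: frequency} table into a sorted list of observations."""
    return sorted(v for v, freq in table.items() for _ in range(freq))


def summarize(table):
    """Return median, quartiles, mean and population SD for a frequency table."""
    data = expand(table)
    q1, _, q3 = quantiles(data, n=4)  # default method is 'exclusive'
    return {
        "n": len(data),
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "mean": mean(data),
        "sd": pstdev(data),
    }


# The two frequency tables from the answer above.
first = {0: 50, 1.4: 4, 2: 50}
second = {0: 30, 1.4: 44, 2: 30}

for name, table in (("first", first), ("second", second)):
    s = summarize(table)
    print(
        f"{name}: n={s['n']} median={s['median']} "
        f"quartiles=({s['q1']}, {s['q3']}) "
        f"mean={s['mean']:.2f} sd={s['sd']:.2f}"
    )
```

Both tables report the same median and quartiles as the question, while the mean and standard deviation differ between them.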

data     frequency        
   0       50        
 1.4        4        
   2       49        
10000000    1        

Now I've changed only the maximum, without changing the median or quartiles. You can see even more clearly how the median and quartiles ignore extreme data, because the mean is now about 96,000 and the standard deviation about 980,000 (to 2 significant figures).
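The effect of a single extreme value can be sketched the same way. The dataset here is an assumption chosen for illustration: it starts from the first table (50 zeros, four 1.4s, fifty 2s) and replaces one of the 2s with 10,000,000. The helper name `summary` is hypothetical, and the standard deviation is again the population version.

```python
from statistics import mean, median, pstdev, quantiles


def summary(data):
    """Return (median, Q1, Q3, mean, population SD) for a list of numbers."""
    data = sorted(data)
    q1, _, q3 = quantiles(data, n=4)  # default method is 'exclusive'
    return median(data), q1, q3, mean(data), pstdev(data)


# First table: 50 zeros, four 1.4s, fifty 2s (N = 104).
base = [0] * 50 + [1.4] * 4 + [2] * 50

# Same data with a single 2 replaced by one huge value (N is still 104).
with_outlier = base[:-1] + [10_000_000]

for label, data in (("base", base), ("with outlier", with_outlier)):
    med, q1, q3, mu, sd = summary(data)
    print(f"{label}: median={med} quartiles=({q1}, {q3}) mean={mu:.4g} sd={sd:.4g}")
```

The median and quartiles are identical for the two lists, because the 26th, 52nd, 53rd and 79th ordered values are untouched by the change, while the mean and standard deviation grow by several orders of magnitude.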
