Solved – Deriving confidence interval from standard error of the mean when the data are non-normal

confidence interval, nonparametric, standard error

I have a small sample (n = 8), and I have calculated the mean and standard error of the mean. I don't know the underlying distribution of these observations, and I cannot assume it to be normal.

I want to derive the 95% confidence interval of the mean, and I have seen that people use Student's t distribution together with the standard error to work out the confidence interval. But it seems that this method requires that the observations themselves come from a normal distribution.

How should I work out the 95% confidence interval in my case?

Best Answer

This is somewhat tricky. There are several approaches:

  1. Assume the distribution isn't 'too far' from the normal (in a particular sense), and that the t-interval will give close to the desired coverage. The t is at least reasonably robust to mild deviations from the assumptions, so if the population distribution isn't particularly skewed or especially heavy tailed, that should at least work reasonably well.

  2. Assume the distribution is symmetric* and construct an interval for the pseudomedian (Hodges-Lehmann estimate, the median of pairwise averages) via a Wilcoxon signed-rank-type procedure. If the t-interval would have been appropriate, on average you lose very little by doing this. This can be done in many packages.

    [With a symmetric distribution whose mean exists, the mean, the pseudomedian and the ordinary median (and many other location measures) coincide. An interval that contains one with a particular probability will also contain the others.]

    *(or at least 'sufficiently' close to it)

    Here's an example of this done in R:

    y <- rlogis(8,50,1)  
    wilcox.test(y,conf.int=TRUE)  
    
    Wilcoxon signed rank test
    
    data:  y
    V = 36, p-value = 0.007813  
    alternative hypothesis: true location is not equal to 0  
    95 percent confidence interval:  
     47.49677 52.22811  
    sample estimates:  
    (pseudo)median   
          49.55069   
    

    So the interval given there is (47.50, 52.23):

    [Figure: plot marking the sample mean, the sample pseudomedian and the endpoints of the confidence interval]

    The purple vertical line segment is the sample mean and the centre blue one is the sample pseudomedian. The outer blue segments mark the ends of the confidence interval. You see that in this example the interval includes the true population mean of 50.

  3. Assume symmetry and construct a CI from the values for the mean that would not be rejected by a permutation test (this can be done from a single permutation test distribution, and 8 observations is few enough to get the whole permutation distribution rather than sample it).

  4. use bootstrapping to construct a CI for the mean. The bootstrap is justified by an asymptotic argument (so it may not work very well for small samples), but you can make various distributional assumptions and check its coverage properties for plausible distributions via simulation. This paper (pdf is downloadable at that link) suggests that the bootstrap-t intervals often get better coverage properties than the usual t-intervals -- but may have poor coverage when samples are small and the distributions are skew.

  5. If you have some additional information that would help guide a choice of distribution, you can get somewhere with other distributional assumptions. For example, if you know that the distribution is skew and continuous, you might try using a Gamma or lognormal model (say) to construct a CI for the mean. Or if you have count data you might use a Poisson, binomial or negative binomial model to try to construct an interval.
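
As a rough illustration of approaches 1, 3 and 4 above, here is a minimal R sketch that computes all three intervals on the same sample. The data are simulated (a hypothetical stand-in, not the asker's observations), and the names `perm_pvalue`, `ci_perm` and `ci_boot_t` are made up for this example. For approach 3 it uses a brute-force grid search over candidate means with a sign-flip test, which is simpler than, but not the same as, working from a single permutation distribution. For approach 4 it uses one common form of the bootstrap-t, which may differ in detail from what the paper mentioned there evaluates.

```r
# Simulated stand-in for the 8 observations (assumption: not the real data)
set.seed(1)
y <- rlogis(8, location = 50, scale = 1)
n <- length(y)

# Approach 1: ordinary t-interval
ci_t <- as.numeric(t.test(y, conf.level = 0.95)$conf.int)

# Approach 3: invert an exact sign-flip permutation test (assumes symmetry).
# With n = 8 there are only 2^8 = 256 sign patterns, so enumerate them all.
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))

perm_pvalue <- function(mu0) {
  d <- y - mu0
  perm_means <- as.vector(signs %*% d) / n
  # two-sided p-value; small tolerance so the observed pattern counts itself
  mean(abs(perm_means) >= abs(mean(d)) - 1e-12)
}

# Keep the candidate means that are not rejected at the 5% level
grid <- seq(min(y), max(y), length.out = 2001)
keep <- vapply(grid, perm_pvalue, numeric(1)) > 0.05
ci_perm <- range(grid[keep])

# Approach 4: bootstrap-t interval for the mean
B <- 5000
t_star <- replicate(B, {
  yb <- sample(y, n, replace = TRUE)
  (mean(yb) - mean(y)) / (sd(yb) / sqrt(n))
})
# Drop the rare resamples where all values are identical (sd = 0)
t_star <- t_star[is.finite(t_star)]
se <- sd(y) / sqrt(n)
q <- quantile(t_star, c(0.975, 0.025), names = FALSE)
ci_boot_t <- mean(y) - q * se   # (lower, upper)

print(rbind(t = ci_t, permutation = ci_perm, bootstrap_t = ci_boot_t))
```

With only 8 observations the permutation p-value can only take multiples of 1/256, so the grid-based endpoints are approximate to within the grid spacing. To check coverage as suggested in approach 4, one could wrap this in a loop over many simulated samples from a plausible distribution and count how often each interval contains the true mean.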
