Solved – How to calculate the confidence interval of a mean in a non-normally distributed sample

bootstrap, confidence interval, descriptive statistics, nonparametric, skewness

How can I calculate the confidence interval of a mean in a non-normally distributed sample?

I understand bootstrap methods are commonly used here, but I am open to other options. While I am looking for a non-parametric option, if someone can convince me that a parametric solution is valid that would be fine. The sample size is > 400.

If anyone could give an example in R, it would be much appreciated.

Best Answer

First of all, I would check whether the mean is an appropriate measure for the task at hand. If you are looking for a "typical" or central value of a skewed distribution, the mean might point you to a rather unrepresentative value. Consider the log-normal distribution:

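x <- rlnorm(1000)                                # 1000 draws from a standard log-normal
plot(density(x), xlim=c(0, 10))                  # kernel density estimate of the sample
abline(v=mean(x), col="red")                     # arithmetic mean
abline(v=mean(x, trim=0.2), col="darkgreen")     # 20% trimmed mean
abline(v=median(x), col="blue")                  # median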

Mean (red), 20% trimmed mean (green), and median (blue) for the log-normal distribution

The mean (red line) is rather far away from the bulk of the data. The 20% trimmed mean (green) and the median (blue) are closer to the "typical" value.
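The same comparison can be made numerically. The following is only a sketch on simulated data: the seed and the variable names `est`, `pop` and `below_mean` are my own choices for illustration. For the standard log-normal, the population mean is exp(1/2) and the population median is 1, so roughly 69% of the distribution lies below the mean.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rlnorm(1000)                # standard log-normal: meanlog = 0, sdlog = 1

# Sample estimates of the three location measures
est <- c(mean    = mean(x),
         trimmed = mean(x, trim = 0.2),
         median  = median(x))
print(round(est, 3))

# Population values for comparison: the mean is exp(1/2), the median is exp(0)
pop <- c(mean = exp(0.5), median = 1)
print(round(pop, 3))

# Fraction of observations that lie below the sample mean
below_mean <- mean(x < mean(x))
print(below_mean)
```

If most of your own observations also fall below the sample mean, that is a hint that the mean may not describe a "typical" value well.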

The results depend on the type of your "non-normal" distribution (a histogram of your actual data would be helpful). If it is not skewed, but has heavy tails, your CIs will be very wide.
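To illustrate the effect of heavy tails, here is a rough sketch that compares the width of the standard 95% t-interval for the mean on two symmetric simulated samples of the same size. The choice of a Student t distribution with 3 degrees of freedom as the heavy-tailed example, the seed, and the helper `ci_width` are assumptions made purely for illustration.

```r
set.seed(2)                      # arbitrary seed, for reproducibility only
n <- 1000
light <- rnorm(n)                # normal tails
heavy <- rt(n, df = 3)           # Student t with 3 df: symmetric, heavy tails

# Width of the usual 95% t-interval for the mean (hypothetical helper)
ci_width <- function(v) diff(t.test(v)$conf.int)

widths <- c(normal = ci_width(light), t3 = ci_width(heavy))
print(round(widths, 3))
```

The heavy-tailed sample should give the wider interval, because a few extreme observations inflate the sample standard deviation.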

In any case, I think that bootstrapping is indeed a good approach, as it can also give you asymmetric CIs. The R package simpleboot is a good start:

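library(simpleboot)
library(boot)   # provides boot.ci()
# 20% trimmed mean bootstrap; trim=0.2 is passed on to mean()
b1 <- one.boot(x, mean, R=2000, trim=0.2)
# Percentile and bias-corrected and accelerated (BCa) intervals
boot.ci(b1, type=c("perc", "bca"))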

This gives you the following result:

# The 20% trimmed mean of the original sample:
> b1$t0
[1] 1.144648

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates
Intervals : 
Level     Percentile            BCa          
95%   ( 1.062,  1.228 )   ( 1.065,  1.229 )  
Calculations and Intervals on Original Scale
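
Since the question asks about the mean itself and about a possible parametric alternative, here is a minimal sketch in base R, again on simulated data standing in for the real sample. It computes a percentile bootstrap interval for the untrimmed mean and, for comparison, the ordinary t-interval. With several hundred observations the central limit theorem often makes the t-interval a reasonable approximation, but that is an assumption you should check against your own data, especially if the tails are very heavy. The seed and the names `boot_means`, `ci_boot` and `ci_t` are illustrative.

```r
set.seed(3)                      # arbitrary seed, for reproducibility only
x <- rlnorm(1000)                # stand-in for the real sample

R <- 2000                        # number of bootstrap resamples
# Resample with replacement and recompute the mean each time
boot_means <- replicate(R, mean(sample(x, replace = TRUE)))

# Percentile interval: empirical 2.5% and 97.5% quantiles of the resampled means
ci_boot <- quantile(boot_means, c(0.025, 0.975))

# Large-sample alternative: t-interval, relying on the central limit theorem
ci_t <- t.test(x)$conf.int

print(round(ci_boot, 3))
print(round(as.numeric(ci_t), 3))
```

If the two intervals differ noticeably, or the bootstrap interval is clearly asymmetric around the sample mean, I would trust the bootstrap (or a BCa interval as above) over the t-interval.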