Solved – Confidence interval for the difference of two means using boot package in R

bootstrap, confidence interval, r

I have two samples, one of size 52, and one of size 31, that are obtained at different times. I'd like to get a 95% bootstrap confidence interval for the difference between the means of the populations that these samples represent. I've been trying to use the "boot" package in R, and I'm getting an error that I can't figure out. I was hoping someone here could help me out.

This is what my data look like (in a data frame named "totalData"):

      X              samplingTime
1  -0.29              initial
2   0.3               initial 
           ....
52  -1.2              initial
53   0.7              final
54  -1.2              final
           ....
83   1.52             final 

This is what I did to get my bootstrap CI:

meanDiff = function(dataFrame, indexVector) { 
    m1 = mean(subset(dataFrame[, 1], dataFrame$samplingTime == "initial"))
    m2 = mean(subset(dataFrame[, 1], dataFrame$samplingTime == "final"))
    m = m1 - m2
    return(m)
}

totalBoot = boot(totalData, meanDiff, R = 10000, strata = totalData[,2])
totalBootCI = boot.ci(totalBoot)

and in the last line I get the error:

Error in bca.ci(boot.out, conf, index[1L], L = L, t = t.o, t0 = t.o, : estimated 
adjustment 'w' is infinite. 

I'd very much appreciate any comments.

Thanks!

Best Answer

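If you look at your totalBoot$t you will see that all the returned values are identical. The reason is that your statistic function (meanDiff) never uses its second argument, so it does not actually resample the data: every replicate is computed from the full original data frame. With no variation among the replicates, the BCa bias-correction term cannot be estimated, which is what the error about 'w' being infinite reports. The help page for boot says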

When sim = "parametric", the first argument to statistic must be the data. ... In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample.
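A small sketch of the symptom, using made-up data shaped like totalData (the name toyData, the simulated values, and the helper ignoresIndices are assumptions for illustration, not the original data or code). The statistic accepts the index vector but never uses it, so every replicate comes out the same:

```r
library(boot)

set.seed(1)
# Made-up data with the same layout as totalData (values are illustrative only)
toyData <- data.frame(
  X = c(rnorm(52), rnorm(31, mean = 0.5)),
  samplingTime = rep(c("initial", "final"), times = c(52, 31))
)

# Hypothetical statistic that ignores its index argument, as in the question
ignoresIndices <- function(d, i) {
  mean(d$X[d$samplingTime == "initial"]) - mean(d$X[d$samplingTime == "final"])
}

# strata is passed as a factor so the two samples are resampled separately
b <- boot(toyData, ignoresIndices, R = 200,
          strata = factor(toyData$samplingTime))

# Number of distinct replicate values; 1 means no resampling took place
nDistinct <- length(unique(as.vector(b$t)))
print(nDistinct)
```

Checking the number of distinct values in the t component is a quick way to confirm whether a statistic function is really using the indices it is given.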

If you redefine your meanDiff as

meanDiff = function(dataFrame, indexVector) { 
    m1 = mean(subset(dataFrame[indexVector, 1], dataFrame[indexVector, 2] == "initial"))
    m2 = mean(subset(dataFrame[indexVector, 1], dataFrame[indexVector, 2] == "final"))
    m = m1 - m2
    return(m)
}

it should work. Alternatively (not that it matters), I prefer the more compact:

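meanDiff = function(x, w) {
    # group means of the resampled rows
    y <- tapply(x[w, 1], x[w, 2], mean)
    # index by name: tapply orders groups alphabetically, so y[1] would be "final"
    y[["initial"]] - y[["final"]]
}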
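
For completeness, here is an end-to-end sketch on made-up data (toyData, toyBoot and toyCI are hypothetical names, and the simulated values stand in for the real measurements). It assumes the grouping column is a factor, which is also what gets passed as strata, and it picks the group means by name rather than by position so the sign of the difference does not depend on the order of the factor levels:

```r
library(boot)

set.seed(42)
# Made-up data: 52 "initial" and 31 "final" observations (illustrative only)
toyData <- data.frame(
  X = c(rnorm(52), rnorm(31, mean = 0.5)),
  samplingTime = factor(rep(c("initial", "final"), times = c(52, 31)))
)

# Statistic that uses the indices supplied by boot()
meanDiff <- function(d, i) {
  resampled <- d[i, ]
  groupMeans <- tapply(resampled$X, resampled$samplingTime, mean)
  groupMeans[["initial"]] - groupMeans[["final"]]
}

# Stratified resampling keeps the group sizes at 52 and 31 in every replicate
toyBoot <- boot(toyData, meanDiff, R = 2000, strata = toyData$samplingTime)

# Request only the interval types that need no variance estimates
toyCI <- boot.ci(toyBoot, conf = 0.95, type = c("perc", "bca"))
print(toyCI)
```

Asking for type = c("perc", "bca") rather than the default "all" avoids the warning about studentized intervals, which need a variance estimate for each replicate.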