Solved – Non-parametric confidence interval about the difference of means for unpaired data

bootstrap, nonparametric

Consider two samples ${(x_i)}_{i=1}^m$ and ${(y_i)}_{i=1}^n$. Assume that the $x_i$ are independent replicates from a distribution with expectation $\mu_X$, that the $y_i$ are independent replicates from a distribution with expectation $\mu_Y$, and that the two samples are independent of each other. Is it possible to get a bootstrap confidence interval for the difference of means $\mu_X-\mu_Y$? Or is there another nonparametric way to get such a confidence interval?

EDIT: Oops, I've just seen this topic: "Confidence interval for the difference of two means using boot package in R". Nevertheless, I am interested in understanding why the method is correct. This is not a classical bootstrap procedure, is it? Here we resample separately within each data sample, unlike the classical bootstrap, where there is only one data sample.

Best Answer

Yes. It is like a stratified bootstrap. You sample with replacement $m$ times from the first sample and calculate a bootstrap mean for it. Do the same by sampling with replacement $n$ times from the second sample, calculate a bootstrap mean for it, and then take the difference of the two bootstrap means. Repeat this many times to get an approximate bootstrap distribution for the mean difference. As long as the conditions for the bootstrap mean to work are satisfied for each population, the bootstrap will work for the mean difference. In my book Bootstrap Methods, 2nd Edition, pp. 67-71, I show an example where I did this in a clinical trial looking at the mean difference in capture thresholds for two pacing leads.
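
The procedure described above can be sketched in base R. This is only an illustration: the data are simulated, the seed, the sample sizes and the number of replicates `B` are arbitrary, and the percentile interval is one possible choice of interval that the answer itself does not prescribe.

```r
# Two-sample (stratified) bootstrap for the difference of means.
# All data below are made up purely for illustration.
set.seed(1)
x <- rexp(30, rate = 1 / 5)   # sample 1, m = 30
y <- rexp(40, rate = 1 / 4)   # sample 2, n = 40

B <- 5000                     # number of bootstrap replicates (arbitrary)
boot_diff <- replicate(B, {
  xs <- sample(x, length(x), replace = TRUE)  # resample m values from x
  ys <- sample(y, length(y), replace = TRUE)  # resample n values from y
  mean(xs) - mean(ys)                         # bootstrap mean difference
})

# Percentile 95% interval (an assumed choice of interval type)
ci <- quantile(boot_diff, c(0.025, 0.975))
ci
```

Each sample is resampled only from itself and at its own size, which is what makes the scheme "stratified".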

Related Solutions

Solved – Confidence interval for the difference of two means using boot package in R

If you look at your totalBoot$t you will see that all the returned values are identical. The reason is that you have not defined your statistic function (meanDiff) to actually resample the data. The help page for boot says:

When sim = "parametric", the first argument to statistic must be the data. ... In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample.

If you redefine your meanDiff as

meanDiff = function(dataFrame, indexVector) { 
    m1 = mean(subset(dataFrame[indexVector, 1], dataFrame[indexVector, 2] == "initial"))
    m2 = mean(subset(dataFrame[indexVector, 1], dataFrame[indexVector, 2] == "final"))
    m = m1 - m2
    return(m)
}

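it should work. Alternatively (not that it matters), I prefer the more compact version below. Because tapply orders the groups by factor level (alphabetically by default, so "final" comes before "initial"), the group means are indexed by name here so that the sign matches the first version:

meanDiff = function(x, w) {
    y <- tapply(x[w, 1], x[w, 2], mean)
    y["initial"] - y["final"]
}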
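
As a rough usage sketch, the statistic can be passed to `boot()` together with a `strata` argument so that each group is resampled separately. The data frame `d`, its column names, the seed and `R = 2000` are assumptions made up for this example; the original poster's data are not shown in the source.

```r
library(boot)

# Made-up data: a numeric column and a two-level grouping column
set.seed(42)
d <- data.frame(
  value = c(rnorm(25, mean = 10, sd = 2), rnorm(30, mean = 8, sd = 2)),
  group = factor(rep(c("initial", "final"), times = c(25, 30)),
                 levels = c("initial", "final"))
)

# Statistic: uses the index vector w, so each replicate sees resampled rows
meanDiff <- function(x, w) {
  y <- tapply(x[w, 1], x[w, 2], mean)
  as.numeric(y["initial"] - y["final"])
}

# strata = d$group resamples within each group, keeping group sizes fixed
totalBoot <- boot(d, meanDiff, R = 2000, strata = d$group)

ci <- boot.ci(totalBoot, type = "perc")
ci$percent[4:5]   # lower and upper endpoints of the percentile interval
```

With the index vector used inside the statistic, the values in `totalBoot$t` now vary from replicate to replicate.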

Solved – Bootstrapped confidence interval for the difference in means for paired data

The first method is not a resampling test that I am aware of in the literature. It seems that your goal, in resampling $X$ and $Y$ independently, is to generate data under the null hypothesis. This approach is inefficient because it ignores the pairing in the design.

The preferred resampling method for generating data under the null hypothesis is the permutation test. For paired data, permutation testing is done by randomly negating the $X-Y$ differences, i.e. replacing them with $Y-X$. Here, the between-pair differences are preserved, but the within-pair differences are only preserved if the paired mean difference is 0.
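
The sign-flipping permutation test just described can be sketched as follows. The paired data are simulated, and the seed, the number of permutations `B` and the two-sided p-value with a "+1" correction are assumptions of this example rather than details given in the answer.

```r
# Paired permutation test by randomly negating the X - Y differences.
# Made-up paired data for illustration only.
set.seed(7)
x <- rnorm(20, mean = 5.5)
y <- x - 0.5 + rnorm(20, sd = 0.8)

d   <- x - y          # within-pair differences
obs <- mean(d)        # observed mean difference

B <- 10000            # number of random sign assignments (arbitrary)
perm <- replicate(B, {
  signs <- sample(c(-1, 1), length(d), replace = TRUE)  # flip each pair at random
  mean(signs * d)
})

# Two-sided Monte Carlo p-value; the +1 counts the observed statistic itself
p_value <- (sum(abs(perm) >= abs(obs)) + 1) / (B + 1)
p_value
```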

The second example is a proper description of a paired bootstrap.
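
The asker's second example is not reproduced here. One common form of paired bootstrap, assumed for illustration, resamples whole pairs so that the pairing is kept intact. The variable names, simulated data, seed and percentile interval below are all choices made for this sketch.

```r
# Paired bootstrap: resample pair indices, not x and y separately.
# Made-up paired measurements for illustration only.
set.seed(11)
before <- rnorm(25, mean = 100, sd = 10)
after  <- before + rnorm(25, mean = 3, sd = 5)
n <- length(before)

boot_md <- replicate(5000, {
  idx <- sample.int(n, n, replace = TRUE)  # same indices for both members of a pair
  mean(before[idx] - after[idx])
})

# Percentile 95% interval for the paired mean difference (assumed interval type)
ci <- quantile(boot_md, c(0.025, 0.975))
ci
```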
