Confidence Interval – Understanding Disagreement Between p-values and Confidence Intervals

confidence interval, p-value, spss, t-test

This is a question regarding the t-test in SPSS.

I have two groups and want to test whether their means are equal, using the t-test with bootstrapping. I got a p-value < 0.005, which would normally lead me to reject the null hypothesis that the two population means are equal. However, zero lies within the 95% BCa bootstrap confidence interval, which is based on 1000 samples.

Do I still reject the hypothesis of equal means?

Best Answer

Caveat: This answer assumes that the question is about interpreting bootstrapped p-values and CIs. A comparison between a traditional p-value (not bootstrapped) and a bootstrapped CI would be a different issue.

With a traditional (not bootstrapped) t-test, the 95% CI and the p-value's position relative to the .05 cutoff for significance will always tell you the same thing. That's because they're both based on the same information: the t-distribution for your degrees of freedom and the mean and standard error observed in your sample (or the difference between means and its standard error, in the case of a two-sample t-test). If your CI doesn't include 0, then your p-value will necessarily be < .05, and vice versa, unless there's a bug in the software or a user error in implementing or interpreting the test.
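To see why the two must agree, here is a short sketch of the algebra. The symbols are my own notation, not SPSS output labels: $\hat{d}$ is the observed difference in means, $SE$ its standard error, $t = \hat{d}/SE$ the test statistic, and $t^{*}$ the two-sided 5% critical value of the t-distribution with the test's degrees of freedom.

```latex
\begin{aligned}
\text{95\% CI} &= \hat{d} \pm t^{*}\, SE \\
0 \notin \text{95\% CI}
  &\iff |\hat{d}| > t^{*}\, SE \\
  &\iff |t| = \frac{|\hat{d}|}{SE} > t^{*} \\
  &\iff p = 2\,P\!\left(T \ge |t|\right) < .05
\end{aligned}
```

Both statements reduce to the same comparison of $|t|$ against $t^{*}$, so one cannot hold without the other.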

With a bootstrapped t-test, the CI and the p-value are both calculated directly from the empirical distribution generated by the bootstrapping: the p-value is simply the percentage of bootstrapped group differences that are more extreme than the original observed difference, and the 95% CI is the middle 95% of bootstrapped group differences (for a simple percentile interval; a BCa interval, as in your output, additionally adjusts the cutoffs for bias and skewness). Because the two are no longer tied to one shared formula, it is possible for the p-value and the CI to disagree about significance in a bootstrapped test.
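As a rough illustration of how both quantities come out of the same set of resamples, here is a minimal Python sketch. It is not what SPSS does internally: it uses a plain percentile interval rather than BCa, and it assumes one common convention for the p-value (shifting the bootstrap distribution so that it is centred on zero). The function name `bootstrap_diff` and the example data are made up for illustration.

```python
import random
import statistics


def bootstrap_diff(a, b, n_boot=2000, seed=0):
    """Percentile CI and a null-centred bootstrap p-value for mean(a) - mean(b).

    Illustrative only: percentile interval (not BCa), and the p-value
    convention used here is an assumption, not SPSS's method.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)

    # Resample each group with replacement and record the difference in means.
    diffs = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()

    # Percentile 95% CI: the middle 95% of the bootstrapped differences.
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]

    # Two-sided p-value: subtract the observed difference so the bootstrap
    # distribution is centred on 0 (the null), then count how often a shifted
    # difference is at least as extreme as the observed one.
    extreme = sum(abs(d - observed) >= abs(observed) for d in diffs)
    p_value = (extreme + 1) / (n_boot + 1)

    return observed, (lower, upper), p_value


# Made-up example data for two groups.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1, 5.7]
group_b = [4.2, 4.8, 4.5, 5.0, 4.1, 4.7, 4.4, 4.9, 4.3, 4.6]

observed, ci, p_value = bootstrap_diff(group_a, group_b)
print(f"difference = {observed:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_value:.4f}")
```

Note that the CI is read off the raw resampled differences while the p-value is read off the shifted ones. With a skewed bootstrap distribution, or with BCa adjustments to the cutoffs, those two readings need not land on the same side of the .05 line.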

Do you accept or reject the null hypothesis?

In the context of a bootstrapped test, the p-value (as compared to the CI) more directly reflects the spirit of the hypothesis test, so it makes the most sense to rely on that value to decide whether or not to reject the null at your desired alpha (generally .05). So in your case, where the p-value is less than .05 but the 95% CI contains zero, I recommend rejecting the null hypothesis.

All of this skips over the big ideas about how important "significance" really should be and whether or not null hypothesis significance testing is actually that useful a tool. Briefly, I always recommend complementing any significance testing analysis with estimation of effect sizes (for a two-sample t-test, the best effect size estimate will probably be Cohen's d), which can provide some additional context to help you understand your results.
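If it helps, Cohen's d for two independent groups is easy to compute by hand. Below is a minimal Python sketch using the pooled standard deviation; the function name `cohens_d` is my own, and other variants of d (for example, using only one group's SD) exist.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd
```

The result is the difference in means expressed in pooled standard deviation units, so its sign follows the order in which you pass the groups.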

Related helpful post: What is the meaning of a confidence interval taken from bootstrapped resamples?