Confidence Intervals vs Hypothesis Testing – Differences Explained

Tags: confidence-interval, hypothesis-testing

I have read about controversies regarding hypothesis testing, with some commentators suggesting that hypothesis testing should not be used and that confidence intervals should be reported instead.

  • What is the difference between confidence intervals and hypothesis testing? An explanation with references and examples would be appreciated.

Best Answer

You can use a confidence interval (CI) for hypothesis testing: in the typical case, if the CI for an effect does not span 0, then you can reject the null hypothesis. But a CI can be used for much more, whereas a test's usefulness ends with reporting whether it was passed or failed.

The reason you're recommended to use a CI instead of just a t-test, for example, is that it lets you do more than test hypotheses. You can make a statement about the range of effects you believe to be likely (the ones in the CI), which you can't do with just a t-test. You can also use it to make statements about the null, which you can't do with a t-test. If the t-test doesn't reject the null, all you can say is that you failed to reject it, which isn't saying much. But if you have a narrow confidence interval around the null, you can suggest that the null, or a value close to it, is likely the true value, and that the effect of the treatment, or independent variable, is too small to be meaningful (or that your experiment doesn't have enough power and precision to detect an effect important to you, because the CI includes both that effect and 0).
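To make the contrast concrete, here is a small sketch (the data are made up for illustration) showing the same hypothetical sample analyzed both ways with `scipy.stats`: the t-test reports only a reject/fail-to-reject decision, while the CI reports the range of plausible effect sizes.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of paired differences (made-up data for illustration)
rng = np.random.default_rng(0)
diffs = rng.normal(loc=0.6, scale=1.2, size=30)

# Way A: a one-sample t-test against 0 yields only a decision
t, p = stats.ttest_1samp(diffs, popmean=0.0)
print(f"t({len(diffs) - 1}) = {t:.2f}, p = {p:.4f}")

# Way B: a 95% CI gives the range of effects consistent with the data
se = stats.sem(diffs)
lo, hi = stats.t.interval(0.95, df=len(diffs) - 1, loc=diffs.mean(), scale=se)
print(f"95% CI for the effect: [{lo:.2f}, {hi:.2f}]")
```

Note that the CI contains everything the test reports (the interval excludes 0 exactly when the two-sided test rejects at the same level) plus the magnitude information the test discards.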

Added later: I really should have said that, while you can use a CI like a test, it isn't one. It's an estimate of a range where you think the parameter value lies. You can make test-like inferences from it, but you're much better off never talking about it that way.

Which is better?

A) The effect is 0.6, t(29) = 2.8, p < 0.05. This statistically significant effect is... (discussion ensues about this statistical significance, without any mention of, or even much ability to discuss, the practical implication of the magnitude of the finding. Under a Neyman-Pearson framework the magnitudes of the t and p values are pretty much meaningless; all you can discuss is whether the effect is present or was not found to be present. You can never really argue, based on the test, that there actually is no effect.)

or

B) Using a 95% confidence interval, I estimate the effect to be between 0.2 and 1.0. (Discussion ensues about the actual effect of interest: whether its plausible values are ones that have any particular meaning, with any use of the word "significant" meaning exactly what it's supposed to mean. In addition, the width of the CI goes directly to whether this is a strong finding or whether you can only reach a more tentative conclusion.)

If you took a basic statistics class, you might initially gravitate toward A, and there may be some cases where it is the better way to report a result. But for most work, B is far and away superior. A range estimate is not a test.
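In fact, the two reports describe the same analysis: assuming n = 30 (so df = 29), report A's numbers imply report B's interval. A short sketch recovering the CI from the effect estimate and the t statistic:

```python
from scipy import stats

# Numbers from report A (df = 29 assumes n = 30)
effect, t_stat, df = 0.6, 2.8, 29

se = effect / t_stat              # standard error implied by the t statistic
t_crit = stats.t.ppf(0.975, df)   # two-sided 95% critical value (~2.045)
lo, hi = effect - t_crit * se, effect + t_crit * se
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")  # -> 95% CI: [0.2, 1.0]
```

So nothing is lost by reporting B instead of A; the interval carries the test result and the effect-size information together.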