Why aren’t these sentences about confidence intervals equivalent?


Yes, I'm aware that there are similar/duplicate questions already open:

But funnily enough, none of these questions has an accepted answer, so this question is, technically, still unsolved on this site. I have read the answers and other related questions, but I still don't see the difference, so I will try to explain why I think these are equivalent expressions.

Given a confidence interval $CI_D$ for some unknown fixed parameter $\theta$ calculated from a dataset $D$, why aren't these three sentences equivalent?

  1. If I repeat the experiment infinitely many times and obtain an infinite sequence of calculated confidence intervals, 95% of them will contain $\theta$.
  2. There's a 95% probability that $CI_D$ contains $\theta$.
  3. There's a 95% probability that $\theta$ is in $CI_D$.

Why do I think these sentences are equivalent?

  • Sentences 1 and 2 are equivalent to me because I don't know whether $CI_D$ is one of the 95% of intervals that contain $\theta$; this is the same as saying that $CI_D$ has a 95% probability of being one of the intervals that contain $\theta$, which is the same as saying that there's a 95% probability that $CI_D$ contains $\theta$.

  • Sentences 2 and 3 are equivalent because saying "$CI_D$ contains $\theta$" is the same as saying "$\theta$ is in $CI_D$": both sentences translate to $\theta\in CI_D$. So $P[CI_D \text{ contains } \theta] = 0.95$ is the same as $P[\theta \text{ is in } CI_D] = 0.95$, because both are the same as $P[\theta\in CI_D]=0.95$.

Am I right?

I know that frequentists don't allow statements like "the probability of a fact about $X$" when $X$ is not a random variable (and $\theta$ is not, because it's a constant), but $CI_D$ is a random variable, and sentences 2 and 3 speak about the probability of a relationship between $\theta$ and $CI_D$. So I'm not fully convinced that "the probability of $\theta\in CI_D$" conflicts with the fact that $\theta$ is not a random variable, because $CI_D$ is a random variable and is also present in the same sentence (the sentence tells us something about $\theta$, but it also tells us something about $CI_D$).

Best Answer

The events in statements 2 and 3 are obviously equivalent – I interpret them as $CI_D \ni \theta$ and $\theta \in CI_D$ respectively. The issue here is that you are vague about whether you are talking about CIs as random intervals or as fixed intervals after the observed data has been substituted, and you are also vague about whether you are talking about conditional or unconditional probability. Below I will show which mathematical statements about confidence intervals are true/false. So long as you describe these statements correctly in a textual sense (which requires more explicit specification of some issues you're glossing over) you should be fine.


Probabilistic properties of the CI: I'll conduct a purely probabilistic analysis of confidence intervals as mathematical objects, so I'll examine probability statements applying to these objects that are both conditional and unconditional on $\theta$. Note that in the classical framework, the parameter is treated as an "unknown constant" so we (implicitly) condition on it in all probability statements in that context. Nevertheless, I'll look at things more broadly so that you can see what probabilistic statements are true/false within a generalised framework where you examine the CI on a purely mathematical basis.

In order to show you what statements about confidence intervals are true/false, we will use more detailed notation. Let $\text{CI}_\theta(\mathbf{X}, \alpha)$ denote the $1-\alpha$ level confidence interval for $\theta \in \Theta$ using (random) data vector $\mathbf{X}$. This object is a mapping $\text{CI}_\theta: \mathbb{R}^n \times [0,1] \rightarrow \mathfrak{p}(\mathbb{R})$ that maps an input data vector and significance value to a measurable subset of the real numbers. (For a confidence interval the output of the function is a single connected interval, but you can generalise to use confidence sets if you want to remove this restriction.) As I've noted in several other answers (some for questions you link to), an exact confidence interval is defined by the following property:

$$\mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{X}, \alpha) | \theta) = 1-\alpha \quad \quad \quad \quad \text{for all } \theta \in \Theta.$$

(An approximate confidence interval is one where there is approximate equality, usually relying on asymptotic distributional results.) Substituting the observed data $\mathbf{X}=\mathbf{x}$ then gives the (fixed) confidence interval $\text{CI}_\theta(\mathbf{x}, \alpha)$. To allow us to assess statements about "repeated experiments" we will let $\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, ...$ denote a sequence of IID random vectors with distribution equivalent to the random vector $\mathbf{X}$.
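The defining coverage property is easy to check by simulation. The sketch below is my own illustration, not part of the original answer: it assumes a concrete setting where $\theta$ is the mean of a normal distribution with known $\sigma$, so the standard $z$-interval is an exact confidence interval and the long-run proportion of intervals containing $\theta$ should be close to $1-\alpha$.

```python
import math
import random

def z_interval(sample, sigma, z=1.959963984540054):
    """95% z-interval for a normal mean with known sigma (z = Phi^{-1}(0.975))."""
    n = len(sample)
    xbar = sum(sample) / n
    half = z * sigma / math.sqrt(n)
    return (xbar - half, xbar + half)

def coverage(theta, sigma, n, trials, seed=0):
    """Fraction of repeated experiments whose interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += (lo <= theta <= hi)
    return hits / trials

# Empirical coverage should be close to 0.95.
print(coverage(theta=3.0, sigma=2.0, n=25, trials=20000))
```

This is exactly the "repeated experiments" reading of statement 1: the probability statement is about the random interval $\text{CI}_\theta(\mathbf{X}, \alpha)$ before the data are observed.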

So, assuming you are using an exact confidence interval, the following statements are true/false$^\dagger$:

$$\begin{align} \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{X}, \alpha) | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{X}, \alpha) \ni \theta | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{X}, \alpha)) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{X}, \alpha) \ni \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] -------------&---------------- \\[6pt] \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{x}, \alpha) | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{x}, \alpha) \ni \theta | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{x}, \alpha)) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{x}, \alpha) \ni \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] -------------&---------------- \\[6pt] \mathbb{P} \bigg( \lim_{k \rightarrow \infty} \frac{1}{k} \sum_{i=1}^k \mathbb{I}(\theta \in \text{CI}_\theta(\mathbf{X}_i, \alpha)) &= 1-\alpha \bigg| \theta \bigg) = 1 \quad \quad \quad \quad \ \ \text{True} \\[6pt] \mathbb{P} \bigg( \lim_{k \rightarrow \infty} \frac{1}{k} \sum_{i=1}^k \mathbb{I}(\theta \in \text{CI}_\theta(\mathbf{X}_i, \alpha)) &= 1-\alpha \bigg) = 1 \quad \quad \quad \quad \quad \ \text{True} \\[6pt] \end{align}$$

If you are working in the classical ("frequentist") context, you can ignore the marginal probability statements here and focus entirely on the conditional probability statements. (In that context the parameter is an "unknown constant" and so all our probabilistic analysis implicitly conditions on it having a fixed value.) As you can see, the remaining distinction that determines whether the statement is true/false is whether you are talking about the "data" in its random sense or fixed sense. You also need to take care to state these mathematical conditions clearly and accurately.
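The random-versus-fixed distinction can also be made concrete with a small sketch (again my own illustration, assuming a normal mean $\theta$ with known $\sigma$): once the data are observed, each realised interval either contains $\theta$ or it does not, so the coverage indicator for a fixed interval is always 0 or 1; only the long-run frequency across repeated draws is $1-\alpha$.

```python
import math
import random

# Hypothetical setting: theta is the true mean of a normal with known sigma.
rng = random.Random(1)
theta, sigma, n = 3.0, 2.0, 25
z = 1.959963984540054  # Phi^{-1}(0.975), for a 95% interval

indicators = []
for _ in range(10):
    sample = [rng.gauss(theta, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half = z * sigma / math.sqrt(n)
    # Once the data are observed, the interval is fixed, and the event
    # "theta in CI" is deterministic: the indicator is 0 or 1, never 0.95.
    indicators.append(int(xbar - half <= theta <= xbar + half))

print(indicators)
```

Each entry of `indicators` corresponds to one of the $\text{False}^\dagger$ rows: for a fixed interval $\text{CI}_\theta(\mathbf{x}, \alpha)$, the probability that it contains $\theta$ is degenerate (0 or 1), not $1-\alpha$.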


$^\dagger$ Statements listed as $\text{False}$ are statements that are not true in general. These statements may be true "coincidentally" for some specific values of the inputs.
