Probability – Distinguishing Between Known Quantities and Random Variables

Tags: effect-size, normal-distribution, probability, random-variable

For two random variables $X_1$, $X_2$, the probability of superiority ($PS$) is defined as the probability that a randomly chosen value of $X_1$ exceeds a randomly chosen value of $X_2$:

$PS \equiv Pr(X_1 > X_2) \tag{1}$

Suppose $X_1$, $X_2$ follow independent normal distributions with known parameters.

Since the parameters are known,

$PS = \Phi\left(\frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right) \tag{2}$

Specifically, because distribution parameters are known, $PS$ is a single known quantity, not a random variable.
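As a concrete check of $(2)$, here is a minimal Python sketch (the function names are mine; $\Phi$ is computed from the error function so only the standard library is needed):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_superiority(mu1, sigma1, mu2, sigma2):
    """PS = Pr(X1 > X2) for independent normals with known
    parameters, per equation (2)."""
    return normal_cdf((mu1 - mu2) / sqrt(sigma1**2 + sigma2**2))

# Example: X1 ~ N(1, 1), X2 ~ N(0, 1), so PS = Phi(1 / sqrt(2))
ps = prob_superiority(1.0, 1.0, 0.0, 1.0)
```

With equal means the formula returns $1/2$, as symmetry requires, which is a quick sanity check on the sign convention in the numerator.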

Suppose now that the distribution parameters are not known, but we observe some data on $X_1$, $X_2$. My objective is to find $PS$ conditional on the data. How do we calculate $PS$? I can see two approaches.

Approach 1. Conditional on the data, the distribution parameters have known distributions (normal for $\mu$; for $\sigma^2$, a scaled inverse chi-square, since $(n-1)s^2/\sigma^2$ is chi-square). Draw parameters from these distributions and, for each set of parameters drawn, calculate $PS$ per $(2)$.

This gives us a simulated (parametric-bootstrap) distribution of $PS$: under this approach, $PS$ is a random variable with a distribution.
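Approach 1 can be sketched in Python as follows. The summary statistics (sample means, sample standard deviations, sample sizes) are hypothetical, and I assume $\mu \mid \sigma^2 \sim N(\bar{x}, \sigma^2/n)$ with $\sigma^2 = (n-1)s^2/W$, $W \sim \chi^2_{n-1}$; only the standard library is used:

```python
import random
from math import erf, sqrt

def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def draw_ps(xbar1, s1, n1, xbar2, s2, n2, rng):
    """One draw of PS under Approach 1: sample (mu, sigma^2) for each
    group from its distribution given the data, then apply (2)."""
    def draw_params(xbar, s, n):
        # sigma^2 = (n-1) s^2 / W, with W ~ chi-square(n-1),
        # generated as Gamma((n-1)/2, 2)
        w = rng.gammavariate((n - 1) / 2.0, 2.0)
        var = (n - 1) * s**2 / w
        mu = rng.gauss(xbar, sqrt(var / n))
        return mu, var

    mu1, v1 = draw_params(xbar1, s1, n1)
    mu2, v2 = draw_params(xbar2, s2, n2)
    return normal_cdf((mu1 - mu2) / sqrt(v1 + v2))

rng = random.Random(42)
# Hypothetical summary statistics for the two samples
draws = [draw_ps(1.0, 1.0, 30, 0.0, 1.0, 30, rng) for _ in range(5000)]
# 'draws' is the simulated distribution of PS
ps_mean = sum(draws) / len(draws)
```

Each element of `draws` is a probability, so the whole collection lies in $(0,1)$ and can be summarized with quantiles to get an interval for $PS$.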

Approach 2. Per the definition $(1)$, we are actually interested in the distributions of future data, not in the sampling distributions of the parameters. Conditional on the data (not the parameters), $X$'s have a known distribution. The posterior predictive of each $X$ is the 3-parameter Student t distribution (see here, equation 100).

We could work out the distribution of the difference of two such location-scale Student t variables; the result will be similar to $(2)$ but with heavier tails, and with sample statistics in the equation in place of parameters.

Under this approach, because the exact distributions of $X_1$, $X_2$ conditional on the data are known, $PS$ turns out to be a single known quantity, not a random variable, just as when the distribution parameters were known.
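Approach 2 can be sketched by Monte Carlo over the posterior predictive draws themselves. I assume the standard result that the predictive of a future observation is a location-scale Student t with $n-1$ degrees of freedom, location $\bar{x}$, and scale $s\sqrt{1+1/n}$ (the "equation 100" form referenced above); the summary statistics are again hypothetical, and a standard t draw is built from a normal and a chi-square:

```python
import random
from math import sqrt

def draw_predictive(xbar, s, n, rng):
    """One draw from the posterior predictive of X: location-scale
    Student t with df = n-1, location xbar, scale s*sqrt(1 + 1/n)."""
    df = n - 1
    z = rng.gauss(0.0, 1.0)
    w = rng.gammavariate(df / 2.0, 2.0)   # chi-square(df) draw
    t = z / sqrt(w / df)                  # standard t(df) draw
    return xbar + s * sqrt(1.0 + 1.0 / n) * t

rng = random.Random(0)
n_sim = 20000
# Hypothetical summary statistics, same as before
hits = sum(
    draw_predictive(1.0, 1.0, 30, rng) > draw_predictive(0.0, 1.0, 30, rng)
    for _ in range(n_sim)
)
ps_marginal = hits / n_sim  # a single number, not a distribution
```

Note that the output here is one number, up to Monte Carlo error, which is exactly the point of Approach 2.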

Question. Which approach is correct?

It feels to me like Approach 2 is correct, as it integrates out distribution parameters, which are not even part of the definition of $PS$. The "problem" with this approach is that since $PS$ is known, it has no confidence interval and one cannot do hypothesis tests on it.

Best Answer

There is no essential difference between the two approaches. If we collect your unknown parameters into $\theta = (\mu_1,\mu_2,\sigma^2_1,\sigma^2_2)$, then in your first approach you calculate the conditional probability:

$$ P(X_1 > X_2 | \theta )$$

and then sample $\theta$ to obtain a distribution, while in the second approach you calculate the marginal probability:

$$ P(X_1 > X_2) = \int d\theta \pi(\theta)P(X_1 > X_2 | \theta ) $$

If you consider $\theta$ a random variable, then the conditional probability is a random variable as well, and the marginal probability is its expectation.
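This relationship can be verified numerically: averaging the Approach 1 draws of $P(X_1 > X_2 \mid \theta)$ recovers the Approach 2 marginal, up to Monte Carlo error. A self-contained sketch, with the same hypothetical summary statistics and assumed sampling scheme as above:

```python
import random
from math import erf, sqrt

rng = random.Random(7)

def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def draw_theta(xbar, s, n):
    """Sample (mu, sigma^2) for one group given the data (Approach 1)."""
    w = rng.gammavariate((n - 1) / 2.0, 2.0)   # chi-square(n-1)
    var = (n - 1) * s**2 / w
    return rng.gauss(xbar, sqrt(var / n)), var

def conditional_ps():
    """P(X1 > X2 | theta) for one draw of theta, via equation (2)."""
    mu1, v1 = draw_theta(1.0, 1.0, 30)
    mu2, v2 = draw_theta(0.0, 1.0, 30)
    return normal_cdf((mu1 - mu2) / sqrt(v1 + v2))

def predictive_ps(n_sim):
    """Marginal P(X1 > X2) from posterior predictive t draws (Approach 2)."""
    def draw_x(xbar, s, n):
        df = n - 1
        t = rng.gauss(0.0, 1.0) / sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        return xbar + s * sqrt(1.0 + 1.0 / n) * t
    return sum(draw_x(1.0, 1.0, 30) > draw_x(0.0, 1.0, 30)
               for _ in range(n_sim)) / n_sim

mean_conditional = sum(conditional_ps() for _ in range(20000)) / 20000
marginal = predictive_ps(20000)
# The two estimates agree up to Monte Carlo error
```

This is the law of total expectation in action: the distribution from Approach 1 and the single number from Approach 2 are answers to the same question at different levels of conditioning.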
