How do Bayesians interpret $P(X=x|\theta=c)$, and does this pose a challenge when interpreting the posterior?

bayesian, frequentist, interpretation, probability

I have seen the post Bayesian vs frequentist interpretations of probability and others like it but this does not address the question I am posing. These other posts provide interpretations related to prior and posterior probabilities, $\pi(\theta)$ and $\pi(\theta|\boldsymbol{x})$, not $P(X=x|\theta=c)$. I am not interested in the likelihood as a function of the parameter and the observed data, I am interested in the interpretation of the probability distribution of unrealized data points.

For example, let $X_1,\dots,X_n\sim Bernoulli(\theta)$ be the result of $n$ coin tosses and $\theta\sim Beta(a,b)$ so that $\pi(\theta|\boldsymbol{x})$ is the pdf of a $Beta(a+\sum x,b + n - \sum x)$.
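To make the conjugate update concrete, here is a minimal sketch in Python. The function name `beta_bernoulli_update`, the prior $Beta(2,2)$ and the six tosses are illustrative assumptions, not part of the original setup.

```python
def beta_bernoulli_update(a, b, xs):
    """Posterior Beta parameters after observing the 0/1 outcomes in xs."""
    heads = sum(xs)
    tails = len(xs) - heads
    return a + heads, b + tails


# Hypothetical prior Beta(2, 2) and six hypothetical tosses (4 heads, 2 tails).
a_post, b_post = beta_bernoulli_update(2, 2, [1, 0, 1, 1, 0, 1])
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)
```

The posterior parameters are just the prior parameters plus the counts of heads and tails, which is all the formula $Beta(a+\sum x,b + n - \sum x)$ says.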

How do Bayesians interpret $\theta=c$? $\theta$ of course is treated as an unrealized or unobservable realization of a random variable, but that still does not define or interpret the probability of heads. $\pi(\theta)$ is typically considered as the prior belief of the experimenter regarding $\theta$, but what is $\theta=c$? That is, how do we interpret a single value in the support of $\pi(\theta)$? Is it a long-run probability? Is it a belief? How does this influence our interpretation of the prior and posterior?

For instance, if $\theta=c$ and equivalently $P(X=1|\theta=c)=c$ is my belief that the coin will land heads, then $\pi(\theta)$ is my belief about my belief, and in some sense so too is the prior predictive distribution $P(X=1)=\int\theta\pi(\theta)d\theta=\frac{a}{a+b}$. To say "if $\theta=c$ is known" is to say that I know my own beliefs. To say "if $\theta$ is unknown" is to say I only have a belief about my beliefs. How do we justify interpreting beliefs about beliefs as applicable to the coin under investigation?
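The two-stage structure behind the prior predictive $P(X=1)=\frac{a}{a+b}$ can be checked with a rough Monte Carlo sketch: draw a value $c$ from the prior, then toss a coin with $P(X=1|\theta=c)=c$. The function name, the prior parameters and the number of draws below are my own illustrative choices.

```python
import random


def prior_predictive_heads(a, b, draws, seed=0):
    """Estimate P(X = 1) by drawing theta ~ Beta(a, b), then X ~ Bernoulli(theta)."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(draws):
        theta = rng.betavariate(a, b)   # one value c from the prior
        heads += rng.random() < theta   # one toss with P(X = 1 | theta = c) = c
    return heads / draws


a, b = 2.0, 3.0  # hypothetical prior parameters
estimate = prior_predictive_heads(a, b, draws=100_000)
exact = a / (a + b)
print(estimate, exact)
```

The simulated proportion should be close to $\frac{a}{a+b}$, up to Monte Carlo error. This only illustrates the arithmetic; it does not settle which interpretation of $c$ is the right one.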

If $\theta=c$ and equivalently $P(X=1|\theta=c)=c$ is the unknown fixed true long-run probability for the coin under investigation: How do we justify blending two interpretations of probability in Bayes theorem as if they are equivalent? How does Bayes theorem not imply there is only one type of probability? How are we able to apply posterior probability statements to the unknown fixed true $\theta=c$ under investigation?

The answer must address these specific questions. While references are much appreciated, the answers to these questions must be provided. I have provided four Options or proposals in my own solution below as an answer, with the challenges of interpreting $P(X=x|\theta=c)$ as a belief or as a long-run frequency. Please identify which Option in my answer most closely maps to your answer, and provide suggestions for improving my answer.

I am not writing $P(X=x|\theta=c)$ to be contemptuous. I am writing it to be explicit since $P(X=x|Y=y)$ is not the same thing as $P(X=x|Y)$. One might instead be inclined to write in terms of a sample from the prior and use an index of realizations of $\theta$. However, I do not want to present this in terms of a finite sample from the prior.

More generally, how do Bayesians interpret $P(X=x|\theta=c)$ or $P(X\le x|\theta=c)$ for any probability model and does this interpretation pose any challenges when interpreting $P(\theta=s|\boldsymbol{x})$ or $P(\theta\le s|\boldsymbol{x})$?

I've seen a few other posts tackle questions about Bayesian posterior probability, but the solutions aren't very satisfying and usually only consider a superficial interpretation, e.g. coherent representations of information.

Bayesian vs frequentist interpretations of probability

UPDATE:
I received several answers. It appears that a belief interpretation for $P(X=x|\theta=c)$ is the most appropriate under the Bayesian paradigm, with $\theta$ as the limiting proportion of heads (which is not a probability) and $\pi(\theta)$ representing belief about $\theta$. I have amended Option 1 in my answer to accurately reflect two different belief interpretations for $P(X=x|\theta=c)$. I have also suggested how Bayes theorem can produce reasonable point and interval estimates for $\theta$ despite these shortcomings in interpretation.

Best Answer

I have posted a related (but broader) question and answer here which may shed some more light on this matter, giving the full context of the model setup for a Bayesian IID model.

You can find a good primer on the Bayesian interpretation of these types of models in Bernardo and Smith (1994), and you can find a more detailed discussion of these particular interpretive issues in O'Neill (2009). A starting point for the operational meaning of the parameter $\theta$ is obtained from the strong law of large numbers, which in this context says that:

$$\mathbb{P} \Bigg( \lim_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^n X_i = \theta \Bigg) = 1.$$

This gets us part-way to a full interpretation of the parameter, since it shows almost sure equivalence with the Cesàro limit of the observable sequence. Unfortunately, the Cesàro limit in this probability statement does not always exist (though it exists almost surely within the IID model). Consequently, using the approach set out in O'Neill (2009), you can consider $\theta$ to be the Banach limit of the sequence $X_1,X_2,X_3,\dots$, which always exists and is equivalent to the Cesàro limit when the latter exists. So, we have the following useful parameter interpretation as an operationally defined function of the observable sequence.

Definition: The parameter $\theta$ is the Banach limit of the sequence $\mathbf{X} = (X_1,X_2,X_3,...)$.

(Alternative definitions that define the parameter by reference to an underlying sigma-field can also be used; these are essentially just different ways to do the same thing.) This interpretation means that the parameter is a function of the observable sequence, so once that sequence is given the parameter is fixed. Consequently, it is not accurate to say that $\theta$ is "unrealised"; if the sequence is well-defined then $\theta$ must have a value, albeit one that is unobserved (unless we observe the whole sequence). The sampling probability of interest is then given by the representation theorem of de Finetti.

Representation theorem (adaptation of de Finetti): If $\mathbf{X}$ is an exchangeable sequence of binary values (and with $\theta$ defined as above), it follows that the elements of $\mathbf{X}|\theta$ are independent with sampling distribution $X_i|\theta \sim \text{IID Bern}(\theta)$ so that for all $k \in \mathbb{N}$ we have: $$\mathbb{P}(\mathbf{X}_k=\mathbf{x}_k | \theta = c) = \prod_{i=1}^k c^{x_i} (1-c)^{1-x_i}.$$ Here $\mathbf{X}_k = (X_1,...,X_k)$ denotes the first $k$ elements of the sequence and $\mathbf{x}_k = (x_1,...,x_k)$ their observed values. This particular version of the theorem is adapted from O'Neill (2009), which is itself a minor re-framing of de Finetti's famous representation theorem.
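As a small numerical illustration (a sketch, not taken from O'Neill (2009)), the product formula can be evaluated directly. If one additionally assumes the $Beta(a,b)$ prior from the question, integrating $c$ out gives the marginal probability $B(a+s,\,b+k-s)/B(a,b)$ with $s=\sum x_i$, which depends on the outcomes only through their sum, as exchangeability requires. The function names and numbers below are hypothetical.

```python
import math


def sampling_prob(xs, c):
    """P(X_1 = x_1, ..., X_k = x_k | theta = c) under the IID Bernoulli model."""
    return math.prod(c if x == 1 else 1.0 - c for x in xs)


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def marginal_prob(xs, a, b):
    """P(X_1 = x_1, ..., X_k = x_k) with c integrated over an assumed Beta(a, b) prior."""
    s, k = sum(xs), len(xs)
    return math.exp(log_beta(a + s, b + k - s) - log_beta(a, b))


xs = (1, 1, 0, 1)          # hypothetical observed values
reordered = (0, 1, 1, 1)   # same values in a different order

print(sampling_prob(xs, 0.3), sampling_prob(reordered, 0.3))
print(marginal_prob(xs, 2.0, 2.0), marginal_prob(reordered, 2.0, 2.0))
```

Both pairs agree up to floating-point rounding: reordering the outcomes changes neither the conditional nor the marginal probability.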

Now, within this IID model, the specific probability $\mathbb{P}(X_i=1|\theta=c) = c$ is just the sampling probability of a positive outcome for the value $X_i$. This represents the probability of a single positive indicator conditional on the Banach limit of the sequence of indicator random variables being equal to $c$.
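A finite simulation cannot exhibit a Banach limit, but it can illustrate the law of large numbers that motivates the definition: conditional on $\theta=c$, the running proportion of ones settles near $c$. The sketch below uses a hypothetical value $c=0.3$ and a fixed seed, so each longer run extends the same simulated sequence.

```python
import random


def running_proportion(c, n, seed=0):
    """Proportion of ones among the first n simulated IID Bernoulli(c) outcomes."""
    rng = random.Random(seed)
    ones = sum(rng.random() < c for _ in range(n))
    return ones / n


c = 0.3  # hypothetical value of theta
for n in (10, 1_000, 100_000):
    print(n, running_proportion(c, n))
```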

Since this is an area of interest to you, I strongly recommend you read O'Neill (2009) to see the broader approach used here and how it is contrasted with the frequentist approach. That paper asks some similar questions to what you are asking here, so I think it might assist you in understanding how these things can be framed in an operational manner within the Bayesian paradigm.

How do we justify blending two interpretations of probability in Bayes theorem as if they are equivalent?

I presume here that you are referring to the fact that there are certain limiting correspondences analogous to the "frequentist interpretation" of probability at play in this situation. Bayesians generally take an epistemic interpretation of the meaning of probability (what Bernardo and Smith call the "subjective interpretation"). Consequently, all probability statements are interpreted as beliefs about uncertainty on the part of the analyst. Nevertheless, Bayesians also accept that the law of large numbers (LLN) is valid and applies to their models under appropriate conditions, so it may be the case that the epistemic probability of an event is equivalent to the limiting frequency of a sequence.

In the present case, the definition of the parameter $\theta$ is the Banach limit of the sequence of observable values, so it necessarily corresponds to a limiting frequency. Probability statements about $\theta$ are therefore also probability statements about a limiting frequency for the observable sequence of values. There is no contradiction in this.
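One way to see this numerically is a rough simulation under the Beta prior from the question. Generate many sequences from the model, keep those whose first $n$ outcomes contain exactly $s$ ones, and look at the frequency of ones in the rest of each sequence (a finite stand-in for the limiting frequency). Its average should be close to the posterior mean $\frac{a+s}{a+b+n}$. All names and numbers below are illustrative assumptions.

```python
import random


def long_run_freq_given_data(a, b, n, s, future, sims, seed=0):
    """Average later frequency of ones over simulated sequences whose
    first n outcomes contain exactly s ones."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(sims):
        theta = rng.betavariate(a, b)
        observed = sum(rng.random() < theta for _ in range(n))
        if observed != s:
            continue  # keep only sequences that match the observed data
        later = sum(rng.random() < theta for _ in range(future))
        freqs.append(later / future)
    return sum(freqs) / len(freqs)


a, b, n, s = 1.0, 1.0, 5, 4  # hypothetical prior and data
simulated = long_run_freq_given_data(a, b, n, s, future=200, sims=20_000)
posterior_mean = (a + s) / (a + b + n)
print(simulated, posterior_mean)
```

In this sketch the posterior is a statement about the frequencies of ones in the sequences that are consistent with the data seen so far, which is the sense in which the epistemic and limiting-frequency readings coincide.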
