How do Bayesians interpret unobservable model parameters?

Tags: bayesian, interpretation, probability

This is designed as a canonical question relating closely to some other questions asked on this site (see e.g., here, here, here). In this canonical question I will try to get to the crux of the issue more clearly than in other posted questions.


Suppose we have a simple IID Bayesian model, with an observable (infinite) sequence of data $\mathbf{x} = (x_1,x_2,x_3,…)$ and we observe a sample of $n$ values. If we denote the sampling density for a single observation as $f_\theta$ (given a parameter $\theta$) and the prior as $\pi$ then we have the Bayesian model:

$$\begin{align}
x_1,…,x_n | \theta &\sim \text{IID } f_\theta, \\[6pt]
\theta &\sim \pi. \\[6pt]
\end{align}$$

If we have this kind of simple IID model, what is the proper interpretation of the parameter $\theta$? Does it correspond to anything in an operational sense, or is it just a model parameter existing in the hypothetical mathematical ether?

Best Answer

You can find a good primer on the Bayesian interpretation of these types of models in Bernardo and Smith (1994). In that work they take an "operational" approach where model parameters are interpreted as limiting quantities that are functions of the observable sequence. You can also find a more detailed discussion of these particular interpretive issues in O'Neill (2009), which extends the operational interpretation to ensure that the parameter exists and corresponds to a limiting quantity under all possible sequence values.


Before getting to the interpretational side, it is important to note where the IID model comes from in Bayesian analysis. Given an infinite sequence $\mathbf{x}$ we can define the limiting empirical distribution $F_\mathbf{x}: \mathbb{R} \rightarrow [0,1]$ as the Banach limit that extends the following Cesàro limit:

$$F_\mathbf{x}(x) \equiv \lim_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^n \mathbb{I}(x_i \leqslant x) \quad \quad \quad \quad \quad \text{for all } x \in \mathbb{R}.$$
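As a minimal sketch of this limit (assuming, purely for illustration, standard normal observations, so that the limiting empirical distribution $F_\mathbf{x}$ is the standard normal CDF), the finite-$n$ Cesàro average can be computed and compared with its limiting value:

```python
import numpy as np
from math import erf, sqrt

# Illustrative assumption: x_i ~ N(0, 1), so the limiting empirical
# distribution F_x is the standard normal CDF Phi.  The Cesaro average
# (1/n) * sum_i I(x_i <= x) should then approach Phi(x) as n grows.

def normal_cdf(x):
    """Standard normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ecdf_at(sample, x):
    """Finite-n Cesaro average (1/n) * sum_i I(x_i <= x)."""
    return float(np.mean(sample <= x))

rng = np.random.default_rng(0)
x_point = 0.5
for n in (100, 10_000, 1_000_000):
    estimate = ecdf_at(rng.standard_normal(n), x_point)
    print(n, estimate, normal_cdf(x_point))
```

At finite $n$ this is only the Cesàro average, of course; the Banach limit in the definition above is what guarantees $F_\mathbf{x}$ exists even for sequences where the Cesàro limit fails to converge.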

Now, an important result connecting the probability of the observable values to the underlying model parameters is the celebrated "representation theorem" of de Finetti (later extended by Hewitt and Savage). The version I give here is adapted from O'Neill (2009) (p. 242, Theorem 1). In this version we show the decomposition of the marginal distribution of the sample vector $\mathbf{x}_n = (x_1,...,x_n)$. As in all versions of the theorem, exchangeability of the underlying sequence yields the IID model and the parameter-observation connection.

Representation theorem: If the sequence $\mathbf{x}$ is exchangeable then it follows that the elements of $\mathbf{x}|F_\mathbf{x}$ are independent with sampling distribution $F_\mathbf{x}$ (i.e., the sampling distribution is the empirical distribution of $\mathbf{x}$) so that for all $n \in \mathbb{N}$ we have:

$$F(\mathbf{x}_n) = \int \prod_{i=1}^n F_\mathbf{x}(x_i) \ dP (F_\mathbf{x}).$$

This theorem essentially says that if the observable sequence $\mathbf{x}$ is exchangeable, then we have the following IID model:

$$\begin{align} x_1,...,x_n | F_\mathbf{x} &\sim \text{IID } F_\mathbf{x}, \\[6pt] F_\mathbf{x} &\sim \pi. \\[6pt] \end{align}$$
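For a concrete (hypothetical) instance of the mixture formula, consider a Bernoulli sampling distribution with a Uniform $= \text{Beta}(1,1)$ prior on $\theta$. The theorem then gives the marginal probability $P(x_1 = 1, x_2 = 1) = \int_0^1 \theta^2 \, d\theta = 1/3$, which a short Monte Carlo simulation over prior draws can confirm:

```python
import numpy as np

# Hypothetical Bernoulli/Beta illustration of the representation theorem:
# the marginal probability of a sample vector is the prior mixture of the
# conditional IID probabilities.  With a Uniform prior on theta,
# P(x_1 = 1, x_2 = 1) = integral of theta^2 over [0, 1] = 1/3.
rng = np.random.default_rng(1)
n_seq = 200_000
theta = rng.uniform(size=n_seq)               # theta ~ pi (Uniform prior draws)
x = rng.random((n_seq, 2)) < theta[:, None]   # x_i | theta ~ IID Bernoulli(theta)
p_both_one = float(np.mean(x[:, 0] & x[:, 1]))
print(p_both_one)  # close to 1/3
```

Note that the two observations are *not* independent marginally (independence would give $P(x_1=1, x_2=1) = 1/4$ here); they are only conditionally independent given $\theta$, which is exactly what the theorem asserts.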

Now, in many applications, we will make the additional assumption that the observations obey some other invariance constraints that lead us to a particular parametric family of distributions. In this case, it may be possible to index the empirical distribution $F_\mathbf{x}$ by a parameter vector $\theta \in \Theta$ (i.e., we have a mapping $F_\mathbf{x} \mapsto \theta$ that defines the index and the model is restricted to empirical distributions corresponding to a value of $\theta$). In this case, we would write the IID model as:$^\dagger$

$$\begin{align} x_1,...,x_n | \theta &\sim \text{IID } f_\theta, \\[6pt] \theta &\sim \pi. \\[6pt] \end{align}$$

So, as you can see, the setup for the Bayesian IID model occurs when we have an exchangeable sequence of observable values, and we then see that the model "parameter" is an index to the empirical distribution for the observable sequence (which can be defined through the Banach limit extending the above Cesàro limit). This "index" is a function of the empirical distribution, which is in turn a function of the observable sequence, so there exist mappings $\mathbf{x} \mapsto F_\mathbf{x} \mapsto \theta$.

Interpretation of the parameters: In the above setup, there exists a mapping $\mathbf{x} \mapsto \theta$, and so it is natural to take this as the "definition" of the parameter $\theta$. Under this approach, the parameter $\theta$ has an "operational" meaning as a quantity that is fully determined by the observable sequence (i.e., it is a limiting quantity computed from the observable sequence as $n \rightarrow \infty$). Note that this interpretation relates closely to the strong law of large numbers.
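A minimal sketch of this operational reading, assuming (purely for illustration) a Bernoulli sampling model, where the index $\theta$ is the limiting relative frequency of ones, so the mapping $\mathbf{x} \mapsto F_\mathbf{x} \mapsto \theta$ is realised at finite $n$ by the sample mean:

```python
import numpy as np

# Hypothetical Bernoulli illustration: theta is the limiting relative
# frequency of ones in the observable sequence, so the sample mean is
# the finite-n approximation of the mapping x -> F_x -> theta (SLLN).
rng = np.random.default_rng(2)
theta_true = 0.37                 # an assumed draw of theta from the prior
n = 1_000_000
x = rng.random(n) < theta_true    # x_i | theta ~ IID Bernoulli(theta)
theta_recovered = float(x.mean()) # finite-n approximation of the limit
print(theta_true, theta_recovered)
```

So in this model "$\theta$" is not a metaphysical posit: it is (the limit of) a relative frequency computable from the observable sequence itself.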


$^\dagger$ I am using a slight abuse of notation here by taking $\pi$ as a generic reference to a prior distribution for whatever parameter is in use. Note that the prior for $\theta$ would be a simple mapping of the prior for $F_\mathbf{x}$.
