Marginal Distribution – Difference Between Marginal Likelihood and Likelihood of a Marginal Distribution

Tags: joint distribution, likelihood, marginal-distribution

Assume that we have $n$ i.i.d. samples $x_i \sim f(\cdot|\theta)$, where $\theta$ is a parameter. Furthermore, the parameter itself is distributed as $\theta \sim g(\cdot|\alpha)$.

Now let's say that we can derive a marginal distribution:
$$p(x|\alpha) = \int f(x|\theta)\, g(\theta|\alpha)\, d\theta$$
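For instance (a Beta–Bernoulli case, used purely for illustration): if $x_i \,|\, \theta \sim \text{Bernoulli}(\theta)$ and $\theta \sim \text{Beta}(a, b)$ with $\alpha = (a, b)$, then
$$p(x = 1|\alpha) = \int_0^1 \theta\, g(\theta|\alpha)\, d\theta = \frac{a}{a+b}.$$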

The likelihood function is then
$$L_1(\alpha| x) = \prod_{i=1}^n p(x_i|\alpha) $$

Alternatively, we can use a marginal likelihood:

$$L_2(\alpha| x) = \int \left(\prod_{i=1}^n f(x_i|\theta)\right) g(\theta|\alpha)\, d\theta$$
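To see that these can differ, continue the Beta–Bernoulli illustration with $n = 2$ and the observation $x_1 = x_2 = 1$:
$$L_1(\alpha|x) = \left(\frac{a}{a+b}\right)^2, \qquad L_2(\alpha|x) = \int_0^1 \theta^2\, g(\theta|\alpha)\, d\theta = \frac{a(a+1)}{(a+b)(a+b+1)},$$
so $L_2 - L_1 = \operatorname{Var}(\theta) > 0$ whenever the prior is non-degenerate.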

What's the difference between the two approaches?

Best Answer

The problem here is that although the observations $x_1,\ldots,x_n$ are independent conditional on $\theta$, they are not independent conditional on $\alpha$ alone, so in general:

$$f(\mathbf{x}_n|\alpha) \neq \prod_{i=1}^n f(x_i|\alpha).$$

Indeed, marginalising over a distribution on $\theta$ will tend to induce positive correlation between $x_1,\ldots,x_n$ (see e.g., O'Neill 2009, Theorem 2, p. 244). Consequently, your $L_1$ is not a valid likelihood function in this case; the function $L_2$ is the correct likelihood function.
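As a quick numerical check, here is a minimal simulation sketch (assuming the Beta–Bernoulli illustration from the question; the hyperparameter values are arbitrary) showing both the positive marginal correlation and the gap between $L_1$ and $L_2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Beta-Bernoulli setup (hyperparameter values are arbitrary):
# theta ~ Beta(a, b), then x1, x2 | theta ~ iid Bernoulli(theta).
a, b = 2.0, 3.0
n_pairs = 200_000

# Draw theta once per pair, then two conditionally independent observations.
theta = rng.beta(a, b, size=n_pairs)
x1 = rng.binomial(1, theta)
x2 = rng.binomial(1, theta)

# Marginally (i.e. with theta integrated out), x1 and x2 are positively
# correlated even though they are independent given theta.
print("empirical corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])

# Evaluate both candidate functions at the observation x1 = x2 = 1.
L1 = (a / (a + b)) ** 2                       # product of marginals
L2 = a * (a + 1) / ((a + b) * (a + b + 1))    # E[theta^2] = true joint probability
print("L1 =", L1, "  L2 =", L2)               # L2 - L1 = Var(theta) > 0
```

With $a = 2$, $b = 3$ this gives an empirical correlation near $\operatorname{Var}(\theta)/(\mu(1-\mu)) \approx 0.167$ (where $\mu = a/(a+b)$, the standard result for exchangeable Bernoulli pairs), and $L_2 = 0.2 > 0.16 = L_1$.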

If you would like to learn more about this issue, I recommend reading O'Neill (2009). That paper examines the marginal correlation between observations that are conditionally independent in a Bayesian analysis, shows that marginalisation tends to induce positive correlation between them (a phenomenon the paper dubs "Bayes' effect"), and compares how Bayesian and frequentist methods draw inferences in these cases. It applies directly to your problem here and I think you would find it illuminating.
