Definition of “Random Parameters” in Probability

probability, statistics

In the context of probability, I am trying to understand what is meant by a "fixed vs. random parameter".

For example, consider the problem of statistical estimation.

In the Bayesian context, I am having difficulty understanding the rationale behind why the sample is not considered as random.

For example, consider the formula for estimating some parameter under the Bayesian framework.

We usually express this relationship as follows: $$p(\theta \mid x) = \frac{p(x \mid \theta)p(\theta)}{\int_{\Theta} p(x \mid \theta)p(\theta) d\theta}$$

  • $p(\theta \mid x)$ is the posterior distribution
  • $p(x \mid \theta)$ is the likelihood
  • $p(\theta)$ is the prior distribution
  • $\int_{\Theta} p(x \mid \theta)p(\theta) d\theta$ is the normalizing constant, which ensures that the posterior integrates to 1 over all possible values of $\theta$ (see the small numerical sketch after this list).
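To make the pieces concrete for myself, here is a minimal numerical sketch. The model is made up purely for illustration: $n = 10$ coin flips with $k = 7$ heads, a Binomial likelihood, a Beta(2, 2) prior on $\theta$, and the integral over $\Theta$ approximated on a grid.

```python
import numpy as np
from scipy.stats import binom, beta

# Hypothetical example (not real data): n coin flips with k heads,
# a Beta(2, 2) prior on theta, and a grid approximation of the integral over Theta.
n, k = 10, 7
theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space Theta
dtheta = theta[1] - theta[0]

prior = beta.pdf(theta, 2, 2)            # p(theta)
likelihood = binom.pmf(k, n, theta)      # p(x | theta), with the data x = k held fixed
unnormalised = likelihood * prior

evidence = np.sum(unnormalised) * dtheta # the normalizing constant (the denominator)
posterior = unnormalised / evidence      # p(theta | x)

print(np.sum(posterior) * dtheta)        # ~1.0: the posterior integrates to 1 over Theta
```

Here the observed data $x = k$ enters only as a fixed number, while $\theta$ is varied over the grid, which is exactly what prompts my question.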

I can understand why we consider the "parameter" (i.e. $\theta$) as "random", but why do we consider the "data" as "fixed"?

The Bayesian estimate is a function of the (frequentist) likelihood, and in the frequentist framework the data are considered random. Therefore, in the Bayesian context, would it not be more accurate to say "the data is random AND the parameter is also random"?

Thanks!

Best Answer

Let $\Theta$ be the parameter space and $\mathcal{X}$ be the space of possible observations.

A Bayesian model is a probability distribution over $\mathcal{X} \times \Theta$. This is usually written as a likelihood $f(x|\theta)$ and a prior $\pi(\theta)$, but it should be clear that these components exactly encode the joint distribution of the observations and parameters.
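To make that explicit, here is a minimal sketch (with a made-up Beta-Binomial model, nothing specific from the question) that draws pairs $(x, \theta)$ from the joint distribution: $\theta$ from the prior, then $x$ from the likelihood given that $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Beta-Binomial model, for illustration only:
#   prior:      theta ~ Beta(2, 2)
#   likelihood: x | theta ~ Binomial(10, theta)
n_trials = 10
theta_draws = rng.beta(2, 2, size=5)           # draws from the prior pi(theta)
x_draws = rng.binomial(n_trials, theta_draws)  # draws from the likelihood f(x | theta)

# Each (x, theta) pair is one draw from the joint distribution on X x Theta:
# both coordinates are random.
for x, th in zip(x_draws, theta_draws):
    print(f"theta = {th:.3f}, x = {x}")
```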

In this sense, both parameters and observations are random, and indeed, Bayes' rule relies on the observations being realisations of random variables. For this reason, I think the "observations are fixed" statement overstates the way in which Bayesian inference is somehow dual to frequentist inference.

The statement isn't without merit, though. The object of Bayesian inquiry is the posterior $\pi(\theta|x)$, where $x$ is the particular data that was observed in the experiment, not an abstract random variable. Although we had to think of the observations as random to determine this distribution on $\Theta$, once we have it, we never have to think about other values of $x$ again - only the observed ones matter. For example, as Bayesians we might compute the posterior mean of $\theta$ $$ \int_{\Theta} \theta\cdot \pi(\theta|x) \, \mathrm{d} \theta, $$ in which we fix $x$ and vary $\theta$ over the whole parameter space. Essentially all Bayesian inference is similar to this.
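As a minimal sketch of that computation, again with a hypothetical Beta-Binomial model (observed data $x = 7$ out of $n = 10$, chosen only so that the exact posterior mean is available for comparison):

```python
import numpy as np
from scipy.stats import binom, beta

# A minimal sketch with a hypothetical Beta-Binomial model:
#   theta ~ Beta(2, 2),  x | theta ~ Binomial(10, theta),  observed data x = 7.
n, x_obs = 10, 7
theta = np.linspace(0.001, 0.999, 999)        # vary theta over the parameter space
dtheta = theta[1] - theta[0]

unnorm = binom.pmf(x_obs, n, theta) * beta.pdf(theta, 2, 2)  # f(x|theta) * pi(theta), x fixed
posterior = unnorm / (unnorm.sum() * dtheta)                 # pi(theta | x)

post_mean = np.sum(theta * posterior) * dtheta   # integral of theta * pi(theta | x) dtheta
print(post_mean)                                 # grid approximation
print((2 + x_obs) / (4 + n))                     # exact mean of the Beta(2 + x, 2 + n - x) posterior
```

Note that the single observed value is the only element of $\mathcal{X}$ that ever appears; the grid runs over $\Theta$, not over the observation space.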

Compare this to frequentism, where a common object of interest is the fitted model $f(y; \hat{\theta}(x))$, where $\hat{\theta}(x)$ is the MLE of $\theta$. Now, the only parameter value that we need to consider is $\hat{\theta}(x)$, but we still need other values in $\mathcal{X}$. For example, a frequentist might want to compute the Fisher information evaluated at the MLE, $$ -\int_{\mathcal{X}} \nabla^2_\theta \log f(y; \hat{\theta}(x))\cdot f(y; \hat{\theta}(x))\, \mathrm{d}y, $$ in which it is now the parameter value that is fixed at $\hat{\theta}(x)$ and the observations $y$ that are varied over the whole observation space.
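A minimal numerical sketch of that integral, under the same hypothetical Binomial model (the sum over $y = 0, \dots, n$ plays the role of the integral over $\mathcal{X}$):

```python
import numpy as np
from scipy.stats import binom

# A minimal sketch, again with a hypothetical Binomial(10, theta) model and observed x = 7.
n, x_obs = 10, 7
theta_hat = x_obs / n                 # the MLE: the only parameter value that is used

y = np.arange(n + 1)                  # the whole observation space, varied over
pmf = binom.pmf(y, n, theta_hat)      # f(y; theta_hat)

# Second derivative of log f(y; theta) with respect to theta, evaluated at theta_hat:
#   d^2/dtheta^2 [ y*log(theta) + (n - y)*log(1 - theta) ] = -y/theta^2 - (n - y)/(1 - theta)^2
d2_loglik = -y / theta_hat**2 - (n - y) / (1 - theta_hat)**2

fisher_info = np.sum(-d2_loglik * pmf)    # expectation over y, with theta fixed at the MLE
print(fisher_info)                        # sum version of the integral over the observation space
print(n / (theta_hat * (1 - theta_hat)))  # closed form for the Binomial model, for comparison
```

The roles are now reversed: `theta_hat` is the only parameter value that appears, while `y` runs over the whole observation space.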
