Solved – How is data generated in the Bayesian framework, and what is the nature of the parameter that generates the data

bayesian, frequentist, modeling, prior, randomness

I was trying to re-learn Bayesian statistics (every time I thought I had finally got it, something else popped up that I hadn't considered earlier…), but it wasn't clear to me what the data generation process in the Bayesian framework actually is.

The frequentist framework is clear to me: there is some "true" parameter (or parameters) $\theta$, and that parameter generates the data according to the distribution that it parametrizes.

However, in the Bayesian setting, we model the parameter as a random variable. That part does not confuse me. It makes sense, because a Bayesian interprets this probability as the uncertainty in their own beliefs; they are okay with assigning a probability to nonrepeatable events. So the way I interpreted "Bayesianism" was that it holds that there is some parameter generating the data; the parameter is definitely unknown but nevertheless fixed once it was decided by "nature" (and maybe nature did decide randomly what it was supposed to be). Since it is fixed, its creation was a "nonrepeatable event". Even though it was nonrepeatable, we are only trying to update our own belief about $\theta$ given data. Therefore, the data might have been generated by any of the parameters under consideration by our probability distribution (the prior), but nevertheless the parameter is fixed and unknown; we are just attaching a probability value to it.

With this view, it makes sense to me to assume that the data generation process is nearly identical to the frequentist one: "nature" selects the parameter $\theta$ using the "true" prior distribution $P^*(\theta)$, and once the random variable takes its "true" (but fixed) realization, it starts generating the data that we observe.
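To make that two-stage picture concrete, here is a minimal simulation sketch of the view just described (the normal prior and normal likelihood are only assumptions for illustration; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hierarchical view sketched above: "nature" draws the parameter once from a
# "true" prior P*(theta), and that single fixed realization then generates
# every observation we see.

# Step 1: nature draws theta once (a nonrepeatable event).
true_prior_mean, true_prior_sd = 0.0, 2.0            # assumed "true" prior P*(theta)
theta = rng.normal(true_prior_mean, true_prior_sd)   # fixed, but unknown to us

# Step 2: the fixed theta generates all the data via the likelihood.
n = 50
data = rng.normal(loc=theta, scale=1.0, size=n)      # likelihood: N(theta, 1)

print(f"fixed (hidden) theta = {theta:.3f}")
print(f"sample mean of data  = {data.mean():.3f}")
```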

Is this the standard way to interpret the data generation process in the Bayesian framework?

The main point of my view is that the parameter $\theta$ is definitely fixed (viewed as a realization of a random variable), and the data are generated according to $\theta$. Another very important point is that, for me, our prior is only a quantifiable way of expressing our uncertainty about the fixed (and nonrepeatable) event of creating the parameter $\theta$. Is that how people interpret the prior $P(\theta)$?


Side humorous note:

I wish I could just ask "Nature" how she is doing it and settle this once and for all… lol.

Best Answer

It is pretty straightforward: there are no differences between Bayesians and frequentists regarding the idea of the data-generating model.

To understand this, consider first that the data-generating model is mathematically encoded in the likelihood, which is the basis for the inference of Bayesians and frequentists alike. And there is zero difference between a Bayesian and frequentist likelihood.
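As a concrete sketch of that point (a hypothetical coin-flip example, not taken from the question): the Binomial likelihood below is the single object both camps use; the frequentist maximizes it, while the Bayesian combines it with a prior via Bayes' rule.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta_true = 0.3                  # fixed data-generating parameter (assumed)
n = 40
k = rng.binomial(n, theta_true)   # observed number of heads

# Frequentist use of the Binomial likelihood: maximize it (MLE).
theta_mle = k / n

# Bayesian use of the same likelihood: combine it with a prior via Bayes' rule.
# With a Beta(a, b) prior the posterior is Beta(a + k, b + n - k) (conjugacy).
a, b = 1.0, 1.0                   # uniform prior, an assumption for this sketch
posterior = stats.beta(a + k, b + n - k)

print(f"MLE:                {theta_mle:.3f}")
print(f"posterior mean:     {posterior.mean():.3f}")
print(f"95% credible int.:  {posterior.interval(0.95)}")
```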

Now, you could say that this doesn't mean that Bayesians think the parameters of the data-generating process are fixed. Sure, but really, it makes very little sense to think otherwise: what would be the point of estimating a quantity that is not fixed? What would that even mean mathematically? Of course, it could be that you have a quantity that is not a value but a distribution. But then you estimate the distribution, so it is fixed again.

The real difference, as @Xi'an says, is not in the assumption about how our data is generated, but in the inference. So, when you say

However, in the Bayesian setting, we model the parameter as a random variable.

I would disagree: we model our knowledge/uncertainty about the true parameter as a random variable. That is the subtle but important difference. We treat the parameter as a random variable in order to explore our uncertainty about its "true" value.
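A small follow-up sketch of that reading, using the same assumed Beta-Binomial setup as above: the data-generating parameter stays at one fixed value throughout, and what shrinks as data accumulate is only the width of the posterior, i.e. our uncertainty about that fixed value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta_true = 0.3        # fixed; it never changes during the experiment
a, b = 1.0, 1.0         # Beta(1, 1) prior, an assumption for this sketch

for n in (10, 100, 1000):
    k = rng.binomial(n, theta_true)
    post = stats.beta(a + k, b + n - k)
    lo, hi = post.interval(0.95)
    # The posterior interval narrows around the fixed theta_true as n grows.
    print(f"n={n:4d}  posterior mean={post.mean():.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```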
