[Math] In layman’s terms, what is the difference between a model and a distribution?

probability, statistics, terminology

The definitions given on Wikipedia are arguably a bit cryptic to those unfamiliar with higher mathematics/statistics.

I am a high school student who is very interested in this field as a hobby, and I am currently struggling with the difference between a statistical model and a probability distribution.

My current, and very rudimentary, understanding is this:

  • statistical models are mathematical attempts to approximate measured distributions (some equation)

  • probability distributions are measured descriptions from experiments that assign probabilities to each possible outcome of a random event (the actual desired abstract concept itself)

The confusion is further compounded by the tendency in the literature to use the words "distribution" and "model" interchangeably, or at least in very similar situations (e.g. binomial distribution vs binomial model).

Can someone verify/correct my definitions, and perhaps offer a more formalized (albeit still in terms of simple English) approach to these concepts?

Best Answer

Strictly speaking, a probability distribution is a function (more precisely, a measure) that assigns to each event some real number in $[0,1]$. Whenever $X$ is a random variable, giving its probability distribution means giving the probabilities attached to the values that $X$ can take. For example, if $X$ is the number shown when you roll a fair die, and if $P_X$ is its probability distribution, then you have $P_X(\{1,2\})={1 \over 3}$, $P_X(\{3\})={1 \over 6}$ and so on. To each event you assign a real number, which is the probability of that event. This function is called the probability distribution of $X$.
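To make the die example concrete, here is a minimal Python sketch of a distribution as a function from events to numbers in $[0,1]$. It assumes a fair six-sided die; the names `pmf` and `P_X` are only illustrative and not part of any library.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each face has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in outcomes}

def P_X(event):
    """Return the probability that the roll lands in `event` (a set of faces)."""
    # Faces outside 1..6 contribute nothing; start the sum at an exact zero.
    return sum((pmf[x] for x in event if x in pmf), Fraction(0))

print(P_X({1, 2}))         # probability of rolling a 1 or a 2
print(P_X({3}))            # probability of rolling a 3
print(P_X(set(outcomes)))  # probability of the whole sample space
```

The input is an event (a set of outcomes), not a single outcome, which matches the "measure" wording: the probability of an event is the sum of the probabilities of the outcomes it contains.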

Now, "probability distribution" is also used in a broader sense, which is closer to the meaning of statistical model. For example, we say "$X$ has the binomial distribution". What we really mean is that $X$ has a binomial distribution, that is: there exist some $n$ and $p$ such that $X\sim Bin(n,p)$. But strictly speaking, if $X\sim Bin(3,0.2)$ and $Y\sim Bin(3,0.4)$, then $X$ and $Y$ don't have the same distribution, because the probabilities are not the same. Nevertheless, we still talk about "the" binomial distribution.
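The point that $Bin(3,0.2)$ and $Bin(3,0.4)$ are two different distributions can be checked numerically. The sketch below uses only the standard binomial formula $\binom{n}{k}p^k(1-p)^{n-k}$; the helper name `binomial_pmf` is made up for this illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """Return {k: P(X = k)} for X ~ Bin(n, p), with k = 0, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

dist_x = binomial_pmf(3, 0.2)  # X ~ Bin(3, 0.2)
dist_y = binomial_pmf(3, 0.4)  # Y ~ Bin(3, 0.4)

# Same family and same possible values, but different probabilities.
for k in range(4):
    print(k, round(dist_x[k], 4), round(dist_y[k], 4))
```

Both dictionaries have the same keys (0 to 3) and each sums to 1, yet the probabilities differ for every value of $k$, so $X$ and $Y$ do not share a distribution even though both are "binomial".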

That is where the concept of statistical model arises. A statistical model is just a set of probability distributions $\mathcal{P}=\{P_{\theta},\theta\in\Theta\}$. For example, take $\Theta=(0,1)$ and let $P_\theta$ be the $Bin(10,\theta)$ distribution. This is a binomial model. Models are used in statistics when you have observations but you don't know the underlying probability distribution. We therefore make the hypothesis that it belongs to some set of distributions, which is our model, indexed by a parameter $\theta$. Then we ask ourselves: based on the observations, what can we say about $\theta$, that is, what is the real underlying distribution? Note that we may be wrong. Maybe the real distribution does not belong to this statistical model.
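As a rough illustration of "what can we say about $\theta$", the sketch below treats the binomial model $\{Bin(10,\theta),\ \theta\in(0,1)\}$ as a family of candidate distributions and picks the member that makes some observations most likely. Maximum likelihood is only one possible way to do this and is an assumption of this example, as are the made-up `observations` and the grid of candidate values.

```python
from math import comb, log

N_TRIALS = 10  # the model is {Bin(10, theta) : theta in (0, 1)}

def log_likelihood(theta, observations):
    """Log-probability of the observations if each one is drawn from Bin(10, theta)."""
    return sum(
        log(comb(N_TRIALS, k)) + k * log(theta) + (N_TRIALS - k) * log(1 - theta)
        for k in observations
    )

# Hypothetical data: number of successes in five independent runs of 10 trials.
observations = [3, 5, 4, 2, 4]

# Search a grid of candidate parameters strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, observations))

# For this model the maximiser is known in closed form: successes / trials.
closed_form = sum(observations) / (N_TRIALS * len(observations))

print(theta_hat, closed_form)
```

For this made-up sample both values come out at 0.36 (18 successes in 50 trials). Each value of `theta` on the grid names one distribution in the model; the data only help choose among them, and nothing in the code checks whether the true distribution is binomial at all.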

Short Answer

  • A probability distribution is a function that assigns to each event a number in $[0,1]$ which is the probability that this event occurs.
  • A statistical model is a set of probability distributions. We assume that the observations are generated from one of these distributions.