Statistical Model – Differences Between a Statistical Model and a Probability Model

mathematical-statistics probability

Applied probability is an important branch of probability, including computational probability. As I understand it, statistics uses probability theory to construct models for dealing with data, so I am wondering: what is the essential difference between a statistical model and a probability model? Does a probability model not need real data? Thanks.

Best Answer

A Probability Model consists of the triplet $(\Omega,{\mathcal F},{\mathbb P})$, where $\Omega$ is the sample space, ${\mathcal F}$ is a $\sigma$-algebra of events on $\Omega$, and ${\mathbb P}$ is a probability measure on ${\mathcal F}$.

Intuitive explanation. A probability model can be interpreted as a known random variable $X$. For example, let $X$ be a Normally distributed random variable with mean $0$ and variance $1$. In this case the probability measure ${\mathbb P}$ is associated with the Cumulative Distribution Function (CDF) $F$ through

$$F(x)={\mathbb P}(X\leq x) = {\mathbb P}(\{\omega\in\Omega:X(\omega)\leq x\}) =\int_{-\infty}^x \dfrac{1}{\sqrt{2\pi}}\exp\left({-\dfrac{t^2}{2}}\right)dt.$$
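As a rough numerical check of the formula above, the following sketch approximates the integral with a trapezoidal rule and compares it with the standard library's `statistics.NormalDist`. The helper names, the truncation of $-\infty$ at $-10$, and the number of steps are choices made here for illustration only.

```python
import math
from statistics import NormalDist


def std_normal_pdf(t):
    """Density of the standard Normal: exp(-t^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def cdf_by_integration(x, lower=-10.0, steps=100_000):
    """Approximate F(x) with the trapezoidal rule.

    Assumption: the lower limit -infinity is truncated at `lower`;
    the mass below -10 is negligible for a standard Normal.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (std_normal_pdf(lower) + std_normal_pdf(x))
    for i in range(1, steps):
        total += std_normal_pdf(lower + i * h)
    return total * h


# The "known random variable" X ~ Normal(0, 1): one fixed probability measure.
X = NormalDist(mu=0.0, sigma=1.0)

for x in (-1.0, 0.0, 1.96):
    print(x, cdf_by_integration(x), X.cdf(x))
```

Because the measure is fully specified, every probability statement about $X$ can be computed without any data.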

Generalisations. The definition of a Probability Model depends on the mathematical definition of probability; see, for example, Free probability and Quantum probability.

A Statistical Model is a set ${\mathcal S}$ of probability models, that is, a set of probability measures/distributions on the sample space $\Omega$.

This set of probability distributions is usually selected for modelling a certain phenomenon from which we have data.

Intuitive explanation. In a Statistical Model, the parameters and the distribution that describe a certain phenomenon are both unknown. An example of this is the family of Normal distributions with mean $\mu\in{\mathbb R}$ and variance $\sigma^2\in{\mathbb R_+}$; that is, both parameters are unknown, and you typically want to use the data set to estimate them (i.e. to select an element of ${\mathcal S}$). This set of distributions can be chosen on any $\Omega$ and ${\mathcal F}$ but, if I am not mistaken, in a real example only those defined on the same pair $(\Omega,{\mathcal F})$ are reasonable to consider.
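To make "selecting an element of ${\mathcal S}$" concrete, here is a minimal sketch that picks one member of the Normal family from a data set. It assumes maximum likelihood as the estimation method, which is only one of several possibilities. The data are simulated, and the values $2$ and $3$ used to generate them are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def fit_normal_mle(data):
    """Select one element of S = {Normal(mu, sigma^2)} from the data.

    Assumption: maximum likelihood is used, i.e. the sample mean and the
    variance with divisor n (not n - 1).
    """
    n = len(data)
    mu_hat = fmean(data)
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return NormalDist(mu=mu_hat, sigma=var_hat ** 0.5)


# Hypothetical data: simulated here so that the example is self-contained.
rng = random.Random(0)
data = [rng.gauss(2.0, 3.0) for _ in range(5000)]

# The fitted object is a single probability model chosen from the family.
fitted = fit_normal_mle(data)
print(fitted.mean, fitted.variance)
```

The family itself is specified before any data are seen. The data only determine which member of the family is selected.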

Generalisations. This paper provides a very formal definition of a Statistical Model, but the author mentions that "Bayesian model requires an additional component in the form of a prior distribution ... Although Bayesian formulations are not the primary focus of this paper". Therefore the definition of a Statistical Model depends on the kind of model we use: parametric or nonparametric. Also, in the parametric setting, the definition depends on how the parameters are treated (e.g. Classical vs. Bayesian).

The difference is: in a probability model you know the probability measure exactly, for example a $\mbox{Normal}(\mu_0,\sigma_0^2)$, where $\mu_0,\sigma_0^2$ are known parameters, while in a statistical model you consider a set of distributions, for example $\mbox{Normal}(\mu,\sigma^2)$, where $\mu,\sigma^2$ are unknown parameters.
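In symbols, one way to write this contrast for the Normal example is the following. The index set $\Theta$ is notation introduced here for convenience.

$$\text{probability model: } {\mathbb P} = \mbox{Normal}(\mu_0,\sigma_0^2), \qquad \text{statistical model: } {\mathcal S} = \left\{\mbox{Normal}(\mu,\sigma^2) : (\mu,\sigma^2)\in\Theta\right\}, \quad \Theta = {\mathbb R}\times{\mathbb R_+}.$$

The first is a single element; the second is a set that contains it whenever $(\mu_0,\sigma_0^2)\in\Theta$.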

Neither of them requires a data set, but I would say that a Statistical Model is usually selected for modelling one.
