[Math] the difference between a density and a distribution in formal mathematical terms

measure-theory, probability-theory, random-variables

A similar question has already been asked, but not in a mathematical framework, so this one seems different. According to the definitions in the book I am reading, a random variable and a distribution are defined as follows:

Definition. Let $(\Omega,\mathcal{A},P)$ be a probability space, let $(\Omega', \mathcal{A}')$ be a measurable space, and let $X:\Omega\to\Omega'$ be $\mathcal{A}$-$\mathcal{A}'$-measurable. Then $X$ is called a random variable.

Definition. Let $X$ be a random variable. The probability measure $P_X:=P\circ X^{-1}$ on $(\Omega', \mathcal{A}')$ is called the distribution of $X$.

Now, according to what I see in physics textbooks, there is another thing called a density, which differs from a distribution. How is a density formally defined?

Best Answer

The distribution is simply the assignment of probabilities to sets of possible values of the random variable. If I tell you how probable it is that a certain random variable is between $3$ and $5$, and also how probable it is that it lies in every other possible set, then I've told you the distribution. Since there are infinitely many sets, I can't actually list them one by one, so perhaps a more down-to-earth way to say this is the following: suppose $X$ and $Y$ are random variables. If, for every (measurable) set, the probability that $X$ is in that set equals the probability that $Y$ is in that same set, then $X$ and $Y$ have the same distribution.
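When the set of possible values is finite, "every set" can actually be checked exhaustively. A minimal Python sketch, assuming a fair die modeled as the finite probability space $\Omega=\{1,\dots,6\}$ with uniform $P$ (the die example and the names `X`, `Y`, `prob_in` are hypothetical illustrations, not from the answer): $X$ is the face on top and $Y$ the face on the bottom, two different functions on $\Omega$ that nonetheless have the same distribution.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a fair die as a finite probability space.
OMEGA = range(1, 7)
P = {omega: Fraction(1, 6) for omega in OMEGA}

def X(omega):
    return omega          # face on top

def Y(omega):
    return 7 - omega      # face on the bottom

def prob_in(rv, A):
    """P({omega : rv(omega) in A}), the distribution of rv evaluated at A."""
    return sum((P[omega] for omega in OMEGA if rv(omega) in A), Fraction(0))

def all_subsets(values):
    values = list(values)
    return chain.from_iterable(combinations(values, k) for k in range(len(values) + 1))

# X and Y are different functions on Omega (X(1) = 1, Y(1) = 6) ...
assert X(1) != Y(1)
# ... yet every set of values gets the same probability: same distribution.
same = all(prob_in(X, set(A)) == prob_in(Y, set(A)) for A in all_subsets(range(1, 7)))
print(same)  # True
```

The point is that a distribution only records probabilities of sets of values, so it forgets which outcomes $\omega$ produced those values.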

A probability density function is a way of characterizing some distributions. For example, consider the function $$ f(x) = \begin{cases} 0 & \text{if }x<0, \\ e^{-x} & \text{if }x\ge 0. \end{cases} $$ To say that this is the probability density function of a random variable $X$ (so that $X$ is exponentially distributed with rate $1$) is to say that for every measurable set $A$ of real numbers, $$ \Pr(X\in A) = \int_A f(x)\,dx. $$ The probability assigned to each set $A$ is given by the integral above. A more concrete example: $$ \Pr(3<X<5) = \int_3^5 e^{-x}\,dx = e^{-3}-e^{-5}\text{ and }\Pr(X\ge 2) = \int_2^\infty e^{-x}\,dx = e^{-2}. $$
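These probabilities can be sanity-checked numerically. A minimal Python sketch, assuming a simple midpoint-rule integrator (the helper `integrate` and the cutoff at $50$ for the infinite upper limit are illustrative choices, not part of the answer); it also compares against sampling from the same distribution with the standard library's `random.expovariate`.

```python
import math
import random

def f(x):
    """The density from the example: 0 for x < 0, e^{-x} for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Pr(3 < X < 5) = integral of f over (3, 5), exactly e^{-3} - e^{-5}
p_3_5 = integrate(f, 3, 5)
# Pr(X >= 2): truncate the infinite upper limit at 50, where e^{-50} is negligible
p_ge_2 = integrate(f, 2, 50)

# Monte Carlo check: random.expovariate(1.0) samples X with density f
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(200_000)]
mc_3_5 = sum(3 < s < 5 for s in samples) / len(samples)

print(p_3_5, math.exp(-3) - math.exp(-5))
print(p_ge_2, math.exp(-2))
print(mc_3_5)
```

The integral and the sample frequency agree up to numerical and sampling error, which is exactly what "$f$ is the density of $X$" promises for these two sets.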

Not every probability distribution has a density. Say we let $X$ be the number of aces when a die is thrown four times. Then $X\in\{0,1,2,3,4\}$. The probability distribution assigns a positive number to every set that intersects $\{0,1,2,3,4\}$. For example, the set $\{x : x\ge 3.2\}$ intersects $\{0,1,2,3,4\}$ (it contains $4$), and thus the probability distribution of $X$ assigns a positive number to that set. But there is no function $f$ such that for every measurable set $A$ we have $\int_A f(x)\,dx$ equal to the probability that $X\in A$: taking $A=\{0,1,2,3,4\}$ would require $\int_A f(x)\,dx=1$, yet a finite set has Lebesgue measure $0$, so the integral of any function over it is $0$.
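For a fair die thrown independently, $X$ here is binomially distributed with $n=4$ and success probability $1/6$. A minimal Python sketch, assuming that model (the names `pmf` and `prob` are hypothetical), computes the probabilities exactly with fractions and represents a set $A$ by a predicate:

```python
from fractions import Fraction
from math import comb

# Assumed model: X = number of aces in four independent throws of a fair die,
# i.e. Binomial(4, 1/6).
p = Fraction(1, 6)
pmf = {k: comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(5)}

def prob(A):
    """Pr(X in A), where the set A is given as a predicate on real numbers."""
    return sum((pmf[k] for k in pmf if A(k)), Fraction(0))

print(prob(lambda x: x >= 3.2))              # 1/1296 (only k = 4 qualifies)
print(prob(lambda x: x in {0, 1, 2, 3, 4}))  # 1
# The support {0, ..., 4} is finite, hence has Lebesgue measure 0, so the
# integral of ANY function over it is 0, not 1: X has no Lebesgue density.
```

All of the probability sits on a Lebesgue-null set, which is exactly what rules out a density with respect to $dx$.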

PS, prompted by comments: To put it in a different kind of language: say $m$ is a measure (not necessarily assigning finite measure to the whole space) on the $\sigma$-algebra of measurable subsets of a space $S$. A probability density with respect to the measure $m$ is a measurable function $f:S\to[0,\infty)$ such that the function $$ A\mapsto \int_A f\,dm $$ is a probability measure on the measurable subsets of $S$. Since this function is automatically countably additive, the condition amounts to $\int_S f\,dm=1$.
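On a finite space a measure is determined by its point masses, so $\int_A f\,dm$ reduces to a finite sum. A minimal Python sketch under that assumption (the helpers `integrate_over` and `is_probability_density` are hypothetical): taking $m$ to be the counting measure on $\{0,1,2,3,4\}$, the number-of-aces probabilities from the die example, with a fair die assumed, form a density with respect to $m$, even though, as argued above, no density with respect to Lebesgue measure exists.

```python
from fractions import Fraction
from math import comb

def integrate_over(A, f, m):
    """Integral of f over A w.r.t. a measure m on a finite set (m[x] = m({x}))."""
    return sum((f(x) * m[x] for x in A), Fraction(0))

def is_probability_density(f, m):
    """f >= 0 on the space and the integral of f over the whole space is 1."""
    return all(f(x) >= 0 for x in m) and integrate_over(m.keys(), f, m) == 1

S = range(5)
counting = {x: 1 for x in S}   # counting measure: m({x}) = 1 for each x

def f(x):
    # Number-of-aces probabilities, assuming four throws of a fair die.
    return comb(4, x) * Fraction(1, 6)**x * Fraction(5, 6)**(4 - x)

print(is_probability_density(f, counting))   # True
print(integrate_over({4}, f, counting))      # 1/1296
```

Whether a distribution "has a density" therefore depends on the reference measure $m$: the same distribution can lack a density with respect to $dx$ and have one with respect to counting measure.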

A probability distribution on $S$ is simply a probability measure on the set of all measurable subsets of $S$. But not quite "simply": the probability distribution of a random variable $X:\Omega\to S$ is the probability measure on measurable subsets of $S$ that assigns measure $P(\{\omega\in\Omega : X(\omega)\in A\})$ to each measurable subset $A$ of $S$; in the notation of the question, this is $P_X=P\circ X^{-1}$.
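When $\Omega$ is finite, this pushforward $P\circ X^{-1}$ can be computed directly by collecting the preimage $\{\omega : X(\omega)\in A\}$. A minimal Python sketch, assuming $\Omega$ is the set of all outcomes of four throws of a fair die with the uniform $P$ (a hypothetical model of the aces example; the names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Assumed probability space: four throws of a fair die, all 6**4 outcomes equally likely.
OMEGA = list(product(range(1, 7), repeat=4))
P = Fraction(1, len(OMEGA))   # probability of each single outcome

def X(omega):
    """Random variable: number of aces (ones) among the four throws."""
    return omega.count(1)

def distribution_of(rv, A):
    """(P o rv^{-1})(A) = P({omega in Omega : rv(omega) in A})."""
    preimage = [omega for omega in OMEGA if rv(omega) in A]
    return P * len(preimage)

print(distribution_of(X, {4}))              # 1/1296
print(distribution_of(X, {0}))              # 625/1296
print(distribution_of(X, {0, 1, 2, 3, 4}))  # 1
```

The measure on $\Omega$ lives on outcomes of the experiment, while its pushforward lives on the values of $X$; only the latter is the distribution.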

PPS: When $f\ge0$ is a measurable function on $\mathbb R$ (with respect to the Borel or the Lebesgue $\sigma$-algebra), one sometimes refers to the "measure" $f(x)\,dx$, meaning the measure $$ A\mapsto \int_A f(x)\,dx. $$ If in addition $\displaystyle\int_{\mathbb R} f(x)\,dx=1$, so that $f$ is a probability density, then one may similarly refer to the "probability distribution" $f(x)\,dx$.
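As a worked check, the exponential density $f$ from the earlier example satisfies this normalization condition, so $f(x)\,dx$ is a probability distribution in this sense: $$ \int_{\mathbb R} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} e^{-x}\,dx = \lim_{b\to\infty}\bigl[-e^{-x}\bigr]_{0}^{b} = \lim_{b\to\infty}\bigl(1-e^{-b}\bigr) = 1. $$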

(Of course, not all probability distributions on Borel subsets of the real line are of this form.)
