Probability – Intuition About the Definition of Probability Density Function

Tags: density function, intuition, probability, probability theory, statistics

My math book and lecturer use the following definition of a PDF:

A function $p$ on an interval $I \subseteq \mathbb{R}$ that satisfies

$$\int_{I}p(x)\,dx = \int_{-\infty}^{\infty}1(x \in I)\,p(x)\,dx=1$$
is called a probability density function on $I$.

I'm not sure I understand the intuition behind this. I know the area under the curve is supposed to be 1, and I can see that the definition uses an indicator function. But what is $p(x)$ in this case, and why do we multiply it by the indicator function?

Often people seem to define it simply as $$\int_{-\infty}^{\infty}p(x)\,dx=1$$

without the indicator function, using just $p(x)$.

I searched on Google and it seems that no one else uses this definition, so I hope someone knowledgeable in this area can help me understand it a bit better.

Best Answer

The interval $I$ you refer to is commonly called the support of the random variable $X$ and is denoted $I=\operatorname{supp}(X)$. The support simply refers to the region (in this case a subset of the real line) where the density of $X$ takes on positive values. Even if a probability density has support on only a subset of the real line, e.g. $X\sim\operatorname{Beta}(\alpha,\beta)$ implies $\operatorname{supp}(X)=[0,1]$, we can always define it as a density on the entire real line with the use of indicator functions. Going back to our example, let $f_X(x;\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\operatorname B(\alpha,\beta)}$ denote the beta density function, for which $\int_0^1f_X(x;\alpha,\beta)\,\mathrm dx=1$. By redefining our density function as $f_X(x;\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\operatorname B(\alpha,\beta)} 1_{x\in[0,1]}$ we may formally extend our density to the entire real line without changing any of its properties. In particular, we can now rightfully claim $$ \int_{-\infty}^\infty f_X(x;\alpha,\beta)\,\mathrm dx=1, $$ since $1_{x\in[0,1]}=0$ outside the interval $[0,1]$.
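As a rough numerical sanity check of this claim, here is a minimal Python sketch using only the standard library. The choice $\alpha=2$, $\beta=3$, the helper names `beta_pdf` and `midpoint_integral`, and the finite range $[-10, 10]$ standing in for the whole real line are all assumptions made for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density extended to the real line via the indicator 1_{x in [0,1]}."""
    if x < 0.0 or x > 1.0:
        return 0.0  # the indicator is 0 outside the support [0, 1]
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # B(a, b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def midpoint_integral(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 3.0  # illustrative parameters, not from the original post

# Integrate over the support only, then over a much wider range.
on_support = midpoint_integral(lambda x: beta_pdf(x, a, b), 0.0, 1.0)
on_wide = midpoint_integral(lambda x: beta_pdf(x, a, b), -10.0, 10.0)

print(on_support, on_wide)  # both should be very close to 1
```

Both integrals agree because the indicator zeroes out every contribution from outside $[0,1]$, so widening the range of integration adds nothing.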

Edit:

This is meant to further clarify the OP's follow-up questions. In many statistics texts it is common to see the author(s) state the property $$ \tag{1} \int_{-\infty}^\infty p(x)\,\mathrm dx=1, $$ where $p(x)$ is some arbitrary probability density function. Note that such an expression does not explicitly mention the support of the density function $p(x)$. So despite the integral in $(1)$ taking place over the entire real line, it is implicitly assumed that all contributions to the integral outside the density's support are zero. For example, if $p(x)$ is the normal density function, then $p(x)$ has support everywhere on the real line and so $(1)$ is simply telling us $$ \int_{-\infty}^\infty p(x)\,\mathrm dx=\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^\infty \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\,\mathrm dx=1. $$ If, however, the density has support on only a subset of the real line, then the expression given by $(1)$ assumes there are "instructions" baked into the equation for the density function that specify where to integrate. Take for example our beta distribution again. Letting $p(x)$ be the $\operatorname{Beta}(\alpha,\beta)$ density, equation $(1)$ only makes sense if we write $$ \int_{-\infty}^\infty p(x)\,\mathrm dx=\int_{-\infty}^\infty \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\operatorname B(\alpha,\beta)} 1_{x\in[0,1]}\,\mathrm dx =\int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\operatorname B(\alpha,\beta)}\,\mathrm dx=1. $$
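To see why those "instructions" matter, the sketch below compares three numerical integrals. It assumes a standard normal ($\mu=0$, $\sigma=1$) and the $\operatorname{Beta}(2,3)$ case, where $\operatorname B(2,3)=1/12$ and the formula reduces to $12x(1-x)^2$. The function names and the finite integration ranges are illustrative choices, not part of the original answer.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density; its support is the whole real line, so no indicator is needed."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def beta23_formula(x):
    """Raw Beta(2, 3) formula 12 x (1 - x)^2 with NO indicator attached."""
    return 12.0 * x * (1.0 - x) ** 2

def beta23_pdf(x):
    """The same formula multiplied by the indicator 1_{x in [0,1]}."""
    return beta23_formula(x) if 0.0 <= x <= 1.0 else 0.0

def midpoint_integral(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# [-10, 10] stands in for the real line; the normal tails beyond it are negligible.
normal_mass = midpoint_integral(normal_pdf, -10.0, 10.0)

# Integrating past [0, 1]: the raw formula goes wrong, the indicator version does not.
raw_beta = midpoint_integral(beta23_formula, -1.0, 2.0)
beta_with_indicator = midpoint_integral(beta23_pdf, -1.0, 2.0)

print(normal_mass, raw_beta, beta_with_indicator)
```

The normal density and the indicator version of the beta density both integrate to roughly 1. The raw polynomial integrated over $[-1,2]$ comes out near $-9$, which is not a valid total probability. That is the sense in which the indicator is part of the definition of the density and not an optional decoration.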
