[Math] Intuitive idea behind the probability density function

calculus, probability, soft-question

As an application of Calculus, I am currently teaching some material about continuous random variables. My main example is the height $X$ of a French male chosen at random from the French population.

To explain the probability density function (pdf), I explain that, unlike for discrete variables, knowing $p(X=x)$ is not really interesting (who cares about the probability of being exactly 1.783424567 meters tall?); what is of interest is $p(a\leqslant X\leqslant b)$, the probability of lying in some interval. So $p(X=x)$ isn't an interesting one-variable function, whereas $P(a\leqslant X\leqslant b)=g(a,b)$ is a meaningful function of two variables for probability and statistics.

But, instead of studying $g$, we prefer to associate to $X$ a pdf, i.e. a function $f$ such that $p(a\leqslant X\leqslant b)=\int_a^b f(x)\ dx$.
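To make the definition concrete for calculus students, here is a minimal numerical sketch. It assumes (purely for illustration, these are not real statistics) that heights follow a normal distribution with mean 1.78 m and standard deviation 0.07 m, and computes $p(a\leqslant X\leqslant b)=\int_a^b f(x)\,dx$ with a midpoint Riemann sum, cross-checked against the exact value from the normal CDF:

```python
import math

# Hypothetical pdf for the height X of a French male: a normal
# distribution with mean 1.78 m and standard deviation 0.07 m.
# (Illustrative parameters, not real population statistics.)
MU, SIGMA = 1.78, 0.07

def f(x):
    """Probability density function of X."""
    return math.exp(-((x - MU) / SIGMA) ** 2 / 2) / (SIGMA * math.sqrt(2 * math.pi))

def prob(a, b, n=10_000):
    """P(a <= X <= b), approximated by a midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def cdf(x):
    """Exact normal CDF, via the error function."""
    return 0.5 * (1 + math.erf((x - MU) / (SIGMA * math.sqrt(2))))

# P(X = x) is 0 for every single x, but an interval gets positive probability:
approx = prob(1.70, 1.85)
exact = cdf(1.85) - cdf(1.70)
print(approx, exact)  # the two values agree to many decimal places
```

Note that `prob(a, a)` is $0$ for any $a$: single points carry no probability, which is exactly the point made above.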

But what is the reason for introducing the pdf? Apart from saying something like "this is more convenient, and this is the genius idea of modern probability", I have no argument. Could we do probability without pdfs? What would be a better way to introduce the pdf?

I should make clear that this is aimed at single-variable calculus students, with little knowledge of probability (finite random variables only) and, of course, no idea of what the Lebesgue integral is (and how it generalizes both sums $\Sigma$ and Riemann integrals).

Best Answer

As you have pointed out, in continuous probability situations one obtains interesting probability values only for "reasonable" subsets of the event space $\Omega$, say for intervals, circles, rectangles, as the case may be. Now the number of such subsets is huge, so that it is impossible to create a list of all probability values as in the discrete case. And at the same time such a list would be highly redundant, as it would have to satisfy, e.g., $P(A\cup B)+P(A\cap B)=P(A)+P(B)$ for all $A$ and $B$. In your example we automatically have $g(a,c)=g(a,b)+g(b,c)$ when $a<b<c$.

Introducing PDFs is a means of eliminating this redundancy without losing any information. It so happens that in geometrical situations the probability for sets $A\subset\Omega$ of small diameter is roughly proportional to the length (area, volume, spatial angle, etc.) of $A$: $$P(A)\doteq f\cdot{\rm length}(A)\qquad({\rm length}(A)\ll1)\ ,\tag{1}$$ but the proportionality factor $f$ depends on the exact spot $x$ where $A$ is located. The function $x\mapsto f(x)$ created in this way is called the PDF of the random point in question. This means that we should replace the first "Ansatz" $(1)$ by the more refined $$P(A)\doteq f(x)\>{\rm length}(A)\qquad(A\subset B_\epsilon(x),\ \epsilon\ll1)\ .\tag{2}$$ When $A\subset \Omega$ is "large", then $(2)$ immediately leads to $$P(A)\doteq \sum_{k=1}^N f(x_k)\ {\rm length}(A_k)\doteq\int\nolimits_A f(x)\ {\rm d}x\ .$$ Here $A=\bigcup_{k=1}^N A_k$ is a partition of $A$ into tiny subsets (intervals) $A_k$ to which $(2)$ can be applied.
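The passage from $(2)$ to the integral can be demonstrated numerically. The sketch below (assuming, for illustration only, the same hypothetical normal density with mean 1.78 and standard deviation 0.07) partitions $A=[1.70, 1.85]$ into $N$ tiny intervals $A_k$, forms the sum $\sum_k f(x_k)\,{\rm length}(A_k)$, and shows that it stabilizes as the partition is refined:

```python
import math

# Hypothetical pdf parameters (illustrative, not real statistics).
MU, SIGMA = 1.78, 0.07

def f(x):
    """The PDF f from Ansatz (2)."""
    return math.exp(-((x - MU) / SIGMA) ** 2 / 2) / (SIGMA * math.sqrt(2 * math.pi))

def riemann(a, b, n):
    """Sum f(x_k) * length(A_k) over a partition of A = [a, b]
    into n equal intervals A_k, with x_k the midpoint of A_k."""
    dx = (b - a) / n  # length(A_k), the same for every k here
    return sum(f(a + (k + 0.5) * dx) * dx for k in range(n))

# Refining the partition: the sums converge to the integral of f over A.
for n in (2, 8, 32, 128):
    print(n, riemann(1.70, 1.85, n))
```

Each refinement makes the local approximation $(2)$ more accurate on every $A_k$, which is why the total $\sum_k f(x_k)\,{\rm length}(A_k)$ converges to $\int_A f(x)\,{\rm d}x$.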
