[Math] What does it mean to take an integral of a probability

probability, statistics

My understanding is that you have some function $y=f(x)$ to represent a probability density function, correct? For instance, for a uniform random variable it looks like a giant rectangular block. I don't really know what a density function tells you, but for a uniform variable it's $\frac{1}{b-a}$. I assumed this meant "the probability that you 'land' in this area is $\frac{1}{b-a}$", but then I read that taking the integral of the density function gives you the probability?

And then somehow I see that sometimes you can take the integral of the probability function, for example if you want to know the probability that the sum of n random variables exceeds x, it requires taking the integral of a probability function and so on and so forth. For example see http://mathworld.wolfram.com/UniformSumDistribution.html after the line "while the sum of $n-1$ variates being less than 1 is."

I am getting lost in my understanding of what the integral of this and that represents. Is there a simple way to understand what is what, here?

Best Answer

Let me address one piece of your question, namely "I don't really know what a density function tells you". If $p$ is the probability density function of a random variable $X$, then the intuitive meaning of $p(c)$ would be described by a physicist (or an 18th century mathematician) as follows: For an infinitesimal interval containing $c$, consider the probability that $X$ lies in that interval, and consider the length of the interval. The ratio of these is $p(c)$. The decision to use a ratio here is based on the intuitive idea that, if you double the length of an interval near $c$, then that doubles the probability that $X$ will be in the interval --- a double-sized target is twice as easy to hit. And similarly if you replace "double" with "triple" or other factors. So it is reasonable to expect that (for well-behaved, continuously distributed $X$) the ratio that defines $p(c)$ will not depend (much) on the exact size of the infinitesimal interval. The probability of hitting the interval does depend sensitively on its length, but we compensate for that by dividing by the interval's length. So the probability density is the probability (of $X$ being in an interval near $c$) per unit length.

In the 19th century, mathematicians wanted more rigorous concepts, not involving infinitesimals, so this intuitive definition would be replaced with a definition involving a limit, namely that $p(c)$ is the limit, as $\epsilon\to0$, of the ratio $\text{Prob}(c-\epsilon<X<c+\epsilon)/(2\epsilon)$. [Technicalities: I should probably also have allowed intervals around $c$ that aren't centered exactly at $c$, but for well-behaved continuous random variables it won't matter.] [In the 20th century, infinitesimals were rehabilitated by nonstandard analysis. So we could revert to the intuitive definition that I gave first, except that we should replace "the ratio" with "the standard part of the ratio."]
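To make the limit definition concrete, here is a minimal numerical sketch. It assumes, purely for illustration, an exponential random variable with rate 1 (not something from the question), whose cumulative distribution function is known in closed form. The helper names `exp_cdf`, `exp_pdf`, and `density_ratio` are made up for this example. The ratio of "probability of landing in the interval" to "length of the interval" should settle down to $p(c)$ as the interval shrinks.

```python
import math

def exp_cdf(x):
    """P(X <= x) for an exponential random variable with rate 1 (illustrative choice)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exp_pdf(x):
    """Density of the same exponential random variable."""
    return math.exp(-x) if x >= 0 else 0.0

def density_ratio(cdf, c, eps):
    """Prob(c - eps < X < c + eps) divided by the interval length 2 * eps."""
    return (cdf(c + eps) - cdf(c - eps)) / (2.0 * eps)

c = 1.0
for eps in (0.5, 0.1, 0.01, 0.001):
    ratio = density_ratio(exp_cdf, c, eps)
    # The ratio approaches the density p(c) as eps shrinks.
    print(f"eps={eps}: ratio={ratio:.6f}, p(c)={exp_pdf(c):.6f}")
```

The probability of hitting the interval shrinks toward 0 with `eps`, but the ratio stays close to a fixed number; that number is the density at `c`.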

I suggest that, in order to see what a probability density function does, you ignore the technicalities, ignore (at least for a while) the epsilon definition, and meditate on the 18th century definition.

EDIT, after the comment asking why we integrate $p$ to get the probability of hitting an interval: Still in the style of physics or 18th-century math, I'd get the probability of hitting an interval $[a,b]$ by imagining this interval cut up into infinitely many infinitesimal pieces, and then adding the probabilities of hitting any of those pieces. Such a sum of infinitely many infinitesimal pieces is exactly what an integral is. To see the connection with areas, imagine computing the area bordered by the graph of a positive function $f$, the $x$-axis, and the vertical lines $x=a$ and $x=b$. This area, given by $\int_a^bf(x)\,dx$, can be considered (in the 18th century) as follows. Chop up the interval $[a,b]$ into infinitely many infinitesimal intervals and consider the rectangles that have these infinitesimal intervals as bases and have heights given by $f$. (That is, an interval around $x$ has height $f(x)$.) Then the total area over $[a,b]$ is the sum of the infinitely many infinitesimal areas of these thin rectangles. The moral of this story is that integration, as in $\int_a^bf(x)\,dx$, is exactly adding up an infinite bunch of infinitesimal pieces, each of which is a product of a genuine number (like the height $f(x)$ of these rectangles) and an infinitesimal (the width of the rectangles). (In fact, the integral sign $\int$ was originally a fancy $S$, standing for "sum"; people thought of it as the sum of the products $f(x)$ times $dx$.) Coming back to probability, you have the analogous situation --- summing up infinitely many products of ordinary numbers $p(x)$ times infinitesimal widths of intervals. So the mathematical situation is the same as with areas; and so the method of computing these things is also the same, integration.
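The "add up many thin rectangles" picture can be imitated on a computer with finitely many thin slices instead of infinitely many infinitesimal ones. The sketch below is only an illustration: the function names (`riemann_sum`, `uniform_pdf`, `exp_pdf`), the uniform interval $[2,6]$, and the rate-1 exponential are assumptions chosen for the example, not part of the question.

```python
import math

def riemann_sum(p, a, b, n=100_000):
    """Sum of p(x) * dx over n thin slices of [a, b], using slice midpoints."""
    dx = (b - a) / n
    return sum(p(a + (i + 0.5) * dx) * dx for i in range(n))

def uniform_pdf(x, lo=2.0, hi=6.0):
    """Density of a uniform random variable on [lo, hi]: the 'rectangular block'."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def exp_pdf(x):
    """Density of an exponential random variable with rate 1."""
    return math.exp(-x) if x >= 0 else 0.0

# Probability that the uniform variable lands in [3, 4]: should be close to (4 - 3) / (6 - 2).
p_uniform = riemann_sum(uniform_pdf, 3.0, 4.0)

# Summing the density over the whole range [2, 6] should give a total close to 1.
total_uniform = riemann_sum(uniform_pdf, 2.0, 6.0)

# For the exponential, compare with the exact probability from its CDF.
p_exp = riemann_sum(exp_pdf, 0.5, 2.0)
exact_exp = math.exp(-0.5) - math.exp(-2.0)

print(p_uniform, total_uniform, p_exp, exact_exp)
```

For the uniform case this also shows why $1/(b-a)$ by itself is not a probability: it is the height of the block, and a probability only appears once that height is multiplied by a width and summed.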
