Looking at the links you provided, and what I've seen in the past, there seem to be many different names for what is essentially a particular function and its derivative.
First, let's cross off one of the names you mention in the title: cumulative density function. That term just doesn't make sense. A density function concerns itself with local properties of a phenomenon, while a cumulative function looks at global properties. A parallel term in calculus would perhaps be "integral derivative function", which makes no sense. So the term "cumulative density" is simply incorrect.
Now let's deal with the remaining terms: cumulative distribution function, distribution function, probability density function, and probability mass function.
The terms cumulative distribution function, probability density function, and
probability mass function have unique meanings, which I will try to explain below.
I can't remember seeing the term "distribution function" used as an equivalent of "probability density function" or "probability mass function", but that doesn't mean it is never used that way, considering how many different disciplines use these concepts. In measure-theoretic probability, however, the term "distribution function" always refers to the "cumulative distribution function", and the "cumulative" part is always dropped.
Next, let's define the terms and see how they are related. If you really want a truly complete answer, you'll need to know some measure theory, but I'm going to try to give a reasonable answer without using any measure theory. Nevertheless, a good understanding of calculus is indispensable.
If $X$ is a random variable, then its cumulative distribution function (CDF) is a function $F_X$ defined on real numbers as follows:
\begin{equation}
F_X(x) = P(X \leq x)
\end{equation}
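To make the definition concrete, here is a small sketch (my own example, not something from your links) of the CDF of a standard normal random variable, using the standard error-function identity $F_X(x) = \frac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)$:

```python
from math import erf, sqrt

def normal_cdf(x):
    """CDF of a standard normal random variable X:
    F_X(x) = P(X <= x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# By symmetry, P(X <= 0) = 0.5 for a standard normal.
print(normal_cdf(0.0))                  # 0.5
# F_X is increasing and tends to 0 and 1 in the tails.
print(normal_cdf(-5.0), normal_cdf(5.0))
```

Any CDF behaves this way: it rises from $0$ to $1$ and never decreases, as discussed next.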
It is not hard to see that $F_X$ is increasing: if $a < b$, then $F_X(a) \leq F_X(b)$. We can also show that $F_X$ is right-continuous at every $x$, meaning that if you approach $x$ through values larger than $x$, you'll get $F_X(x)$. In mathematical notation, this is written as
\begin{equation}
\lim_{z \to x^+} F_X(z) =F_X(x).
\end{equation}
But $F_X$ can have jump points: points where it is discontinuous. Since $F_X$ is continuous from the right everywhere, these discontinuities must come from the left. That means that if you approach such a point (say $a$) through values smaller than $a$, the value of $F_X$ at those points does not approach the value of $F_X(a)$.
As a simple example, consider the random variable $X$ that always takes the value $3$. Then it is easy to see that $F_X(x) = 0$ for $x < 3$ and $F_X(x) = 1$ for $3 \leq x$. If you approach $3$ via values larger than $3$, then $F_X(x) = 1$, and it stays at $1$ no matter how close to $3$ you get. However, if you approach $3$ through values less than $3$, then $F_X(x) = 0$ for those $x$, and regardless of how close to $3$ you get, you'll still have $F_X(x) = 0$. In mathematical notation, we write this as $F_X(3+) = F_X(3) = 1$ and $F_X(3-) = 0$.
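A minimal sketch of this step-function CDF, checking the one-sided limits at $3$ numerically:

```python
def F(x):
    """CDF of the random variable X that always equals 3:
    F(x) = 0 for x < 3 and F(x) = 1 for x >= 3."""
    return 0.0 if x < 3 else 1.0

# Right-continuity at 3: approaching from above, F stays at 1 ...
print(F(3.0), F(3.0 + 1e-9))   # 1.0 1.0
# ... but approaching from below, F stays at 0, so F(3-) = 0 != F(3) = 1.
print(F(3.0 - 1e-9))           # 0.0
```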
In general, if we denote by $F_X(a-)$ the value that $F_X(x)$ approaches as $x$ approaches $a$ through values smaller than $a$, then the jump points of $F_X$ are those points $a$ such that $F_X(a) - F_X(a-) \neq 0.$ But remember that $F_X$ is increasing, which implies that $F_X(a) - F_X(a-) > 0$ at jump points.
Another corollary of the increasing property of $F_X$ is that it can have at most countably many jump points. That means we can put them in a list (though the list may be infinite). Let's assume that $a_1, a_2, \ldots$ is the (possibly infinite) list of jump points of $F_X$. We will try to extract the part of $F_X$ that corresponds to the $a_i$'s.
Define a point mass at $a$ to be the following function:
\begin{equation}
\delta_a(x) =
\left\{
\begin{array}{ll}
0 & \mbox{if } x < a \\
1 & \mbox{if } x \geq a.
\end{array}
\right.
\end{equation}
Let $F_X(a_i) - F_X(a_i-) = b_i$. Define the function $F_X^d$ as follows.
\begin{equation}
F_X^d(x) = \sum_i b_i \delta_{a_i}(x).
\end{equation}
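The sum above is easy to code directly. As an illustration (the fair die is my choice of example), here is $F_X^d$ when the jumps are $b_i = 1/6$ at $a_i = 1, \ldots, 6$:

```python
def delta(a, x):
    """Point mass at a: 0 for x < a, 1 for x >= a."""
    return 0.0 if x < a else 1.0

def F_d(x, jumps):
    """F_X^d(x) = sum_i b_i * delta_{a_i}(x), with jumps = [(a_i, b_i), ...]."""
    return sum(b * delta(a, x) for a, b in jumps)

die = [(k, 1.0 / 6.0) for k in range(1, 7)]   # fair six-sided die
print(F_d(0.5, die))   # 0: below every jump point
print(F_d(3.5, die))   # ~0.5: three of the six jumps lie at or below 3.5
print(F_d(6.0, die))   # ~1.0: all of the probability has accumulated
```

Here $F_X = F_X^d$, so this die is a discrete random variable in the sense defined next.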
If $F_X(x) = F^d_X(x)$ for all $x$, then $X$ is called a discrete random variable, and the function
\begin{equation}
p_X(x) =
\left\{
\begin{array}{ll}
0 & \mbox{if } x \neq a_1,a_2,\ldots \\
b_i & \mbox{if } x = a_i \qquad i = 1,2,\ldots
\end{array}
\right.
\end{equation}
is called the probability mass function of $X$.
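Since $b_i = F_X(a_i) - F_X(a_i-)$, the probability mass function can be read off from the jumps of the CDF. A small numerical sketch (again with a fair die as my example), estimating $F_X(a-)$ with a tiny left offset and exact rational arithmetic:

```python
from fractions import Fraction

def F(x):
    """CDF of a fair six-sided die: jumps of size 1/6 at 1, 2, ..., 6."""
    return sum(Fraction(1, 6) for a in range(1, 7) if x >= a)

def pmf(a, eps=Fraction(1, 10**9)):
    """p_X(a) = F_X(a) - F_X(a-), with F_X(a-) estimated just left of a."""
    return F(a) - F(a - eps)

print(pmf(3))    # 1/6: a jump point of F, so the pmf is positive there
print(pmf(3.5))  # 0: no jump, so the pmf vanishes there
```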
If $F_X$ is a differentiable function, with $F_X' = f$, then $f$ is called the probability density function of $X$. It is easy to see from the fundamental theorem of calculus that for any $a$ and $b$,
\begin{equation}
\int_a^b f(x)dx = F_X(b) - F_X(a) = P(a < X \leq b).
\end{equation}
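A quick numerical check of this identity (my sketch, using the standard normal as the example and a plain trapezoidal rule; `integrate` is my helper, not a library function):

```python
from math import erf, exp, pi, sqrt

def f(x):
    """pdf of a standard normal: f(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def F(x):
    """CDF of a standard normal, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def integrate(f, a, b, n=10_000):
    """Plain trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

a, b = -1.0, 2.0
# The two quantities below agree: the integral of f over (a, b]
# equals F(b) - F(a) = P(a < X <= b).
print(integrate(f, a, b))
print(F(b) - F(a))
```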
Notice that there is a vast sea between the case when a probability mass function is defined ($F_X$ is a sum of point masses) and the case when a probability density function is defined ($F_X$ is differentiable). To understand what lies in between, you need to study probability from a measure-theoretic perspective.
Best Answer
In probability theory, there is nothing called the cumulative density function as you name it. There is a very important concept called the cumulative distribution function (or cumulative probability distribution function) which has the initialism CDF (in contrast to the initialism pdf for the probability density function). The definition of the CDF $F_X(u)$ of a random variable $X$ is that the value of this function at the argument $u$ (here $u$ can be any real number) is the probability of the event $(X \leq u)$, the probability that the random variable $X$ is no larger than the real number $u$. Using symbols instead of words, we have that $$F_X(u) = P(X \leq u), \quad -\infty < u < \infty.\tag{1}$$
Every random variable (no matter what kind) has a CDF, but pdfs generally are defined only for random variables that are called continuous random variables. So what are continuous random variables? These are random variables that can take on every possible real-number value in a continuum, which for our purposes can be taken to be the entire real line or an interval $(a,b)$ or $(a,b]$, etc., of the real line, and whose CDF $F_X(u)$ is a continuous function of $u$ for all values of $u$ and, furthermore, is differentiable at every $u$ except possibly for a finite number of points. Keep in mind that the CDF is continuous even at these oddball points; it is just that the CDF is not differentiable there. In fact, the reason why the CDF of a continuous random variable can be continuous but non-differentiable at a point $u_1$ is that the "derivative on the right" does not equal the "derivative on the left".
What's the point of having a differentiable function if one doesn't differentiate it? The pdf $f_X(u)$ of a continuous random variable $X$ is defined to be the derivative of the CDF $F_X(u)$ at every point at which the CDF is differentiable, and at those few points where the derivative does not exist, one can define the value of the pdf to be any number one likes; the choice won't matter in the least. But it is prudent to choose a nonnegative number so that one can fearlessly claim that $f_X(u) \geq 0$ for all real numbers $u$.
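A small numerical sanity check of this definition (my sketch, with the standard normal as the example): a centered difference quotient of the CDF recovers the pdf at points of differentiability.

```python
from math import erf, exp, pi, sqrt

def F(u):
    """CDF of a standard normal random variable."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def f(u):
    """pdf of a standard normal: the derivative of F."""
    return exp(-u * u / 2.0) / sqrt(2.0 * pi)

# A centered difference quotient of the CDF approximates its derivative,
# which is exactly the pdf wherever the CDF is differentiable.
h = 1e-5
for u in (-1.0, 0.0, 2.0):
    approx = (F(u + h) - F(u - h)) / (2.0 * h)
    print(round(approx, 6), round(f(u), 6))   # the two columns agree
```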
Once again, you seem to have some badly mangled notions. Let's start with the probability that a random variable $X$ takes on values in the interval $(a,b]$. The event $(X\leq b)$ is the disjoint union of the events $(X\leq a)$ and $(a < X \leq b)$, and so we have that \begin{align} P(X \leq b) &= P(X \leq a) + P(a < X \leq b)\\ F_X(b) &= F_X(a) + P(a < X \leq b) & {\scriptstyle{\text{on using }} (1)} \end{align} and so we can conclude that $$P(a < X \leq b) = F_X(b) - F_X(a). \tag{2}$$ Equations $(1)$ and $(2)$ hold for all random variables. But consider $(2)$ for the case of a continuous random variable, and let's ask what happens in the limit as $a$ approaches $b$ from below. Well, as $a$ gets closer and closer to $b$, every real number that is strictly smaller than $b$ gets eliminated as $a$ moves past it, but $b$ itself never gets eliminated. The limit of the event is the set containing the single number $b$ all by itself, and so $$P(X=b) = F_X(b) - \lim_{a\uparrow b}F_X(a) = F_X(b) - F_X(b) = 0.$$ Remember that $F_X(u)$ is a continuous function, which makes that limit evaluation work as stated. We conclude that $P(X = b) = 0$ for every real number $b$ whenever $X$ is a continuous random variable.
So, where did all the probability disappear to? Well, for a continuous random variable, probability resides in the intervals of the real line. As a specific case, consider a continuous random variable with CDF $$F_X(u) = \begin{cases}0, & u < 0,\\u, & 0 \leq u < 1,\\1, & u \geq 1.\end{cases}$$ Notice that $F_X(u)$ is continuous at $u=0$ and $u=1$ but is not differentiable at those points. Thus, we readily get that $f_X(u)$ has value $1$ for $0 < u < 1$ and value $0$ for $u < 0$ and $u > 1$. Since the derivative of $F_X(u)$ is undefined at $u=0$ and $u=1$, we set $f_X(0) = f_X(1) = 1$. Now, for $0 < a < b < 1$, we have that $$P(a < X \leq b) = F_X(b)-F_X(a) = b-a,$$ or in words, the probability that $X$ lies in a subinterval of $(0,1)$ equals the length of that subinterval.
This observation gives some intuition for the assertion that $P(X=b)$ has value $0$: the single point $b$ constitutes an interval of zero length and hence has probability $0$. More generally, the Fundamental Theorem of Calculus tells us that $$P(a < X \leq b) = F_X(b)-F_X(a) = \int_a^b f_X(u) \,\mathrm du$$ which gives us $b-a$ for the pdf introduced as an example but requires more strenuous evaluation of integrals in the general case.
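The integral above can be checked numerically for the uniform example (a minimal sketch with a midpoint rule; `integrate` is my helper, not a library function):

```python
def f(u):
    """pdf of a uniform random variable on (0, 1): 1 there, 0 elsewhere."""
    return 1.0 if 0.0 < u < 1.0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.2, 0.7
# The integral of the pdf over (a, b] equals b - a = 0.5,
# in agreement with P(a < X <= b) = F_X(b) - F_X(a).
print(integrate(f, a, b))
```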
Finally, I want to make the point that asking for the probability that a continuous random variable has value exactly $b$ is a pretty meaningless question, in that in practice one can never tell whether the event $(X=b)$ has occurred or not. How can we determine whether the observed value of $X$ equals $\pi$ or not? All we have is the observed value of $X$ to some degree of precision, and that number is never going to be equal to $\pi$. Even if one imagines that $X$ is given with infinite precision, it would require an infinite amount of time to compare the infinitely many digits of $X$ with $3.14159265\ldots$, and we cannot stop and say that the digits match thus far and so we will assume that the match extends all the way! What is of interest is the probability that $X$ is approximately $b$: the probability that $X$ lies in a short interval containing $b$ (possibly centered on $b$), or that $X$ is no larger than $b$, or that $X$ is larger than $b$. All these probabilities can be readily computed from the CDF, or with a little effort by integrating the pdf over the intervals of interest.