[Math] the relationship between a pdf and cdf

statistics

I am learning stats. On page 20, my book, All of Statistics 1e, defines a CDF as a function that maps x to the probability that a random variable, X, is less than or equal to x.

$F_{X}(x) = P(X\leq x)$

On page 23 it gives a function

$P(a < X < b ) = \int_{a}^{b}f_{X}(x)\,dx$

and then says that "the function $f_{X}$ is called the probability density function. We have that…"

$F_{X}(x) = \int_{-\infty}^{x}f_{X}(t)\,dt$

I am a little confused about how to characterize the most important difference between them. The equation above says that the cdf is the integral of the pdf from negative infinity to x. Is it fair to say that the cdf is the integral of the pdf from negative infinity to x?

Best Answer

Yes, that's correct. A PDF is a probability density function. Its discrete counterpart is the probability mass function (PMF), which states the probability of each particular value coming out. For example, the PMF of a fair 6-sided die is: $[x<1:0,x=1:\frac{1}{6},x=2:\frac{1}{6},x=3:\frac{1}{6},x=4:\frac{1}{6},x=5:\frac{1}{6},x=6:\frac{1}{6},x>6:0]$. For a continuous probability distribution, you can't read a probability directly off the PDF: its value at a point is a density, not a probability, and the probability of landing in an infinitesimally thin slice (that is, on any single exact value) is zero. Probabilities only come from integrating the PDF over an interval.

That's where the cumulative distribution function, or CDF, comes in. It measures how likely the value is to be less than or equal to some arbitrary value (which we pick). For the discrete case, you start with the first possible value and add all the point probabilities up to and including the value of interest: $$F(x)=\sum_{k\leq x} P(X=k) \rightarrow [x<1:0,\ 1\leq x<2:\frac{1}{6},\ 2\leq x<3:\frac{2}{6},\ 3\leq x<4:\frac{3}{6},\ 4\leq x<5:\frac{4}{6},\ 5\leq x<6:\frac{5}{6},\ x\geq 6:\frac{6}{6}]$$ Notice how the final value of the CDF is $1$. This is expected, since every possible outcome of rolling a 6-sided die is less than or equal to 6.
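As a rough illustration of the die example, the running sum can be written out in a few lines of Python. This is only a sketch: the names `faces`, `pmf`, `cdf_at_faces`, and `die_cdf` are made up for this example, and exact fractions are used so the values match the table above.

```python
from fractions import Fraction
from itertools import accumulate

# Point probabilities of a fair six-sided die: each face 1..6 has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# The running sum of the point probabilities gives the CDF at each face: F(k) = P(X <= k).
cdf_at_faces = list(accumulate(pmf))


def die_cdf(x):
    """Return P(X <= x) for a fair die, for any real number x."""
    total = Fraction(0)
    for face, p in zip(faces, pmf):
        if face <= x:
            total += p
    return total


for face, value in zip(faces, cdf_at_faces):
    print(face, value)
```

Calling `die_cdf` with a value between two faces, such as `die_cdf(3.7)`, gives the same result as `die_cdf(3)`, which is why the discrete CDF is a step function that is flat between the possible outcomes.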

Now let's go back to the continuous probability distribution. In this case, there isn't a finite list of possible values to add up, so there is no first value to start from. Instead, we start from $-\infty$, since that encompasses everything to the left of the chosen $x$. As you should be aware from calculus, the integral is, loosely, to continuous functions what a sum is to discrete functions. The value of a CDF is that you can use it to determine the probability of the number falling within a specific range as follows:

$$P(a\leq X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a) = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = \int_{a}^{b} f(x)\,dx$$
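The identity can be checked numerically. The sketch below assumes a standard normal distribution purely as an example (the question does not name a distribution), uses $-10$ as a stand-in for $-\infty$, and approximates the integrals with a simple midpoint rule; `pdf`, `cdf`, and `integrate` are names invented for this illustration.

```python
import math


def pdf(x):
    """Standard normal density (chosen only as an example distribution)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Closed-form standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


a, b = -1.0, 2.0

# Integral of the PDF from a to b versus the difference of CDF values.
area_under_pdf = integrate(pdf, a, b)
cdf_difference = cdf(b) - cdf(a)

# Integral of the PDF from "minus infinity" (approximated by -10) up to b.
cdf_from_integral = integrate(pdf, -10.0, b)

print(area_under_pdf, cdf_difference)
print(cdf_from_integral, cdf(b))
```

Both printed pairs should agree to many decimal places, which is the numerical version of saying that the CDF is the integral of the PDF and that interval probabilities are differences of CDF values.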