Distributions – How to Construct a Cumulative Probability Distribution Function from Two Others

distributions, joint-distribution

I am working on project time estimation and cannot find the intuition. What is the cumulative probability distribution of the event that two independent tasks, performed in parallel, have both completed successfully?

We have the cumulative distribution functions of two independent random variables representing the duration from project start to a possible finish date (the more time passes, the more likely the team has completed the task):

$$F_1(x) = P(\xi_1 \le x)$$
$$F_2(x) = P(\xi_2 \le x)$$

What is:

$$F(x) = P(\xi_1 \le x\ \&\ \xi_2 \le x)$$

How can it be derived from, or expressed in terms of, $F_1$ and $F_2$?

UPDATE: I agree that

$$P(\xi_1 \le x\ \&\ \xi_2 \le y) = P(\xi_1 \le x) \cdot P(\xi_2 \le y) $$

for all $(x, y) \in \mathbb{R}^2$.

But my misconception comes from the expected value of the joint event: what is the average time to complete both tasks?

I think that it is:

$$E(X) = \int_0^\infty xf_1(x)f_2(x)dx$$

Previously I thought it would be $E(\xi_1)\cdot E(\xi_2)$ – but this is wrong (you get units of time²!), and so I doubted the multiplication law, thinking something was wrong in that part.

Best Answer

The question asks for the expected time to complete both of two independent tasks. Call these times $X_1$ and $X_2$: they are random variables supported on $[0,\infty)$.

Let $F_i$ be the cumulative distribution functions (CDF) of the $X_i$:

$$F_i(x) = \Pr(X_i\le x).$$

The time to complete both tasks is $Y =\max(X_1,X_2)$. Its CDF is given by

$$\eqalign{ F_Y(y) = \Pr(Y\le y) &= \Pr(X_1\le y\text{ and }X_2\le y) \\&= \Pr(X_1\le y)\Pr(X_2\le y) \\&= F_1(y)F_2(y).}$$

All equalities arise from definitions: of the CDF, of $Y$, and of independence.
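As a sanity check, the product formula $F_Y = F_1 F_2$ can be verified by simulation. A minimal sketch, assuming (purely for illustration, not from the question) exponentially distributed task durations with rates 1.0 and 0.5:

```python
import math
import random

random.seed(42)
N = 200_000
rate1, rate2 = 1.0, 0.5  # hypothetical example rates

# Simulate Y = max(X1, X2) for independent exponential X1, X2.
samples = [max(random.expovariate(rate1), random.expovariate(rate2))
           for _ in range(N)]

# Exponential CDFs F_i(y) = 1 - exp(-rate_i * y).
def F1(y): return 1 - math.exp(-rate1 * y)
def F2(y): return 1 - math.exp(-rate2 * y)

for y in (0.5, 1.0, 2.0):
    empirical = sum(s <= y for s in samples) / N   # Pr(Y <= y), estimated
    theoretical = F1(y) * F2(y)                    # F_1(y) F_2(y)
    print(f"y={y}: empirical={empirical:.4f}, F1*F2={theoretical:.4f}")
```

The empirical CDF of the simulated maxima should track $F_1(y)F_2(y)$ to within Monte Carlo error.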

Assuming both variables $X_1$ and $X_2$ are absolutely continuous, one way to obtain the expectation of $Y$ is to integrate over the joint distribution of $(X_1,X_2)$, as suggested in the question. There are much easier ways, as explained below, but to show it can be done this way, let's split the integral into one integral over $\{(x_1,x_2)\,|\, x_1\ge x_2\}$ and another over $\{(x_1,x_2)\,|\, x_1\lt x_2\}$, because in the first case $\max(x_1,x_2)=x_1$ and in the second case $\max(x_1,x_2)=x_2$:

$$\eqalign{ \mathbb{E}[Y] &= \mathbb{E}[\max(X_1,X_2)] = \iint_{\mathbb{R}^2} \max(x_1,x_2)\, dF_1(x_1)\,dF_2(x_2)\\ &= \int_\mathbb{R}\int_{-\infty}^{x_1} x_1\,dF_2(x_2)\,dF_1(x_1) + \int_\mathbb{R}\int_{-\infty}^{x_2} x_2\,dF_1(x_1)\,dF_2(x_2) \\ &= \int_\mathbb{R}x_1 F_2(x_1)\,dF_1(x_1)+\int_\mathbb{R}x_2 F_1(x_2)\,dF_2(x_2) \\ &= \int_\mathbb{R}y\left(F_2(y)f_1(y) + F_1(y)f_2(y)\right)dy } $$

(writing $dF_i(y) = f_i(y)dy$).

However, this formula is easier to obtain by noting that the expectation of $Y$ can equally well be found via a single integral from the product rule

$$d\left(F_1(y)F_2(y)\right) = \left( F_2(y)f_1(y) + F_1(y)f_2(y)\right)dy,$$

whence

$$\mathbb{E}(Y) = \int_\mathbb{R}y \,dF_Y(y) = \int_\mathbb{R} y\left( F_2(y)f_1(y) + F_1(y)f_2(y)\right)dy.$$
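This single-integral formula can be evaluated numerically. A sketch, again assuming hypothetical exponential durations with rates 1.0 and 0.5 (for independent exponentials the maximum has the closed-form mean $1/r_1 + 1/r_2 - 1/(r_1+r_2)$, which serves as a cross-check):

```python
import math

rate1, rate2 = 1.0, 0.5  # hypothetical example rates

def f(r, y): return r * math.exp(-r * y)   # exponential density
def F(r, y): return 1 - math.exp(-r * y)   # exponential CDF

# Integrand y * (F2(y) f1(y) + F1(y) f2(y)) from the product rule.
def integrand(y):
    return y * (F(rate2, y) * f(rate1, y) + F(rate1, y) * f(rate2, y))

# Trapezoidal rule on [0, 50]; the tail beyond 50 is negligible here.
a, b, n = 0.0, 50.0, 200_000
h = (b - a) / n
EY = h * (sum(integrand(a + i * h) for i in range(1, n))
          + 0.5 * (integrand(a) + integrand(b)))

# Closed form for independent exponentials: E[max] = 1/r1 + 1/r2 - 1/(r1+r2).
exact = 1 / rate1 + 1 / rate2 - 1 / (rate1 + rate2)
print(EY, exact)
```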

A simpler and more general expression for the expectation is obtained by integrating the survival function $1-F_Y$, which works because these variables have nonnegative support:

$$\mathbb{E}(Y) = \int_0^\infty(1 - F_Y(y))dy = \int_0^\infty(1 - F_1(y)F_2(y))dy.$$

This stands up to units analysis: the units in the integrand are probability (from $1 - F_1(y)F_2(y)$) times units of $Y$ (from the $dy$ term), whence the integral is in the units of $Y$.
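The survival-function formula needs no densities at all, only the CDFs. A sketch with the same assumed exponential examples:

```python
import math

rate1, rate2 = 1.0, 0.5  # hypothetical example rates

def F(r, y): return 1 - math.exp(-r * y)   # exponential CDF

# Survival function of Y = max(X1, X2): 1 - F1(y) F2(y).
def surv(y):
    return 1 - F(rate1, y) * F(rate2, y)

# Trapezoidal rule for the integral of the survival function on [0, 50].
a, b, n = 0.0, 50.0, 200_000
h = (b - a) / n
EY = h * (sum(surv(a + i * h) for i in range(1, n))
          + 0.5 * (surv(a) + surv(b)))

# Cross-check against the closed form for independent exponentials.
exact = 1 / rate1 + 1 / rate2 - 1 / (rate1 + rate2)
print(EY, exact)
```

Note that no density ever appears: this version works even when the $X_i$ are not absolutely continuous, as long as they are nonnegative.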
