Solved – Integral of a conditional uniform distribution leads to improper integral

integral, uniform distribution

I have two uniform distributions, $X_1 \sim U(a,b)$ and $X_2\sim U(X_1+\delta,b+\delta)$. I would like to compute $P(X_2\in[a+\delta,b+\delta])$, so I do this:

$$\begin{eqnarray*}
P(X_2\in[a+\delta,b+\delta]) & = & \int_{a}^{b} P(X_2\in[a+\delta,b+\delta]|X_1=y)\cdot P(X_1=y) dy\\
& = & \frac{1}{b-a} \int_{a}^{b} \frac{1}{b-y} dy\\
& = & \frac{1}{b-a} \ln\Bigg(\frac{b-a}{b-b}\Bigg)
\end{eqnarray*}$$

The wrong result is related to the fact that a uniform distribution has probability $\frac{1}{x}$, but it doesn't work for $x<1$ (in this case, the fraction becomes bigger than 1…).

How do we solve this problem formally, in a clean way?

Best Answer

You can see that the answer is $1$ without any calculation. By definition, you have $a \le X_1 \le b$ and $X_1+\delta \le X_2 \le b+\delta$, so: $$\begin{align} X_2 &\ge X_1 + \delta \ge a + \delta \quad \text{ and }\\ X_2 &\le b + \delta \end{align}$$ So $\Pr(X_2 \in [a+\delta, b+\delta]) = 1$.
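A quick simulation can make this argument tangible. The sketch below is only an illustration: the helper names `sample_x2` and `fraction_inside` and the parameter values are assumptions chosen here, not part of the question.

```python
import random

def sample_x2(a, b, delta, rng):
    """Draw X1 ~ U(a, b), then X2 ~ U(X1 + delta, b + delta)."""
    x1 = rng.uniform(a, b)
    return rng.uniform(x1 + delta, b + delta)

def fraction_inside(a, b, delta, n=100_000, seed=0):
    """Fraction of simulated X2 values that land in [a + delta, b + delta]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x2 = sample_x2(a, b, delta, rng)
        if a + delta <= x2 <= b + delta:
            hits += 1
    return hits / n

# Illustrative parameters with b - a < 1, the case the question worries about.
print(fraction_inside(0.0, 0.5, 0.25))
```

Up to floating-point rounding at the upper endpoint, every sample should fall inside the interval, whatever the length $b - a$ is.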


A uniform distribution over an interval of length $l > 0$ has density $1/l$ at every point. When you integrate this $1/l$ over the interval you get $1$ as you should; whether $l < 1$ or $l > 1$ is irrelevant. Also, the actual probability of taking any particular exact value is $0$, not $1/l$ as you seem to think. (Your $P(X_1=y)$ is $0$ for any $y$.) For continuous variables, we need to work with the probability density function, not the probability mass function.
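As a concrete check with illustrative numbers (not taken from the question), consider a uniform distribution on $(0, \tfrac14)$, so $l = \tfrac14$. Its density is $4$, which is larger than $1$, yet the total probability is $$\int_0^{1/4} 4 \, dx = 1,$$ and a sub-interval such as $(0, \tfrac18)$ gets probability $4 \cdot \tfrac18 = \tfrac12$. A density may exceed $1$; only its integrals are probabilities.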

With that in mind, the correct calculation is as follows. The density $f_1(x)$ of $X_1$ is $\frac{1}{b-a}$ for $x \in (a,b)$, and $0$ outside. The density of $X_2$ for a given value of $X_1$ (call it $f_{2,X_1}(x)$) is $\frac{1}{b-X_1}$ for $x \in (X_1+\delta, b+\delta)$, and $0$ outside. To get the density of $X_2$ as a function of $x$ alone, without depending on the value of $X_1$, you need to integrate over all values $y$ of $X_1$: that is, $$f_2(x) = \int_{y} f_{2,y}(x)f_1(y) dy.$$
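The marginalization integral can also be evaluated numerically, letting the two densities' zero regions take care of the bounds. This is a minimal sketch using a midpoint rule; the function names `f1`, `f2_given`, `f2_numeric` and the parameter values are assumptions made for illustration.

```python
def f1(y, a, b):
    """Density of X1 ~ U(a, b)."""
    return 1.0 / (b - a) if a < y < b else 0.0

def f2_given(x, y, b, delta):
    """Density of X2 at x given X1 = y: uniform on (y + delta, b + delta)."""
    if y < b and y + delta < x < b + delta:
        return 1.0 / (b - y)
    return 0.0

def f2_numeric(x, a, b, delta, n=200_000):
    """Midpoint-rule approximation of the integral over y of f2_given * f1."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h  # midpoint of the i-th sub-interval of (a, b)
        total += f2_given(x, y, b, delta) * f1(y, a, b)
    return total * h

# Illustrative parameters; for these the value is close to ln 2 ≈ 0.693.
a, b, delta = 0.0, 1.0, 0.5
print(f2_numeric(1.0, a, b, delta))
```

Integrating over all of $(a, b)$ and letting `f2_given` return $0$ where appropriate is the numerical counterpart of working out the effective bounds on $y$ by hand.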

Now note that for the second factor $f_1(y)$ to be nonzero, you need $a \le y \le b$, as we said above. For the first factor $f_{2,y}(x)$ to be nonzero, you need $y+\delta \le x \le b+\delta$, so you also need $y \le x - \delta$. Since $X_2$ only takes values with $a+\delta \le x \le b+\delta$, you have $x - \delta \le b$, so the true bounds on $y$ are $a \le y \le \min(b, x-\delta)$, i.e., $a \le y \le x-\delta$. So

$$\begin{align}f_2(x) &= \int_{a}^{x-\delta} f_{2,y}(x)f_1(y) dy \\ &= \int_{a}^{x-\delta} \frac{1}{b-y} \frac{1}{b-a} dy \\ &= \frac{1}{b-a} \ln\frac{b-a}{b-(x-\delta)} \end{align}$$ for $x \in (a+\delta, b+\delta)$, and $0$ outside.
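To sanity-check this closed form, one can compare it with a crude density estimate from simulated draws. The sketch below assumes illustrative parameters and hypothetical helper names (`f2`, `empirical_density`); the window-based estimate is a rough stand-in for a histogram, not a precise estimator.

```python
import math
import random

def f2(x, a, b, delta):
    """Closed-form density of X2 from the derivation; 0 outside (a+delta, b+delta)."""
    u = x - delta
    if not (a < u < b):
        return 0.0
    return (math.log(b - a) - math.log(b - u)) / (b - a)

def empirical_density(x, a, b, delta, width=0.02, n=200_000, seed=1):
    """Estimate the density near x as (fraction of samples in a small window) / width."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x1 = rng.uniform(a, b)
        x2 = rng.uniform(x1 + delta, b + delta)
        if abs(x2 - x) <= width / 2:
            hits += 1
    return hits / (n * width)

# Illustrative parameters; the two columns should roughly agree.
a, b, delta = 0.0, 1.0, 0.5
for x in (0.75, 1.0, 1.25):
    print(x, f2(x, a, b, delta), empirical_density(x, a, b, delta))
```

Note that the closed form grows without bound as $x \to b+\delta$; this logarithmic singularity is integrable, so it does not prevent $f_2$ from being a valid density.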

The fact that this $f_2(x)$ varies with $x$ shows that $X_2$ is not uniformly distributed. However, when you integrate over the entire region $(a+\delta, b+\delta)$, you get $1$ as you should, so it is a valid distribution: $P(X_2 \in (a+\delta, b+\delta))$ is

$$\begin{align} \int_{a+\delta}^{b+\delta} f_2(x) dx &= \int_{a+\delta}^{b+\delta} \frac{1}{b-a} \ln\frac{b-a}{b-(x-\delta)} dx \\ &= \int_{a}^{b} \frac{1}{b-a} \ln\frac{b-a}{b-u} du \quad \text{ substituting } u=x-\delta\\ &= \ln(b-a) - \frac{1}{b-a} \int_{a}^{b}\ln(b-u) \ du \\ &= \ln(b-a) - \frac{1}{b-a} \int_{0}^{b-a} \ln t \ dt \quad \text{ substituting } t=b-u \\ &= \ln(b-a) - \frac{1}{b-a} ((b-a)\ln(b-a) - (b-a)) \\ &= 1 \end{align}$$
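The same normalization can be confirmed numerically. This is a minimal sketch, assuming illustrative parameters and a simple midpoint rule (chosen because it never evaluates the density at the endpoint $b+\delta$, where it is unbounded); `f2` and `integrate_midpoint` are names introduced here.

```python
import math

def f2(x, a, b, delta):
    """Closed-form density of X2 from the derivation; 0 outside (a+delta, b+delta)."""
    u = x - delta
    if not (a < u < b):
        return 0.0
    return (math.log(b - a) - math.log(b - u)) / (b - a)

def integrate_midpoint(f, lo, hi, n=100_000):
    """Midpoint rule: evaluates f only at interior points of (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Illustrative parameters with b - a = 0.5 < 1.
a, b, delta = 2.0, 2.5, 0.3
total = integrate_midpoint(lambda x: f2(x, a, b, delta), a + delta, b + delta)
print(total)  # should be very close to 1
```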
