Continuous Random Variable Transformations vs Discrete


My textbook, Introduction to Mathematical Statistics, has the following example of finding the pdf of a transformation of a continuous random variable:

Let $X$ be a random variable with pdf $f_X(x)=2x$ for $0 < x < 1$, zero elsewhere, and cdf $F_X(x)=x^2$. Let $Y = X^2$ be a second random variable. Find $f_Y(y)$, the pdf of $Y$.

Solution:

For $0 < y < 1$: $F_Y(y)=P(Y\leq y)=P(X^2\leq y)=P(X\leq \sqrt{y})=F_X(\sqrt y) = (\sqrt{y})^2 = y.$

$f_Y(y) = \frac{dF_Y(y)}{dy} = 1.$

I can follow the solution, but my first approach would have been the one the book describes for the same problem with discrete random variables: since the transformation is one-to-one, simply substitute the inverse of the transformation into $f_X(x)$:

$f_Y(y) = f_X(g^{-1}(y))=2\sqrt{y}$

I see that this is clearly wrong, since this function integrates to $4/3$ rather than 1 over $0 < y < 1$. But I'd like to understand why this process works for discrete random variables to find the pmf of a transformation, yet doesn't work for continuous random variables to find the pdf of a transformation. Why do we need to make the substitution in the cumulative distribution function if the random variable is continuous?

Best Answer

Because the pdf is the (unsigned†) derivative of the cdf, we must apply the chain rule when differentiating.

$$\begin{align}f_Y(y) &=\begin{vmatrix}\dfrac{\mathrm d F_Y(y)}{\mathrm d y}\end{vmatrix}\\[1ex] &=\begin{vmatrix}\dfrac{\mathrm d F_X(g^{-1}(y))}{\mathrm d y}\end{vmatrix}\\[1ex] &=\begin{vmatrix}\dfrac{\mathrm d F_X(g^{-1}(y))}{\mathrm d g^{-1}(y)}\cdot\dfrac{\mathrm d g^{-1}(y)}{\mathrm d y}\end{vmatrix}\\[1ex] &= f_X(g^{-1}(y))\cdot\begin{vmatrix}\dfrac{\mathrm d g^{-1}(y)}{\mathrm d y}\end{vmatrix}\\[4ex]f_Y(y) &=2 g^{-1}(y)\cdot\begin{vmatrix}\dfrac{\mathrm d g^{-1}(y)}{\mathrm d y}\end{vmatrix}\\[1ex]&= 2\sqrt y~\mathbf 1_{0<\sqrt y<1}\cdot\begin{vmatrix}\dfrac{\mathrm d \sqrt y}{\mathrm d y}\end{vmatrix}\\[1ex]&=2\sqrt y~\mathbf 1_{0<y<1^2}\cdot\dfrac{1}{2\sqrt y}\\[1ex]&=\mathbf 1_{0<y<1}\end{align}$$


(† A pdf must take non-negative real values, so we use absolute values to ensure that the change of variables retains this property.)
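As a quick numerical check of the formula above, here is a minimal Python sketch. The helper name `transformed_pdf` and the step size `h` are illustrative choices, not something from the textbook, and the derivative of $g^{-1}$ is approximated by a central difference instead of being computed symbolically.

```python
import math

def f_x(x):
    """pdf of X from the question: 2x on (0, 1), zero elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def transformed_pdf(f, g_inv, y, h=1e-6):
    """Density of Y = g(X) at y for a one-to-one g (hypothetical helper).

    Evaluates f(g_inv(y)) * |d g_inv(y) / dy|, with the derivative
    approximated by a central difference of step h.
    """
    jacobian = (g_inv(y + h) - g_inv(y - h)) / (2.0 * h)
    return f(g_inv(y)) * abs(jacobian)

for y in (0.1, 0.25, 0.5, 0.9):
    naive = f_x(math.sqrt(y))                       # substitution only
    corrected = transformed_pdf(f_x, math.sqrt, y)  # substitution times |derivative|
    print(f"y={y:.2f}  naive={naive:.4f}  corrected={corrected:.4f}")
```

The naive column follows $2\sqrt y$, while the corrected column should stay close to 1 across the interval, matching the textbook's answer.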


> I'd like to understand why this process works for discrete random variables to find the pmf of a transformation, but doesn't work for continuous random variables to find the pdf of a transformation.

Because the support of a discrete distribution is a set of discrete points, each carrying a probability mass. A transformation that maps these points one-to-one onto another set of discrete points does not change the mass attached to each point, whether the points end up spread further apart or pushed closer together (masses would only combine if points were folded onto one another).
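A small Python sketch of the discrete case may make this concrete. The pmf values and the helper `transform_pmf` are made up for illustration; the point is only that each mass travels with its point.

```python
def transform_pmf(pmf, g):
    """Push a pmf (dict: value -> probability) through g (hypothetical helper)."""
    out = {}
    for x, p in pmf.items():
        y = g(x)
        # Masses are summed only when g sends two points to the same y.
        out[y] = out.get(y, 0.0) + p
    return out

pmf_x = {1: 0.2, 2: 0.3, 3: 0.5}                 # made-up example pmf
pmf_y = transform_pmf(pmf_x, lambda x: x ** 2)   # one-to-one on {1, 2, 3}
print(pmf_y)  # {1: 0.2, 4: 0.3, 9: 0.5}
```

The points move from $\{1, 2, 3\}$ to $\{1, 4, 9\}$, so they are spread further apart, but the probabilities are unchanged and still sum to 1. No rescaling is needed because a point mass has no width to stretch.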

However, the support of a continuous distribution is an interval whose points carry probability density, not mass. A transformation that maps that interval one-to-one may stretch or squeeze it, and hence change the probability density on the new interval.

The chain rule is how we account for this.
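The stretching can also be seen directly, without any calculus, by following a short interval through the transformation. The sketch below is only an illustration: the function `interval_densities` and the width `dx` are assumed names and values. It takes the probability of a small $x$-interval and divides it once by the interval's length in $x$ and once by the length of its image in $y$.

```python
def cdf_x(x):
    """cdf of X from the question: x^2 on (0, 1)."""
    return x * x

def g(x):
    """The transformation Y = g(X) = X^2."""
    return x * x

def interval_densities(x, dx=1e-4):
    """Average density over (x, x + dx) in x-units and over its image in y-units."""
    mass = cdf_x(x + dx) - cdf_x(x)   # P(x < X < x + dx), unchanged by the map
    width_y = g(x + dx) - g(x)        # length of the image interval in y
    return mass / dx, mass / width_y

for x in (0.1, 0.5, 0.9):
    d_x, d_y = interval_densities(x)
    print(f"x={x:.1f}  density in x ~ {d_x:.4f}  density in y ~ {d_y:.4f}")
```

The same probability is spread over an interval roughly $2x$ times as long, so the density in $y$ is the density in $x$ divided by about $2x$. In this particular example the cdf and the transformation happen to be the same function, so the ratio in $y$ comes out as 1 up to rounding.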
