Operations on probability distributions of continuous random variables

distributions, probability

How do probability distributions of continuous random variables transform under functions?

For example, I have a random variable X drawn from a normal distribution with mean 0 and variance 1. What is the probability distribution associated with sin(X)?
[Figure: histograms approximating the probability density functions of X and sin(X)]

More generally, what are the rules for transforming continuous random variables? If we know the PDF and CDF of two random variables X and Y, what is the PDF/CDF of Z = X*Y? How about Z = X^Y? How about Z = sin(X+Y) + 3?

Are there Computer Algebra Systems which can compute this symbolically? Is this possible generally? If not, for what class of probability distributions and functions is it possible?

Note: excuse the plot. These are obviously noisy histograms and obviously not to scale (the areas under the curves do not match, so they cannot both integrate to one). Hopefully the plot gets the point across, though. Given the blue distribution describing X, I want the red distribution describing sin(X).

Best Answer

If you have a random variable $X$ with continuous distribution function $F$ and density $f$, and you define a random variable $Y=h(X)$, then what is its distribution function? Let's just use the definition of the distribution function:

\begin{align} G(y) &= P\{Y \le y\} \\ &= P\{h(X) \le y\} \end{align}

If $h$ is strictly increasing (hence invertible) and differentiable, then the next steps are easy:

\begin{align} G(y) &= P\{X \le h^{-1}(y)\} \\ &= F(h^{-1}(y))\\ g(y) &= \frac{d}{dy}G(y) = f(h^{-1}(y))\frac{d}{dy}h^{-1}(y) \end{align}

By considering the decreasing case, you can see that the general formula for monotonic $h$ is:

\begin{align} g(y) &= f(h^{-1}(y))\left|\frac{d}{dy}h^{-1}(y)\right| \end{align}

You are interested in cases where $h$ is not invertible, though, and in cases where $h$ takes several continuous random variables as arguments and returns a single value. So, consider random variables $X_1,\ldots,X_K$ with continuous joint distribution function $F(X_1,\ldots,X_K)$ and joint density $f(X_1,\ldots,X_K)$, and a random variable $Y$ defined by a differentiable function $h$ as $Y=h(X_1,\ldots,X_K)$:

\begin{align} G(y) &= P\{Y \le y\} \\ &= P\{h(X_1,\ldots,X_K) \le y\}\\ &= \int_{h(X_1,\ldots,X_K) \le y} f(X_1,\ldots,X_K) \, d X_1 \, d X_2 \ldots dX_K \end{align}

The random variable $Y$ has density:

\begin{align} g(y) &= \frac{d}{dy} \int_{h(X_1,\ldots,X_K) \le y} f(X_1,\ldots,X_K) \, d X_1 \, d X_2 \ldots dX_K \end{align}

This is not that useful in practice, though. Generally, you have to find a way, function by function, to make these two expressions tractable.

In the case of $Y=\sin(X)$, $\sin$ is periodic, so you just chop up its domain into half-cycles (within which it is monotonic and invertible).
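To make the half-cycle decomposition concrete (this particular indexing is just one convenient choice, not something fixed by the argument above): for $-1<y<1$, each half-cycle $[k\pi-\pi/2,\ k\pi+\pi/2]$ with $k$ an integer contains exactly one solution of $\sin(x)=y$, and on every half-cycle the local inverse has the same absolute derivative:

\begin{align} x_k &= k\pi + (-1)^k \arcsin(y), \qquad k = 0, \pm 1, \pm 2, \ldots \\ \left|\frac{d x_k}{dy}\right| &= \frac{1}{\sqrt{1-y^2}} \end{align}

Applying the monotonic formula on each half-cycle and adding up the contributions gives one term per $x_k$.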
You can get the density (except at the points $y=-1$ and $y=1$, where the derivative of $\sin^{-1}$ blows up) from the infinite series (which, as a practical matter, you approximate by just leaving off the terms where $f(x)$ is very small):

\begin{align} g(y) &= \sum_{x:\sin(x)=y} f(x) \left| \frac{d}{dy} \sin^{-1}(y) \right| = \frac{1}{\sqrt{1-y^2}} \sum_{x:\sin(x)=y} f(x) \end{align}

For your example of $Y=X_1X_2$:

\begin{align} G(y) &= P\{Y \le y\} \\ &= \int_{X_1X_2 \le y} f(X_1,X_2) \, d X_1 \, d X_2 \end{align}

Because of the way sign and multiplication work, evaluating this integral is a bit annoying. Let's evaluate it for a $y\ge0$. For a $y$ like this, $X_1X_2\le y$ any time one but not both of the $X$s is negative, any time both are positive but not too big, and any time both are negative but not too big in absolute value:

\begin{align} G(y) &= \int_0^\infty \int_{-\infty}^0 f(X_1,X_2) \, d X_2 \, d X_1 + \int_{-\infty}^0 \int_0^\infty f(X_1,X_2) \, d X_2 \, d X_1\\ &+ \int_0^\infty \int_0^{\frac{y}{X_1}} f(X_1,X_2) \, d X_2 \, d X_1 + \int_{-\infty}^0 \int_{\frac{y}{X_1}}^0 f(X_1,X_2) \, d X_2 \, d X_1 \end{align}

The first two terms do not depend on $y$, so the density of $Y$ is:

\begin{align} g(y) &= \frac{d}{dy}G(y)\\ &= \int_0^\infty \frac{1}{X_1}f\left(X_1,\frac{y}{X_1}\right) d X_1 + \int_{-\infty}^0 -\frac{1}{X_1} f\left(X_1,\frac{y}{X_1}\right) d X_1\\ &= \int_{-\infty}^\infty \frac{1}{|X_1|} f\left(X_1,\frac{y}{X_1}\right) d X_1 \end{align}
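If you want to sanity-check these two results numerically, here is a minimal sketch using only the Python standard library. Several things in it are my own assumptions rather than part of the derivation: $X$ is standard normal (as in the question), the product example uses two independent standard normals, the roots of $\sin(x)=y$ are indexed as $x_k = k\pi + (-1)^k\arcsin(y)$, and all function names, the truncation `n_terms`, and the midpoint-rule settings are made up for illustration. The density of $XY$ is evaluated as the single integral of $f(X_1, y/X_1)/|X_1|$, which is the sum of the two integrals above.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sin_density(y, n_terms=10):
    """Truncated series for the density of sin(X), X ~ N(0, 1), -1 < y < 1."""
    a = math.asin(y)
    jacobian = 1.0 / math.sqrt(1.0 - y * y)  # |d/dy arcsin(y)|, same on every half-cycle
    # One root of sin(x) = y per half-cycle: x_k = k*pi + (-1)^k * arcsin(y).
    roots = (k * math.pi + (-1) ** k * a for k in range(-n_terms, n_terms + 1))
    return jacobian * sum(normal_pdf(x) for x in roots)

def sin_cdf(y, n_terms=10):
    """P(sin(X) <= y), summing P(X in [(2m-1)*pi - a, 2m*pi + a]) over periods m."""
    a = math.asin(y)
    return sum(
        normal_cdf(2 * m * math.pi + a) - normal_cdf((2 * m - 1) * math.pi - a)
        for m in range(-n_terms, n_terms + 1)
    )

def independent_normals(x1, x2):
    """Joint density of two independent standard normals (an assumed example)."""
    return normal_pdf(x1) * normal_pdf(x2)

def product_density(y, joint_pdf, upper=12.0, n=20000):
    """Density of X1*X2 at y != 0: integral of joint_pdf(x, y/x) / |x| dx.

    Midpoint rule on (0, upper]; the negative half-line is folded in by
    evaluating the integrand at -x as well.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (joint_pdf(x, y / x) + joint_pdf(-x, -y / x)) / x
    return total * h

if __name__ == "__main__":
    step = 1e-5
    # The series density should match a finite difference of the CDF.
    print(sin_density(0.3), (sin_cdf(0.3 + step) - sin_cdf(0.3 - step)) / (2 * step))
    print(product_density(1.0, independent_normals))
```

For independent standard normals the product density has the known closed form $K_0(|y|)/\pi$, where $K_0$ is a modified Bessel function of the second kind, so that gives an independent check on the last number if you have a Bessel routine handy.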
