Discrete and continuous random variables are not defined the same way. The human mind is used to discrete random variables (for example, for a fair coin, let $X=-1$ if the coin shows tails and $X=+1$ if it shows heads; then $f(-1)=f(1)=\frac12$ and $f(x)=0$ elsewhere). For a discrete random variable it suffices that the probabilities of the possible values sum to 1, which forces each individual probability to be at most 1.
For a continuous random variable, the requirement is instead that $\int_{\mathbb{R}} f(x)\,dx=1$. Since an integral behaves differently from a sum, it is possible that $f(x)>1$ on a small interval (although the length of any interval on which $f>1$ must be less than 1, or the integral would exceed 1).
The definition of $\mathbb{P}(X=x)$ is not $\mathbb{P}(X=x)=f(x)$ but rather $\mathbb{P}(X=x)=\mathbb{P}(X\leq x)-\mathbb{P}(X<x)=F(x)-F(x^-)$. For a discrete random variable, $F(x^-)\not= F(x)$ at points in the support, so $\mathbb{P}(X=x)>0$. For a continuous random variable, however, $F(x^-)=F(x)$ (by the definition of continuity), so $\mathbb{P}(X=x)=0$. This is analogous to the fact that the probability of picking exactly $\frac12$ when choosing a number uniformly between 0 and 1 is zero.
In summary, for continuous random variables $\mathbb{P}(X=x)\not= f(x)$.
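For instance, the Uniform$(0, 0.5)$ density equals 2 on its support yet integrates to 1, and single points carry zero probability; a quick sketch in R (the point 0.25 and the sample size are arbitrary choices):

```r
# A density can exceed 1 as long as its total integral is 1:
# Uniform(0, 0.5) has f(x) = 2 on its support.
f <- function(x) ifelse(x >= 0 & x <= 0.5, 2, 0)
integrate(f, -1, 2)$value   # total probability: 1

# P(X = x) = 0 for any single point: exact hits essentially never occur
set.seed(1)
u <- runif(1e5, 0, 0.5)
mean(u == 0.25)             # proportion of draws exactly equal to 0.25: 0
```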
There is no way to be sure what distribution gives rise to your data.
First, there is no assurance that your data fit any 'named' distribution.
Second, even if you guess the correct parametric distribution family, you still have
to use the data to estimate the parameters.
Here are several approaches that might be useful.
First, you might see whether a member of the beta family of distributions
is a reasonable fit to your data. These distributions have support $(0, 1)$
and two shape parameters $\alpha > 0$ and $\beta > 0.$ Roughly speaking,
these determine the shape of the density curve near 0 and near 1, respectively.
It is possible to estimate $\alpha$ and $\beta$ from data. One way is to
pick parameters that match the sample mean and variance (method of moments
estimation). See the Wikipedia article on 'beta distribution' for particulars.
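The method-of-moments step can be sketched in R (the closed-form estimates below follow from the Beta mean $\alpha/(\alpha+\beta)$ and variance; `beta_mom` is a hypothetical helper name, not a built-in function):

```r
# Method-of-moments estimates for Beta(alpha, beta) from data in (0, 1).
# With m = sample mean and v = sample variance:
#   alpha_hat = m * (m*(1 - m)/v - 1)
#   beta_hat  = (1 - m) * (m*(1 - m)/v - 1)
beta_mom <- function(x) {
  m <- mean(x); v <- var(x)
  k <- m * (1 - m) / v - 1      # common factor in both estimates
  c(alpha = m * k, beta = (1 - m) * k)
}

set.seed(42)
x <- rbeta(1000, 5, 10)          # fake data from a known Beta(5, 10)
beta_mom(x)                      # estimates should be near 5 and 10
```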
Below is a histogram of a sample of 1000 observations simulated according
to the distribution $Beta(\alpha = 5, \beta = 10)$ along with the
density function (solid blue curve) of that particular distribution.
(I used R statistical software and show the R code. R is available free of charge at
www.r-project.org
for Windows, Mac and Linux.)
x = rbeta(1000, 5, 10) # generate fake data
mean(x); sd(x) # sample mean and SD
## 0.3320269
## 0.1163734
hist(x, prob=T, col="skyblue")
curve(dbeta(x, 5, 10), lwd=2, col="blue", add=T) # pop density curve
lines(density(x), lwd=2, lty="dotted", col="red") # density est from data
It takes a lot of data to get really close estimates of the parameters.
For example, here the sample mean is 0.3320 while the population mean is $5/15 \approx 0.3333.$
I will let you check how closely the sample and population variances
match.
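To check, the population variance of Beta(5, 10) is available in closed form:

```r
# Var = alpha*beta / ((alpha + beta)^2 * (alpha + beta + 1))
a <- 5; b <- 10
pop_var <- a * b / ((a + b)^2 * (a + b + 1))
pop_var        # 50/3600, about 0.01389
sqrt(pop_var)  # population SD about 0.1179, vs sample SD 0.1164 above
```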
A second method is to use a density estimator of your data. (See Wikipedia
on 'density estimation' or google 'KDE' for 'kernel density estimator'.)
The last line of code puts the dotted red density estimator onto the plot
above. The function density(x)
produces $(x,y)$ coordinates. These
may be of use if a digitized approximation to the density is useful.
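For example (using simulated data like that above), the coordinates can be pulled out and interpolated like this:

```r
set.seed(2021)
x <- rbeta(1000, 5, 10)          # simulated data, as above
d <- density(x)                  # kernel density estimate
length(d$x)                      # 512 (x, y) pairs by default
head(cbind(x = d$x, y = d$y))    # first few digitized coordinates
approx(d$x, d$y, xout = 0.3)$y   # interpolated density estimate at 0.3
```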
By sorting the data, it seems to me that you are making an "empirical
cumulative distribution function" (ECDF). ECDFs tend to match theoretical CDFs
better than density estimators match histograms, partly because information
is lost when data are sorted into bins to make a histogram.
For a continuous distribution, you could try taking differences of
small intervals in an ECDF to approximate the PDF, but I think density
estimation is easier to use.
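A rough sketch of that differencing approach (the half-width `h` and the grid here are arbitrary choices):

```r
# Approximate the PDF by centered differences of the ECDF:
#   f(t) ~ [Fn(t + h) - Fn(t - h)] / (2h)
set.seed(42)
x    <- rbeta(1000, 5, 10)       # simulated data, as above
Fn   <- ecdf(x)
h    <- 0.05                     # half-width of the differencing window
grid <- seq(0.05, 0.95, by = 0.01)
pdf_approx <- (Fn(grid + h) - Fn(grid - h)) / (2 * h)

plot(grid, pdf_approx, type = "l")               # noisy PDF approximation
curve(dbeta(x, 5, 10), col = "blue", add = TRUE) # true density for comparison
```

As the plot suggests, this is noisier than a kernel density estimate, which is why density estimation is usually preferable.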
Below is the ECDF of the data generated above (heavy black 'stairstep',
increasing by $1/n$ at each sorted datapoint), along with the population
CDF (thin blue curve). Black tick marks at the horizontal axis show the
locations of individual observations.
plot(ecdf(x), lwd=3)
curve(pbeta(x, 5, 10), col="blue", add=T)
rug(x)
You do not say why you want to know the PDF for your data. By googling some of the terminology I have used here, you may be able to
find a solution that matches your goals better than anything in my example.
Best Answer
Easy: use the fundamental transformation theorem
$$f_Y(y)=f_X(g^{-1}(y))\Bigg|\frac{d}{dy}g^{-1}(y)\Bigg|$$
finding
$$f_Y(y)=\Bigg[\frac{1}{\sqrt{y-1}}-1\Bigg]\mathbb{1}_{(1;2]}(y)$$
This saves you from calculating the CDF first, which was not requested.
Observe that the condition $y \in [1;2]$ is not a restriction but merely redundant information: it is the support of $Y$, which you can calculate on your own. In fact, transforming $X \rightarrow Y$ with $Y=X^2+1$, the original support $X \in [0;1]$ becomes $Y \in [1;2]$.
After calculating $f_Y$ you can see that the support of $Y$ must be $(1;2]$. This doesn't change anything because $P(Y=1)=0$.
It is not forbidden to use the method you wanted to use, but it is longer.
First you have to derive $F_X(x)=2x-x^2$.
Second, use the definition:
$$F_Y(y)=P[X^2+1 \leq y]=P[X\leq \sqrt{y-1}]=F_X(\sqrt{y-1})=2\sqrt{y-1}-(y-1)=2\sqrt{y-1}-y+1$$
Differentiating then recovers $f_Y(y)=\frac{1}{\sqrt{y-1}}-1$, in agreement with the result above.
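A quick simulation check of the density result, drawing from $F_X(x)=2x-x^2$ by inverse transform (since $F_X^{-1}(u)=1-\sqrt{1-u}$; here $F_Y(y)=2\sqrt{y-1}-y+1$ is obtained by integrating $f_Y$):

```r
set.seed(7)
u  <- runif(1e5)
xs <- 1 - sqrt(1 - u)   # draws with CDF F_X(x) = 2x - x^2 on [0, 1]
ys <- xs^2 + 1          # transformed draws Y = X^2 + 1

# Empirical vs theoretical CDF at y = 1.5:
y0 <- 1.5
mean(ys <= y0)              # empirical, about 0.914
2 * sqrt(y0 - 1) - y0 + 1   # F_Y(1.5) = 0.9142...

# f_Y integrates to 1 over (1, 2] despite the singularity at y = 1
integrate(function(y) 1/sqrt(y - 1) - 1, 1, 2)$value   # about 1
```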