Solved – Variance of the linear transformation of a random variable

data transformation, random variable, variance

I have a problem where the variance I'm calculating does not seem right. I have the following data:

i  =  1   2   3   4   5
pi = 0.1 0.1 0.2 0.2 0.4
xi =  1   2   3   4   5

I am trying to transform the random variable to a range of 1-100, which would be the following:

$$Y = aX + b, \quad \text{where } a = 24.75 \text{ and } b = -23.75$$
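For reference, these coefficients come from mapping the endpoints $x = 1 \mapsto 1$ and $x = 5 \mapsto 100$:

$$a \cdot 1 + b = 1, \qquad a \cdot 5 + b = 100 \;\Rightarrow\; a = \frac{99}{4} = 24.75, \quad b = 1 - a = -23.75$$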

For these data, I'm getting:

$$E(X) = 3.7$$
$$E(Y) = 24.75 \times 3.7 - 23.75 = 67.825$$

$$Var(X) = E(X^2)-E(X)^2 = 1.81$$
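Spelled out from the table above:

$$E(X) = \sum_i p_i x_i = 0.1(1) + 0.1(2) + 0.2(3) + 0.2(4) + 0.4(5) = 3.7$$

$$E(X^2) = 0.1(1) + 0.1(4) + 0.2(9) + 0.2(16) + 0.4(25) = 15.5$$

$$Var(X) = 15.5 - 3.7^2 = 15.5 - 13.69 = 1.81$$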

These all seem to be OK; however, when calculating Var(Y) I get a value of 1108.74, which doesn't seem right given that it is far beyond the new range. The formula I'm using is:

$$Var(Y) = a^2 \, Var(X)$$

I've also computed $E(Y)$ directly as the sum of each transformed $X$ value multiplied by its probability, then found $E(Y^2)$ the same way and calculated Var(Y) just as I calculated Var(X) above, and I get the same large value.
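In case it helps, here is a minimal sketch in R of that probability-weighted calculation (variable names are my own):

```r
# Probability-weighted moments of Y = 24.75*X - 23.75
p <- c(0.1, 0.1, 0.2, 0.2, 0.4)
x <- 1:5
y <- 24.75 * x - 23.75
EY   <- sum(p * y)      # E(Y) = 67.825
EY2  <- sum(p * y^2)    # E(Y^2)
VarY <- EY2 - EY^2      # 1108.738125 -- the same large value
```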

I'd expect the variance to be within the range of Y, but this is not happening. Is there something I'm overlooking here?

Best Answer

You're not wrong; here's a snippet of R (using a sample whose value frequencies match your probabilities) that validates your results:

> x <- c(1, 2, 3, 3, 4, 4, 5, 5, 5, 5)
> y <- 24.75 * x - 23.75
> mean(x)
[1] 3.7
> mean(y)
[1] 67.825
> mean(x*x) - mean(x)**2
[1] 1.81
> mean(y*y) - mean(y)**2
[1] 1108.738

The essential point you are missing is that the variance of a random variable has different units from the random variable itself, so you should not expect them to have similar magnitudes. For example, human height is measured in units of length, say meters; the variance of human height then has units of meters². The standard deviation is defined the way it is precisely to recalibrate this measure of spread back into the same units as the random variable itself. This is why you never see variance bands on a histogram, for example: it's always standard deviation or standard error bands, as those have the appropriate units of measurement.
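To make the units point concrete, a short sketch (using the same data as above): the standard deviation scales by $|a|$, so it lands back on the 1-100 scale of Y:

```r
# Standard deviation rescales the spread into the units of Y itself
p <- c(0.1, 0.1, 0.2, 0.2, 0.4)
x <- 1:5
VarX <- sum(p * x^2) - sum(p * x)^2  # 1.81, in squared units of X
sdY  <- 24.75 * sqrt(VarX)           # about 33.3, in the units of Y
```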
