Normal Distribution – Intuitive Explanation of Contribution to Sum of Two Random Variables

conditional probability, normal distribution

Suppose I have two independent, normally distributed random variables $X$ and $Y$ with means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, and I discover that $X+Y=c$. Then (assuming I have not made any errors) the conditional distributions of $X$ and $Y$ given $X+Y=c$ are also normal, with means
$$\mu_{X|c} = \mu_X + (c - \mu_X - \mu_Y)\frac{\sigma_X^2}{\sigma_X^2+\sigma_Y^2}$$
$$\mu_{Y|c} = \mu_Y + (c - \mu_X - \mu_Y)\frac{\sigma_Y^2}{\sigma_X^2+\sigma_Y^2}$$
and standard deviation
$$\sigma_{X|c} = \sigma_{Y|c} = \sqrt{ \frac{\sigma_X^2 \sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}}.$$

It is no surprise that the conditional standard deviations are the same: given $c$, if one variable goes up the other must come down by the same amount. It is interesting that the conditional standard deviation does not depend on $c$.

What I cannot get my head round are the conditional means, where they take a share of the excess $(c - \mu_X - \mu_Y)$ proportional to the original variances, not to the original standard deviations.

For example, if they have zero means, $\mu_X=\mu_Y=0$, and standard deviations $\sigma_X =3$ and $\sigma_Y=1$ then conditioned on $c=4$ we would have $E[X|c=4]=3.6$ and $E[Y|c=4]=0.4$, i.e. in the ratio $9:1$ even though I would have intuitively thought that the ratio $3:1$ would be more natural. Can anyone give an intuitive explanation for this?
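The numbers in this example can be checked directly from the formulas above. The following is a minimal sketch; the function name `condition_on_sum` is made up for illustration and simply evaluates the stated conditional means and standard deviation for independent normals.

```python
import math


def condition_on_sum(mu_x, mu_y, sigma_x, sigma_y, c):
    """Return (E[X | X+Y=c], E[Y | X+Y=c], conditional SD).

    Assumes X and Y are independent normal random variables and simply
    evaluates the formulas quoted in the question.
    """
    var_x, var_y = sigma_x ** 2, sigma_y ** 2
    total_var = var_x + var_y
    excess = c - mu_x - mu_y  # how far the observed sum is from its mean

    # Each variable takes a share of the excess proportional to its variance.
    mean_x = mu_x + excess * var_x / total_var
    mean_y = mu_y + excess * var_y / total_var

    # The conditional SD is shared by X and Y and does not depend on c.
    sd = math.sqrt(var_x * var_y / total_var)
    return mean_x, mean_y, sd


mean_x, mean_y, sd = condition_on_sum(0.0, 0.0, 3.0, 1.0, 4.0)
print(mean_x, mean_y, sd)
```

With $\sigma_X=3$, $\sigma_Y=1$ and $c=4$ this gives conditional means of $3.6$ and $0.4$, which sum to $c$, and a conditional standard deviation of $\sqrt{0.9}\approx 0.95$.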

This was provoked by a Math.SE question

Best Answer

The question readily reduces to the case $\mu_X = \mu_Y = 0$ by looking at $X-\mu_X$ and $Y-\mu_Y$.

Clearly the conditional distributions are Normal, so the mean, median, and mode of each coincide. The modes occur at the coordinates of the maximum of the bivariate PDF of $X$ and $Y$ restricted to the line $g(x,y) = x+y = c$. At that point the contour of the bivariate PDF and the constraint line have parallel tangents. (This is the theory of Lagrange multipliers.) Every contour has an equation of the form $f(x,y) = x^2/(2\sigma_X^2) + y^2/(2\sigma_Y^2) = \rho$ for some constant $\rho$ (that is, all contours are ellipses), so the gradients of $f$ and $g$ must be parallel there, whence there exists $\lambda$ such that

$$\left(\frac{x}{\sigma_X^2}, \frac{y}{\sigma_Y^2}\right) = \nabla f(x,y) = \lambda \nabla g(x,y) = \lambda(1,1).$$

It follows immediately that the modes of the conditional distributions (and therefore also the means) are determined by the ratio of the variances, not of the SDs.
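To spell the step out (still in the reduced case $\mu_X=\mu_Y=0$): the gradient condition says $x = \lambda\sigma_X^2$ and $y = \lambda\sigma_Y^2$. Substituting into the constraint $x+y=c$ gives $\lambda = c/(\sigma_X^2+\sigma_Y^2)$, and therefore

$$x = c\,\frac{\sigma_X^2}{\sigma_X^2+\sigma_Y^2}, \qquad y = c\,\frac{\sigma_Y^2}{\sigma_X^2+\sigma_Y^2}, \qquad \frac{x}{y} = \frac{\sigma_X^2}{\sigma_Y^2}.$$

For the example in the question ($\sigma_X=3$, $\sigma_Y=1$, $c=4$) this gives $\lambda = 0.4$, $x = 3.6$ and $y = 0.4$, which is the $9:1$ split.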

This analysis works for correlated $X$ and $Y$ as well, and it applies to any linear constraint, not just the sum.
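As an illustration of that generalization, here is a hedged sketch for a constraint $aX+bY=c$ with correlation $\rho$ between $X$ and $Y$. It uses the standard result for conditioning a bivariate normal on a linear combination $S=aX+bY$: each variable moves from its mean by the excess $c - E[S]$ times $\operatorname{Cov}(\cdot,S)/\operatorname{Var}(S)$. The function name `condition_on_linear` and its parameters are invented for this example and are not part of the original answer; it assumes $\operatorname{Var}(S)>0$.

```python
def condition_on_linear(mu_x, mu_y, sigma_x, sigma_y, rho, a, b, c):
    """Return (E[X | aX+bY=c], E[Y | aX+bY=c]) for a bivariate normal.

    rho is the correlation between X and Y. Assumes Var(aX + bY) > 0.
    """
    cov_xy = rho * sigma_x * sigma_y

    # Covariances of X and Y with the constrained combination S = aX + bY.
    cov_x_s = a * sigma_x ** 2 + b * cov_xy
    cov_y_s = a * cov_xy + b * sigma_y ** 2
    var_s = a * cov_x_s + b * cov_y_s  # Var(S)

    excess = c - (a * mu_x + b * mu_y)  # observed S minus its mean

    # Each variable's share of the excess is proportional to Cov(., S).
    mean_x = mu_x + excess * cov_x_s / var_s
    mean_y = mu_y + excess * cov_y_s / var_s
    return mean_x, mean_y


# Independent case with a = b = 1 reproduces the question's example.
print(condition_on_linear(0.0, 0.0, 3.0, 1.0, 0.0, 1.0, 1.0, 4.0))

# A correlated case: the conditional means still satisfy the constraint.
cx, cy = condition_on_linear(0.0, 0.0, 3.0, 1.0, 0.5, 1.0, 1.0, 4.0)
print(cx, cy, cx + cy)
```

With $\rho=0$ and $a=b=1$ the covariances with $S$ reduce to $\sigma_X^2$ and $\sigma_Y^2$, which recovers the variance-proportional split discussed above.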
