[Math] Conditioning on one term of a sum of random variables

game-theory, pr.probability, st.statistics

Let $\theta$ be normally distributed with mean $\bar \theta$ and variance $s^2$. Let $Z$ be normally distributed with mean $0$ and variance $\sigma^2$, and chosen independently of $\theta$. Define $X = \theta + Z$. Clearly, $X$ has mean $\bar\theta$ and variance $s^2 + \sigma^2$.

Write $\zeta^2 = (\tfrac{1}{s^2} + \tfrac{1}{\sigma^2})^{-1}$. It is well known that for normal random variables, $$\mathbb E(\theta|X) = \tfrac{\zeta^2}{\sigma^2} X + \tfrac{\zeta^2}{s^2} \bar\theta \qquad \mathrm{and} \qquad \operatorname{Var}(\theta|X) = \zeta^2.$$

Note that the conditional mean is linear in $X$, and the conditional variance does not depend on the realized value of $X$, only on the fact that conditioning on $X$ occurred. One might hope that these facts hold in wide generality.
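For the Gaussian case, the two formulas are easy to verify by Monte Carlo: simulate many $(\theta, Z)$ pairs and compare the empirical mean and variance of $\theta$, among draws whose $X$ lands near a fixed value $x_0$, against the closed-form predictions. A minimal sketch (all parameter values are arbitrary illustrative choices):

```python
import numpy as np

# Monte Carlo sanity check of the normal-normal conditioning formulas.
# theta_bar, s, sigma and x0 are arbitrary illustrative values.
rng = np.random.default_rng(0)
theta_bar, s, sigma = 1.0, 2.0, 0.5
n = 2_000_000

theta = rng.normal(theta_bar, s, size=n)
Z = rng.normal(0.0, sigma, size=n)
X = theta + Z

zeta2 = 1.0 / (1.0 / s**2 + 1.0 / sigma**2)

# Approximate conditioning on {X = x0} by keeping draws with X in a
# narrow window around x0.
x0 = 2.5
sel = theta[np.abs(X - x0) < 0.02]

pred_mean = (zeta2 / sigma**2) * x0 + (zeta2 / s**2) * theta_bar
pred_var = zeta2  # posterior variance in the normal-normal model

print(sel.mean(), pred_mean)  # empirical vs. predicted conditional mean
print(sel.var(), pred_var)    # empirical vs. predicted conditional variance
```

The window width trades bias against sample size; with two million draws the empirical values land within a few thousandths of the predictions.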

Question: Let $\theta$ and $Z$ be independent random variables. What are the most general conditions on the distributions of $\theta$ and $Z$ so that the above conditional formulas hold?

While these formulas do hold for certain specific classes of distributions (e.g., Beta), Robert Israel points out below that they are unlikely to hold in general. I would also be satisfied with some sort of "approximate" version of the formulas. Is there a nice way to quantify, for a given pair of distributions, how badly these formulas fail?


Here's some motivation for this question. The random variable $\theta$ represents the "true" preference of society for party A over party B. Because of opinion polling, this is known to be within a few standard deviations $s$ of the value $\bar\theta$. On the other hand, each voter receives a noisy signal $X$, which provides some information about the true value $\theta$. The conditional formulas above quantify how much information she obtains from her signal.

For this application, I'm comfortable with $\theta$ being normally distributed, since the Central Limit Theorem is valid for opinion polls. However, I see no justification for the noise $Z$ being normally distributed rather than, say, power-law distributed.

For other applications, there's no reason to suppose that the true parameter $\theta$ is known with Gaussian certainty; it too may follow a power law.

Best Answer

These equations are unlikely to be true for more general distributions, except in rather special circumstances. Certainly finiteness conditions on moments will not be enough. If $\Theta$ and $Z$ have densities $f_\Theta$ and $f_Z$, the general formula for the conditional expectation is $$ E[\Theta | X=x ] = \frac{\int_{\mathbb R} \theta f_\Theta(\theta) f_Z(x-\theta)\ d\theta}{\int_{\mathbb R} f_\Theta(\theta) f_Z(x-\theta)\ d\theta}$$ which in general won't have any particularly nice functional form.
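This ratio can be evaluated numerically to see the failure concretely. A sketch, assuming $\Theta \sim N(0,1)$ and Laplace-distributed noise $Z$ (both illustrative choices): if $E[\Theta|X=x]$ were linear in $x$, equally spaced values of $x$ would give equally spaced conditional means, but with Laplace noise they do not.

```python
import numpy as np

# Numerically evaluate E[Theta | X = x] from the density-ratio formula,
# with Theta ~ N(0, 1) and standard Laplace noise Z (illustrative choices).
def norm_pdf(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def laplace_pdf(z):
    return 0.5 * np.exp(-np.abs(z))

grid = np.linspace(-20.0, 20.0, 20001)  # uniform grid; the step cancels in the ratio

def cond_mean(x):
    w = norm_pdf(grid) * laplace_pdf(x - grid)  # integrand f_Theta(t) f_Z(x - t)
    return float((grid * w).sum() / w.sum())

# Equally spaced x's do not yield equally spaced conditional means:
xs = [0.0, 1.0, 2.0, 3.0]
means = [cond_mean(x) for x in xs]
diffs = np.diff(means)
print(means)
print(diffs)  # successive differences shrink: the conditional mean is nonlinear
```

Here the conditional mean saturates for large $x$ (the heavy Laplace tail makes extreme signals less informative about $\Theta$), so the successive differences shrink instead of staying constant as the linear formula would require.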
