Joint densities and conditional densities of sums of i.i.d. normally distributed random variables

Tags: conditional-expectation, normal-distribution, probability-distributions, probability-theory

Let $X_1,X_2,\dots$ be independent with the common normal density $\eta$, and let $S_k=X_1+\cdots+X_k$. If $m<n$, find the joint density of $(S_m,S_n)$ and the conditional density of $S_m$ given that $S_n=t$.

Solution attempt

Let the independent random variables $X_1,X_2,\dots$ with common normal density $\eta$ have parameters $(\mu,\sigma^2)$. Then $S_m$ and $S_n$ have normal densities with parameters $(m\mu,m\sigma^2)$ and $(n\mu,n\sigma^2)$ respectively; denote their densities by $f_{S_m}$ and $f_{S_n}$.

We also observe that $S_m$ and the increment $S_n-S_m=X_{m+1}+\cdots+X_n$ are independent, and that the increment has the same distribution as $S_{n-m}$, hence density $f_{S_{n-m}}$; the density of their sum $S_n$ is $f_{S_n}$. The pairs $(S_m,S_n)$ and $(S_m,S_n-S_m)$ are related by the linear transformation $(x,s)\mapsto(x,s-x)$, which has determinant $1$. So the joint density of $(S_m,S_n)$ at $(x,s)$ is given by $f_{S_m}(x)\,f_{S_{n-m}}(s-x)$.
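Spelled out, the change of variables reads $$f_{(S_m,S_n)}(x,s)=f_{(S_m,\,S_n-S_m)}(x,\,s-x)\cdot\left|\det\begin{pmatrix}1 & 0\\ -1 & 1\end{pmatrix}\right|=f_{S_m}(x)\,f_{S_{n-m}}(s-x).$$ Explicitly, with the normal densities above: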

$f_{S_m} (x) f_{S_{n-m}} (s-x)=\dfrac{1}{\sqrt {2\pi m \sigma ^{2}}}e^{-\dfrac {\left( x-m \mu\right) ^{2}} {2m \sigma ^{2} }}\dfrac{1}{\sqrt {2\pi (n-m) \sigma ^{2}}}e^{-\dfrac {\left( s-x-(n-m)\mu\right) ^{2}} {2(n-m) \sigma ^{2} }}$

$f_{S_m} (x) f_{S_{n-m}} (s-x)=\dfrac{1}{2\pi\sigma ^{2}\sqrt { m(n-m) }}e^{\dfrac {-(n-m)\left( x-m \mu\right) ^{2}-m\left( s-x-(n-m)\mu\right) ^{2}} {2m(n-m) \sigma ^{2} }}$

I can't seem to put this into the standard bivariate normal form.

This looks like a bivariate normal density, which tallies with the answer provided by the author, but he also states that it is a bivariate normal density with variances $m$, $n$ and correlation $\sqrt{\dfrac{m}{n}}$.
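A quick numerical sanity check — a sketch using numpy/scipy, with the mean vector $(m\mu,n\mu)$ and covariance matrix $\sigma^2\begin{pmatrix}m & m\\ m & n\end{pmatrix}$ taken from the answer below — supports this reading:

```python
# Check that f_{S_m}(x) * f_{S_{n-m}}(s - x) equals the bivariate normal
# density of (S_m, S_n) at (x, s); all parameter values are arbitrary choices.
import numpy as np
from scipy.stats import norm, multivariate_normal

m, n, mu, sigma = 3, 7, 0.5, 1.2
x, s = 2.0, 4.5  # an arbitrary test point

product = (norm.pdf(x, loc=m * mu, scale=sigma * np.sqrt(m))
           * norm.pdf(s - x, loc=(n - m) * mu, scale=sigma * np.sqrt(n - m)))
joint = multivariate_normal.pdf(
    [x, s],
    mean=[m * mu, n * mu],
    cov=sigma**2 * np.array([[m, m], [m, n]]),
)
print(product, joint)  # the two values agree
assert np.isclose(product, joint)
```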

Best Answer

This seems to be a case where reaching for full-fledged formulas right from the start may not help comprehension, so let us try to keep formulas at a distance as long as possible and to understand what is going on (although formulas will be necessary in the end, of course...).

Assume that $\eta$ has mean $\mu$ and variance $\sigma^2$. Then the joint distribution of $(S_m,S_n)$ is bivariate normal, hence it is characterized by the mean vector $(\mathbb ES_m,\mathbb ES_n)=(m\mu,n\mu)$ and the variance-covariance matrix $C$ with entries $\mathrm{var}(S_m)=m\sigma^2$, $\mathrm{var}(S_n)=n\sigma^2$, and $\mathrm{cov}(S_m,S_n)=\mathrm{var}(S_m)=m\sigma^2$ (because $S_n-S_m$ is independent of $S_m$), that is, $$C=\sigma^2\begin{pmatrix}m & m \\ m & n\end{pmatrix}.$$

The inverse of $C$ is the matrix $$C^{-1}=\dfrac1{\tau^2}\begin{pmatrix}n & -m \\ -m & m\end{pmatrix},\qquad\tau^2=m(n-m)\sigma^2,$$ hence the joint density at $(x,y)$ of the centered vector $(S_m-m\mu,S_n-n\mu)$ is proportional to $$ \exp\left(-\frac12(x,y)C^{-1}(x,y)^T\right)=\exp\left(-\frac1{2\tau^2}(nx^2-2mxy+my^2)\right). $$

Conditioning on $[S_n=t]$ is equivalent to fixing $y=t-n\mu$ in this formula and considering the resulting one-variable function of $x$ as a multiple of a density. Hence, the density of $S_m-m\mu$ conditionally on $[S_n=t]$ is proportional to $$ \exp\left(-\frac1{2\tau^2}(nx^2-2mxy)\right)\propto\exp\left(-\frac{n}{2\tau^2}\left(x-\frac{my}n\right)^2\right). $$

Thus, the distribution of $S_m$ conditionally on $[S_n=t]$ is normal with mean $\mu_t$ and variance $\sigma_t^2$, with $$ \mu_t=(my/n)+m\mu=mt/n,\qquad\sigma^2_t=\tau^2/n=(n-m)(m/n)\sigma^2. $$ One may find it reassuring that $\mu_t=t$ and $\sigma^2_t=0$ when $n=m$ (why?) and that $\mu_t=0$ and $\sigma^2_t=0$ when $m=0$ (why?).
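To connect this with the computation in the question: substituting $u=x-m\mu$ and $v=s-n\mu$ (so that $s-x-(n-m)\mu=v-u$) in the exponent of the product form obtained there yields exactly the quadratic form above, since $$-\frac{(n-m)(x-m\mu)^2+m(s-x-(n-m)\mu)^2}{2m(n-m)\sigma^2}=-\frac{(n-m)u^2+m(v-u)^2}{2\tau^2}=-\frac{nu^2-2muv+mv^2}{2\tau^2}.$$ In particular, the correlation of $(S_m,S_n)$ is $m\sigma^2/\sqrt{m\sigma^2\cdot n\sigma^2}=\sqrt{m/n}$, which is the value stated by the author.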

Nota: The fact that the distribution of $S_m$ conditionally on $[S_n=t]$ has mean $\mu_t=mt/n$ is valid for every sequence $(S_k)$ of sums of i.i.d. random variables $(X_k)$, irrespective of their common distribution (by exchangeability, $\mathbb E(X_k\mid S_n)=S_n/n$ for every $k\leqslant n$). Likewise, the law of total variance shows that $\mathbb E(\mathrm{var}(S_m\mid S_n))=(n-m)(m/n)\sigma^2$ for every common distribution with variance $\sigma^2$, although the conditional variance at a fixed value $t$ may depend on $t$ in general. The fact that this conditional distribution is normal, with variance $\sigma_t^2=(n-m)(m/n)\sigma^2$ for every $t$, depends very much on the common distribution of the random variables $X_k$ being normal, naturally.
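To see both halves of this in action, here is a Monte Carlo sketch with exponential steps (so the $X_k$ are not normal); the sample size, the window width and all parameter values are arbitrary choices, and conditioning on $[S_n=t]$ is approximated by a small window around $t$:

```python
# Exponential steps: the conditional mean stays m*t/n, but the conditional
# variance at a fixed t does not equal (n-m)*(m/n)*sigma^2 pointwise.
import numpy as np

rng = np.random.default_rng(0)
m, n, t, eps = 3, 7, 8.0, 0.05
X = rng.exponential(scale=1.0, size=(2_000_000, n))  # mean 1, sigma^2 = 1
S_m, S_n = X[:, :m].sum(axis=1), X.sum(axis=1)
kept = S_m[np.abs(S_n - t) < eps]  # samples with S_n close to t

# Distribution-free part: conditional mean is close to m*t/n = 24/7.
print(kept.mean(), m * t / n)
# Here (S_m / t | S_n = t) is Beta(m, n-m), so the conditional variance is
# t^2 * m*(n-m) / (n^2 * (n+1)) ~ 1.96, not (n-m)*(m/n)*sigma^2 = 12/7.
print(kept.var(), t**2 * m * (n - m) / (n**2 * (n + 1)))
```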

Edit: To prove that $(S_m,S_n)$ is bivariate normal, the canonical way is to start from the fact that $X=(X_k)_{1\leqslant k\leqslant n}$ is a gaussian vector with i.i.d. coordinates with mean $\mu$ and variance $\sigma^2$. Furthermore, there exists a matrix $A$ of size $2\times n$ such that the vector $U=(S_m,S_n)$ is $U=A\cdot X$. Since $U$ is an affine transformation of $X$, $U$ is a gaussian vector with mean $A\cdot \mathbb E(X)$ and variance-covariance matrix $$C=A\cdot \mathrm{Cov}(X)\cdot A^T=\sigma^2 AA^T. $$ Finally, writing $L_k$ for a row of $k$ ones and $Z_k$ for a row of $k$ zeroes, the definitions of $S_m$ and $S_n$ yield $$A=\begin{pmatrix}L_m & Z_{n-m}\\ L_m & L_{n-m}\end{pmatrix},$$ hence, as indicated above, $$A\cdot \mathbb E(X)=\begin{pmatrix}m\mu\\ n\mu\end{pmatrix},\qquad AA^T=\begin{pmatrix}m & m\\ m & n\end{pmatrix}.$$
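As a last sanity check, the matrix identity can be verified numerically (a minimal numpy sketch; the values of $m$ and $n$ are arbitrary):

```python
# Build A from a row (L_m  Z_{n-m}) and a row (L_m  L_{n-m}) of ones/zeroes
# and verify that A @ A.T = [[m, m], [m, n]].
import numpy as np

m, n = 3, 7
A = np.vstack([
    np.concatenate([np.ones(m), np.zeros(n - m)]),  # (L_m  Z_{n-m})
    np.ones(n),                                     # (L_m  L_{n-m})
])
print(A @ A.T)  # expected: [[3. 3.] [3. 7.]]
```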
