[Math] Covariance matrix of multivariate Gaussian

expectation, normal distribution, probability, probability distributions, probability theory

I want to calculate the covariance matrix of an $n$-dimensional normal distribution given by $Y=AX+a$, where $X=(X_1,\ldots,X_n)$ and each $X_i$ is a standard normal random variable.

I have calculated the density of $Y$ as $$f(y)=\frac{1}{(2\pi)^{\frac{n}{2}}|\det(A)|}e^{-\frac{1}{2}(y-a)^{T}(AA^{T})^{-1}(y-a)},$$ which according to my notes is correct. Wikipedia gives the PDF as $$f(y)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{1/2}}e^{-\frac{1}{2}(y-a)^{T}\Sigma^{-1}(y-a)}$$

with covariance matrix $\Sigma$, from which I infer that I should have $\Sigma=AA^{T}$, i.e. my covariance matrix should be given by $AA^{T}$.
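
A quick numerical sanity check, with an arbitrary choice of $A$, $a$, and sample size, seems consistent with $\Sigma = AA^{T}$:

```python
import numpy as np

# Sanity check: the empirical covariance of Y = A X + a should be close to A A^T.
# A, a and the sample size are arbitrary choices for illustration.
rng = np.random.default_rng(0)

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])
a = np.array([1.0, -2.0])

n_samples = 200_000
X = rng.standard_normal((n_samples, 2))  # each row is a standard normal vector with independent components
Y = X @ A.T + a                          # row-wise Y = A X + a

print("empirical covariance of Y:\n", np.cov(Y, rowvar=False))
print("A A^T:\n", A @ A.T)
```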

But doing the actual calculation I get as the covariance of the components $Y_k,Y_l$, with expectations $a_k, a_l$ respectively: $$\begin{align*} \operatorname{Cov}(Y_k,Y_l) &= \mathbb{E}[(Y_k-a_k)(Y_l-a_l)] = \mathbb{E}[Y_kY_l-a_kY_l-a_lY_k+a_ka_l] = \mathbb{E}[Y_kY_l]-a_ka_l \\ &= \mathbb{E}[(AX+a)_k(AX+a)_l]-a_ka_l \\ &= \mathbb{E}\left[\left(X_1\sum_{i=1}^na_{ki}+a_k\right)\left(X_1\sum_{i=1}^na_{li}+a_l\right)\right]-a_ka_l \\ &= \mathbb{E}\left[X_1^2\left(\sum_{i=1}^na_{ki}\right)\left(\sum_{i=1}^na_{li}\right)+a_lX_1\sum_{i=1}^na_{ki}+a_kX_1\sum_{i=1}^na_{li}+a_ka_l\right]-a_ka_l \\ &= \mathbb{E}[X_1^2]\left(\sum_{i=1}^na_{ki}\right)\left(\sum_{i=1}^na_{li}\right) = \left(\sum_{i=1}^na_{ki}\right)\left(\sum_{i=1}^na_{li}\right) \end{align*}$$
where in the last two steps I have used linearity of expectation and the fact that the components are standard normally distributed, i.e. $\mathbb{E}[X_1]=0$ and $\mathbb{E}[X_1^2]=1$.

However, this isn't equal to $(AA^{T})_{kl}=\sum_{i=1}^{n}a_{ki}a_{li}$.
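
For instance, taking $A$ to be the $2\times 2$ identity matrix, $Y_1$ and $Y_2$ are independent, so their covariance should be $(AA^{T})_{12}=0$, while my formula gives $$\left(\sum_{i=1}^2 a_{1i}\right)\left(\sum_{i=1}^2 a_{2i}\right)=1\cdot 1=1.$$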

Does somebody see what I did wrong/what I am missing?

Best Answer

$$\mathbb{E}[(AX+a)_k (AX+a)_l] = \mathbb{E} \left[ \left( X_1 \sum_{i=1}^n a_{ki} + a_k \right) \left( X_1 \sum_{i=1}^n a_{li} + a_l \right) \right]$$

does not hold true. Instead it should read

$$\mathbb{E}[(AX+a)_k (AX+a)_l] = \mathbb{E} \left[ \left( \sum_{i=1}^n a_{ki} X_i + a_k \right) \left( \sum_{j=1}^n a_{lj} X_j + a_l \right) \right]. \tag{1}$$

Note that this makes a difference since the distribution of the vector $(X_1,X_1)$ does not equal the distribution of $(X_i,X_j)$ for $i \neq j$ (this means that we cannot simply replace $X_i$ and $X_j$ in $(1)$ by $X_1$). Clearly, by $(1)$,

$$\begin{align*} \mathbb{E}[(AX+a)_k (AX+a)_l] &= \sum_{i=1}^n \sum_{j=1}^n a_{ki} a_{lj} \mathbb{E}(X_i X_j) + a_l \mathbb{E} \left( \sum_{i=1}^n a_{ki} X_i \right) \\ &\quad + a_k \mathbb{E} \left( \sum_{j=1}^n a_{lj} X_j \right) + a_k a_l. \end{align*}$$

Although it is not mentioned explicitly in your question, I take it that $X_1,\ldots,X_n$ are independent random variables. Using that $\mathbb{E}(X_i X_j) = 0$ for all $i \neq j$, $\mathbb{E}(X_i^2)=1$, and $\mathbb{E}(X_i)=0$ for all $i$, we get

$$\mathbb{E}[(AX+a)_k (AX+a)_l] = \sum_{i=1}^n a_{ki} a_{li} + a_k a_l = (A A^T)_{k,l} + a_k a_l.$$ Hence, $$\text{cov}(Y_k,Y_l) = (A A^T)_{k,l}.$$
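
For comparison, here is the index-free version of the same argument, using that $\mathbb{E}[XX^{T}]=I$ for a vector of independent standard normal components: $$\operatorname{Cov}(Y)=\mathbb{E}\big[(Y-a)(Y-a)^{T}\big]=\mathbb{E}\big[(AX)(AX)^{T}\big]=A\,\mathbb{E}[XX^{T}]\,A^{T}=AA^{T}.$$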