Multivariate Distribution – Expectation of a Multivariate Random Variable Explained

joint distribution, multivariate distribution, probability, random vector

Given a multivariate random variable $\mathbf{X}=(X_1, …, X_n)^\intercal : \Omega \rightarrow \mathbb{R}^n$, I want to determine the expectation of this random vector. Wikipedia says the expectation is simply given by:
$$\mathbb{E}[\mathbf{X}]=(\mathbb{E}[X_1], …, \mathbb{E}[X_n])^\intercal.$$
However, the distribution of a random vector is the joint distribution of all its components, i.e., the distribution of $\mathbf{X}$ is $\mathbb{P}_{X_1, …, X_n}$. Hence, for instance for $n=2$, if $X_1$ and $X_2$ are discrete, I would compute the expected value as:
\begin{align}
\mathbb{E}[\mathbf{X}]&= \sum_{i=1}^\infty \mathbf{X}_i \mathbb{P}_{X_1,X_2}(x_1^{(i)}, x_2^{(i)})\\
&=\sum_{i=1}^\infty \binom{x_1^{(i)}}{x_2^{(i)}} \mathbb{P}_{X_1,X_2}(x_1^{(i)}, x_2^{(i)})\\
&=\binom{\sum_{i=1}^\infty x_1^{(i)} \mathbb{P}_{X_1,X_2}(x_1^{(i)}, x_2^{(i)})}{\sum_{i=1}^\infty x_2^{(i)} \mathbb{P}_{X_1,X_2}(x_1^{(i)}, x_2^{(i)})}
\end{align}

It is not obvious to me that this is the same as
$$\mathbb{E}[\mathbf{X}]=(\mathbb{E}[X_1], \mathbb{E}[X_2])^\intercal.$$
How do these two definitions match?

Best Answer

It looks like you might be getting tripped up with the indexing and summations. Here's how to handle the 2d discrete example using the same approach you were trying to take. The same logic applies to higher dimensions and/or continuous variables (in that case, replace sums with integrals).

Notation

Let $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ be a realization of the random vector $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$. I'll use the shorthand $p(x_1, x_2)$ for the joint probability $Pr(X_1=x_1, X_2=x_2)$, and $p(x_1)$ for the marginal probability $Pr(X_1=x_1)$. Sums will be taken over all possible values of each variable. For example, if $X_1$ has range $\mathcal{X}_1$ then $\sum_{x_1}$ is shorthand for $\sum_{x_1 \in \mathcal{X}_1}$.
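To make the notation concrete, here is a minimal numerical sketch with a made-up joint pmf (the values and variable names are hypothetical, not from the question), showing how $p(x_1, x_2)$ and the marginals $p(x_1)$, $p(x_2)$ relate:

```python
# A minimal sketch of the notation above, using a made-up joint pmf.
import numpy as np

x1_vals = np.array([0, 1])           # assumed range of X_1
x2_vals = np.array([10, 20, 30])     # assumed range of X_2

# p[i, j] = Pr(X_1 = x1_vals[i], X_2 = x2_vals[j])
p = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
assert np.isclose(p.sum(), 1.0)      # a valid pmf sums to 1

p_x1 = p.sum(axis=1)   # marginal p(x1) = sum over x2 of p(x1, x2)
p_x2 = p.sum(axis=0)   # marginal p(x2) = sum over x1 of p(x1, x2)
```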

Expectation

$$E[X] = \sum_x x p(x)$$

$$= \sum_{x_1} \sum_{x_2} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} p(x_1, x_2)$$

$$= \sum_{x_1} \sum_{x_2} \begin{bmatrix} x_1 p(x_1, x_2) \\ x_2 p(x_1, x_2) \\ \end{bmatrix}$$

$$= \begin{bmatrix} \sum_{x_1} \sum_{x_2} x_1 p(x_1, x_2) \\ \sum_{x_1} \sum_{x_2} x_2 p(x_1, x_2) \\ \end{bmatrix}$$

$$= \begin{bmatrix} \sum_{x_1} \sum_{x_2} x_1 p(x_1, x_2) \\ \sum_{x_2} \sum_{x_1} x_2 p(x_1, x_2) \\ \end{bmatrix}$$

$$= \begin{bmatrix} \sum_{x_1} x_1 \sum_{x_2} p(x_1, x_2) \\ \sum_{x_2} x_2 \sum_{x_1} p(x_1, x_2) \\ \end{bmatrix}$$

$$= \begin{bmatrix} \sum_{x_1} x_1 p(x_1) \\ \sum_{x_2} x_2 p(x_2) \\ \end{bmatrix}$$

$$= \begin{bmatrix} E[X_1] \\ E[X_2] \\ \end{bmatrix}$$
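To see the first and last lines of this chain agree numerically, here is a hedged sketch (restating the hypothetical pmf from the snippet above so it runs on its own) that computes $E[X]$ once as a sum over all $(x_1, x_2)$ pairs and once component-wise from the marginals:

```python
# Numerical check of the derivation above on a hypothetical joint pmf:
# E[X] via the joint sum equals the vector of marginal expectations.
import numpy as np

x1_vals = np.array([0, 1])
x2_vals = np.array([10, 20, 30])
p = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])   # p[i, j] = Pr(X1=x1_vals[i], X2=x2_vals[j])

# First line of the derivation: sum over all (x1, x2) of [x1, x2] * p(x1, x2).
E_joint = sum(np.array([x1, x2]) * p[i, j]
              for i, x1 in enumerate(x1_vals)
              for j, x2 in enumerate(x2_vals))

# Last line: component-wise expectations from the marginals.
E_marginal = np.array([x1_vals @ p.sum(axis=1),
                       x2_vals @ p.sum(axis=0)])

print(E_joint, E_marginal)           # identical up to floating point
assert np.allclose(E_joint, E_marginal)
```

Both vectors come out to $(0.6, 19.5)^\intercal$ for this particular pmf, which is exactly the equality of the first and last expressions in the derivation.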
