[Math] Why is it so easy to marginalize a multivariate normal distribution

normal distribution, probability distributions

From Wikipedia:

To obtain the marginal distribution over a subset of multivariate
normal random variables, one only needs to drop the irrelevant
variables (the variables that one wants to marginalize out) from the
mean vector and the covariance matrix.

Is there any intuition of why this works out so well?

Best Answer

This follows readily from the expression for the moment generating function (M.G.F.) of a multivariate normal distribution. The M.G.F. is:

$E_X(e^{t'X}) = e^{\mu't + 0.5t'\Sigma t}$ for any real $n$-vector $t$, where $X = (X_1, X_2, \ldots, X_n)'$ is a multivariate normal $N_n(\mu,\Sigma)$ random vector (this is simple to derive).
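As a sanity check on the formula above, here is a minimal NumPy sketch that compares the closed-form M.G.F. with a Monte Carlo estimate of $E(e^{t'X})$. The helper name `mvn_mgf` and the particular values of $\mu$, $\Sigma$ and $t$ are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def mvn_mgf(t, mu, sigma):
    """Closed-form M.G.F. of N(mu, sigma) at t: exp(mu't + 0.5 t' sigma t)."""
    return float(np.exp(mu @ t + 0.5 * t @ sigma @ t))

# Illustrative parameters (assumed); sigma is symmetric positive definite.
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, -0.4],
                  [0.2, -0.4, 1.5]])
t = np.array([0.1, -0.2, 0.15])

# Monte Carlo estimate of E[exp(t'X)] from simulated draws of X.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, sigma, size=200_000)
mc_estimate = float(np.mean(np.exp(samples @ t)))

closed_form = mvn_mgf(t, mu, sigma)
print(closed_form, mc_estimate)  # the two values should agree closely
```

The two numbers should agree up to sampling error, which shrinks as the number of draws grows.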

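You can see from the expression that the exponent $\mu't + 0.5t'\Sigma t$ is a scalar. Moreover, if you want to find the M.G.F. of a $p$-subset ($1 \le p \le n$) of the components of the random vector $X$, say $\{X_{i_1}, X_{i_2}, \ldots, X_{i_p}\}$, where $\{i_1, i_2, \ldots, i_p\}$ is a subset of $\{1, 2, \ldots, n\}$, then you just need to plug in $t_j = 0$ for $j \in \{1, 2, \ldots, n\} \setminus \{i_1, i_2, \ldots, i_p\}$ in the expression for the M.G.F. Every term of $\mu't$ and $t'\Sigma t$ that involves a dropped index then vanishes, so the exponent reduces to $\mu_S't_S + 0.5t_S'\Sigma_{SS}t_S$, where $S = \{i_1, i_2, \ldots, i_p\}$, the vectors $t_S$ and $\mu_S$ keep only the entries indexed by $S$, and $\Sigma_{SS}$ keeps only the corresponding rows and columns of $\Sigma$. This is the known M.G.F. of a uni- or multivariate normal $N_p(\mu_S, \Sigma_{SS})$ distribution. Since the M.G.F. uniquely determines a distribution, the marginal distribution is normal, with mean vector and covariance matrix obtained by simply dropping the irrelevant variables.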
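The zeroing argument can also be checked numerically. The sketch below is a small illustration under assumed values: it evaluates the full M.G.F. at a vector $t$ whose dropped coordinates are zero and compares it with the M.G.F. built from the reduced mean vector and covariance matrix. The helper names `mvn_mgf` and `marginal_params`, the parameter values, and the choice of kept indices are hypothetical.

```python
import numpy as np

def mvn_mgf(t, mu, sigma):
    """Closed-form M.G.F. of N(mu, sigma) at t: exp(mu't + 0.5 t' sigma t)."""
    return float(np.exp(mu @ t + 0.5 * t @ sigma @ t))

def marginal_params(mu, sigma, keep):
    """Drop the unwanted variables: keep only the selected entries of mu
    and the matching rows and columns of sigma."""
    keep = np.asarray(keep)
    return mu[keep], sigma[np.ix_(keep, keep)]

# Illustrative parameters (assumed); sigma is symmetric positive definite.
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, -0.4],
                  [0.2, -0.4, 1.5]])

keep = [0, 2]                      # marginalize out the middle variable
mu_s, sigma_s = marginal_params(mu, sigma, keep)

t_s = np.array([0.3, -0.2])        # argument for the kept coordinates
t_full = np.zeros(len(mu))
t_full[keep] = t_s                 # t_j = 0 for every dropped index j

full_mgf = mvn_mgf(t_full, mu, sigma)
marginal_mgf = mvn_mgf(t_s, mu_s, sigma_s)
print(np.isclose(full_mgf, marginal_mgf))
```

Because the zeroed coordinates remove every term involving the dropped variable, the two evaluations coincide for any choice of `t_s`, not just the one used here.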