Joint Gaussian PDF Change of Coordinates

change-of-variable, linear-algebra, orthogonal-matrices, probability, statistics

My textbook says the following:

Given a vector $\mathrm{\mathbf{x}}$ of random variables $x_i$ for $i = 1, \dots, N,$ with mean $\bar{\mathrm{\mathbf{x}}} = E[\mathrm{\mathbf{x}}]$, where $E[\cdot]$ denotes the expected value, and $\Delta \mathrm{\mathbf{x}} = \mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}}$, the covariance matrix $\Sigma$ is an $N \times N$ matrix given by

$$\Sigma = E[\Delta \mathrm{\mathbf{x}} \Delta \mathrm{\mathbf{x}}^T]$$

so that $\Sigma_{i j} = E[ \Delta x_i \Delta x_j]$. The diagonal entries of the matrix $\Sigma$ are the variances of the individual variables $x_i$, whereas the off-diagonal entries are the cross-covariance values.

The variables $x_i$ are said to conform to a joint Gaussian distribution if the probability distribution of $\mathrm{\mathbf{x}}$ is of the form

$$P(\bar{\mathrm{\mathbf{x}}} + \Delta \mathrm{\mathbf{x}}) = (2 \pi) ^{-N/2} \det(\Sigma^{-1})^{1/2} \exp(-(\Delta \mathrm{\mathbf{x}})^T \Sigma^{-1} (\Delta \mathrm{\mathbf{x}})/2) \tag{A2.1}$$

for some positive-semidefinite matrix $\Sigma^{-1}$.

$\vdots$

Change of coordinates. Since $\Sigma$ is symmetric and positive-definite, it may be written as $\Sigma = U^TDU$, where $U$ is an orthogonal matrix and $D = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_N^2)$ is diagonal. Writing $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ and $\bar{\mathrm{\mathbf{x}}}' = U \bar{\mathrm{\mathbf{x}}}$, and substituting in (A2.1), leads to

$$ \begin{align*}\exp(-(\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})^T \Sigma^{-1} (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})/2) &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T U \Sigma^{-1} U^T (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \\ &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T D^{-1} (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \end{align*}$$

Thus, the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with diagonal covariance matrix. A further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. Equivalently stated, a change of coordinates may be applied to transform Mahalanobis distance to ordinary Euclidean distance.
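(As a quick numerical sanity check of the quoted claim — this sketch is my addition, not the textbook's — the map $\mathrm{\mathbf{y}} = D^{-1/2} U (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})$, i.e. diagonalize then scale each axis by $1/\sigma_i$, turns Mahalanobis distance into ordinary Euclidean distance:)

```python
import numpy as np

# Build a random symmetric positive-definite covariance matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)

# Eigendecomposition: Sigma = V diag(w) V^T with V orthogonal,
# so with U = V^T we have Sigma = U^T D U as in the textbook.
w, V = np.linalg.eigh(Sigma)
U = V.T
D_inv_sqrt = np.diag(1.0 / np.sqrt(w))   # per-axis scaling by 1/sigma_i

x, xbar = rng.standard_normal(3), rng.standard_normal(3)
d = x - xbar

# Squared Mahalanobis distance vs. squared Euclidean distance after
# the change of coordinates y = D^{-1/2} U (x - xbar).
mahalanobis_sq = d @ np.linalg.inv(Sigma) @ d
y = D_inv_sqrt @ U @ d
euclidean_sq = y @ y

assert np.allclose(mahalanobis_sq, euclidean_sq)
```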

I don't understand how the author derived these expressions:

$$ \begin{align*}\exp(-(\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})^T \Sigma^{-1} (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})/2) &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T U \Sigma^{-1} U^T (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \\ &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T D^{-1} (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \end{align*}$$

My attempt was as follows:

$$ \begin{align*}\exp(-(\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})^T \Sigma^{-1} (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})/2) &= \exp(-(U^{-1}\mathrm{\mathbf{x}}' - U^{-1} \bar{\mathrm{\mathbf{x}}}')^T \Sigma^{-1}(U^{-1} \mathrm{\mathbf{x}}' - U^{-1} \bar{\mathrm{\mathbf{x}}}')/2 ) \\ &= \exp(-(U^{-1}(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}'))^T \Sigma^{-1} (U^{-1}(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}'))/2) \\ &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T U \Sigma^{-1} U^T (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \tag{*} \end{align*}$$

(*) Here I used $(AB)^T = B^T A^T$ and $U^{-1} = U^T$ (since $U$ is orthogonal), so $(U^{-1})^T = U$.

As you can see, I can't figure out how to derive the expressions that the author outlines. In fact, based on my work above, I can't see how such a derivation is possible.

I would greatly appreciate it if people could please take the time to demonstrate this.

Best Answer

You are only missing the implication $\Sigma = U^{T}DU \Rightarrow \Sigma^{-1} = U^{T}D^{-1}U$. By definition we can write $\Sigma = U^{T}DU$, and for invertible matrices $A, B$ it holds that $(AB)^{-1} = B^{-1}A^{-1}$. Therefore $$ \Sigma^{-1} = (U^{T}DU)^{-1} = U^{-1}(U^{T}D)^{-1} = U^{-1}D^{-1}(U^{T})^{-1} = U^TD^{-1}U, $$ where we used $U^{-1} = U^T$ (and hence $(U^T)^{-1} = U$). Substituting this into your last line (*), which already matches the author's first expression, gives $$ U \Sigma^{-1} U^T = U (U^T D^{-1} U) U^T = (U U^T) D^{-1} (U U^T) = D^{-1}, $$ which is exactly the author's second expression.
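(The identity above is easy to check numerically; this sketch on a random positive-definite matrix is my addition, not part of the original answer:)

```python
import numpy as np

# Random symmetric positive-definite matrix playing the role of Sigma.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)

# Eigendecomposition: Sigma = V diag(w) V^T with V orthogonal,
# so with U = V^T we recover the textbook's Sigma = U^T D U.
w, V = np.linalg.eigh(Sigma)
U = V.T
D = np.diag(w)

# The key implication: Sigma^{-1} = U^T D^{-1} U.
assert np.allclose(np.linalg.inv(Sigma), U.T @ np.linalg.inv(D) @ U)

# Consequently the quadratic form diagonalizes under x' = U x:
x, xbar = rng.standard_normal(4), rng.standard_normal(4)
d = x - xbar
lhs = d @ np.linalg.inv(Sigma) @ d
rhs = (U @ d) @ np.linalg.inv(D) @ (U @ d)
assert np.allclose(lhs, rhs)
```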
