Why is an inverse covariance matrix used in the exponential of the multivariate normal instead of the generalized variance

multivariate-statistical-analysis, normal-distribution, probability-distributions, probability-theory, random-variables

So I am learning about the multivariate normal distribution and I have been trying to compare the univariate normal PDF and the multivariate normal PDF. Specifically, I've been identifying which terms in each formula are analogous to each other.

I understand why, in the constant term of the multivariate PDF, the square root of the determinant of the covariance matrix is analogous to the univariate standard deviation (the determinant of the covariance matrix is taken to be the "generalized variance"). However, I cannot understand why we do not just use the same generalized variance as the denominator in the exponential.

I understand that you cannot divide by a matrix, which is why the inverse of the covariance matrix is used. But since the determinant is a scalar and $(x-\mu)^T(x-\mu)$ outputs a scalar, the expression would still be well-defined. Wouldn't the analogy work better if we divided by the generalized variance, or am I missing something?
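As a quick numerical check of the point above, here is a minimal sketch (with a hypothetical $2\times 2$ covariance matrix chosen for illustration) showing that both expressions are indeed scalars, but generally take different values:

```python
import numpy as np

# Hypothetical example values, purely for illustration.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
x = np.array([1.0, -0.5])
mu = np.zeros(2)
d = x - mu

# Both expressions evaluate to a scalar, so both are well-defined:
mahalanobis = d @ np.linalg.inv(Sigma) @ d    # quadratic form the actual pdf uses
proposed = (d @ d) / np.linalg.det(Sigma)     # squared distance over the generalized variance

print(mahalanobis)  # weights directions according to Sigma's structure
print(proposed)     # a single overall rescaling of Euclidean distance
```

The two numbers differ because the inverse covariance matrix penalizes deviations differently along different directions, whereas dividing by the determinant treats every direction identically.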

Best Answer

The function

$$ f(\mathbf{x}) = \frac{1}{\sqrt{(2 \pi \det \Sigma)^n}} \exp\left(-\frac{1}{2 \det \Sigma} (\mathbf x - \boldsymbol \mu)^T(\mathbf x - \boldsymbol \mu) \right)$$

is a valid probability density function. However, if a random variable $\mathbf X$ has this pdf, then the covariance matrix of $\mathbf X$ is $\det(\Sigma)\, I$, not $\Sigma$ as we would like. Replacing $\Sigma^{-1}$ with division by $\det(\Sigma)$ discards all of the matrix structure of $\Sigma$: the exponent then encodes only a single overall scale, and nothing about the correlations between components or the differing variances along different directions.
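This can be checked by simulation. The density above is that of a spherical normal with variance $\det\Sigma$ in every direction, so sampling from it and estimating the covariance should recover $\det(\Sigma)\, I$ rather than $\Sigma$. A minimal sketch, using a hypothetical $2\times 2$ covariance matrix for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance matrix with strong correlation.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
mu = np.zeros(2)
d = np.linalg.det(Sigma)  # the "generalized variance", here 2*1 - 1.2**2 = 0.56

# The exponent of f is -||x - mu||^2 / (2 det Sigma), so drawing from f
# is the same as drawing each coordinate i.i.d. from N(mu_i, det Sigma).
samples = rng.normal(loc=mu, scale=np.sqrt(d), size=(200_000, 2))

est_cov = np.cov(samples, rowvar=False)
print("det(Sigma) =", d)
print("estimated covariance:\n", est_cov)  # close to det(Sigma) * I, not Sigma
```

The estimated covariance has (approximately) $\det\Sigma$ on the diagonal and zero off the diagonal: the correlation encoded in $\Sigma$ is completely lost.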
