Solved – Intuitive meaning of vector multiplication with covariance matrix

covariance-matrix, intuition, linear-algebra

I often see multiplications with covariance matrices in the literature. However, I never really understood what is achieved by multiplying with the covariance matrix.
Given $\Sigma * r = s$, with $\Sigma$ being the covariance matrix of $n$ random variables $X_i$, can someone give me an intuitive explanation of what $s$ tells me?

What I (at least think I) already understand is the principle of covariance in general, and one meaning of the covariance matrix in terms of a linear basis, with the $i$th basis vector being the vector of covariances between the random variable $X_i$ and each $X_j$ for $1 \leq j \leq n$.

Some intuition I already gathered is as follows:
By multiplying $\Sigma * r$ we weight the random variables $X_i$ according to $r$. For fixed $i$, the entry $s_i$ then gives us the sum of the covariances of $X_i$ with each $X_j$ for $1 \leq j \leq n$, weighted by $r_j$, which I read as a measure of how well $X_i$ "covaries" in the direction of $r$.
But what does "in the direction of $r$" actually mean, and what is this result really useful for?
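One way to make this reading concrete is the identity $s_i = \operatorname{Cov}(X_i, \sum_j r_j X_j)$: each entry of $s$ is the covariance of $X_i$ with the single weighted variable $Y = r^T X$. The following is a minimal numerical sketch of that (assuming NumPy; the data, weights, and variable names are purely illustrative):

```python
# Minimal sketch (assumes NumPy; data is synthetic and purely illustrative).
# Checks that the i-th entry of s = Sigma @ r equals Cov(X_i, Y), where
# Y = r^T X is the single variable obtained by weighting the X_j with r.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 1000                          # n random variables, m samples
A = rng.normal(size=(n, n))
X = A @ rng.normal(size=(n, m))         # correlated samples, one variable per row

Sigma = np.cov(X)                       # empirical n x n covariance matrix
r = np.array([0.5, -1.0, 2.0])          # arbitrary weight vector

s = Sigma @ r                           # the product in question

Y = r @ X                               # the weighted combination of all variables
cov_with_Y = np.array([np.cov(X[i], Y)[0, 1] for i in range(n)])

print(np.allclose(s, cov_with_Y))       # True: s_i = Cov(X_i, Y)
```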
Often I see a value like this: $r^T * \Sigma^{-1} * r$
What would this value be useful for?
(And I do know about the nice properties of the eigenvalues and eigenvectors of $\Sigma$.)

Best Answer

Your own intuition is good. Note that since $\Sigma$ is square and symmetric, so is $\Sigma^{-1}$. The matrix, its transpose, and its inverse all map your vector $r$ into the same space.

Since $\Sigma$ and $\Sigma^{-1}$ are positive definite, all of their eigenvalues are positive. Thus multiplying a vector by either of them always lands in the same half-space as the original vector: $r^T \Sigma r > 0$, so the angle between $r$ and $\Sigma r$ is always less than $90^\circ$.
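A minimal numerical illustration of this point (assuming NumPy; the matrix below is just a synthetic positive definite example):

```python
# Minimal sketch (assumes NumPy; Sigma is a synthetic positive definite matrix).
# Positive eigenvalues imply r^T (Sigma r) > 0, i.e. Sigma r stays in the same
# half-space as r (the angle between them is below 90 degrees).
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 1e-3 * np.eye(4)      # symmetric positive definite by construction

for _ in range(5):
    r = rng.normal(size=4)
    print(r @ Sigma @ r > 0)            # always True: Sigma r never flips past 90 degrees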

Now if $\Sigma$ or $\Sigma^{-1}$ were a diagonal matrix, the multiplication would simply rescale (or undo the rescaling of) the target vector along each dimension, as you noticed. If they are full matrices, the matrix is still full rank because it is positive definite, so the eigendecomposition exists: $\Sigma = V \Lambda V^{-1} = V \Lambda V^T$, where $V$ is an orthonormal eigenvector matrix (by virtue of $\Sigma$ being symmetric, so $V^{-1} = V^T$) and $\Lambda$ is the diagonal matrix of eigenvalues. Thus $r$ is first rotated by $V^T$, then rescaled by $\Lambda$, then rotated back by $V$. The same goes for $\Sigma^{-1} = V \Lambda^{-1} V^T$: $r$ is rotated into the same eigenbasis, scaled by the diagonal of reciprocals $\Lambda^{-1}$, and rotated back by $V$. It is easy to see that the two are opposite processes: where one stretches along an eigenvector, the other shrinks by the same factor.
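Here is a small sketch of this rotate, then scale, then rotate back picture (assuming NumPy; `Sigma` and `r` are synthetic examples, not anything from the question):

```python
# Minimal sketch (assumes NumPy; Sigma and r are synthetic examples).
# Decomposes Sigma @ r into rotate (V^T), scale (Lambda), rotate back (V),
# and checks that multiplying by Sigma^{-1} reverses the whole map.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 1e-3 * np.eye(3)      # symmetric positive definite

eigvals, V = np.linalg.eigh(Sigma)      # V orthonormal (columns), eigvals > 0
Lam = np.diag(eigvals)
r = rng.normal(size=3)

step1 = V.T @ r                         # rotate r into the eigenbasis
step2 = Lam @ step1                     # stretch each coordinate by its eigenvalue
step3 = V @ step2                       # rotate back

print(np.allclose(step3, Sigma @ r))                   # True
print(np.allclose(np.linalg.solve(Sigma, step3), r))   # Sigma^{-1} undoes the map
```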

Additionally, you may think of $$ r^T \Sigma^{-1} r = (\Lambda^{-1/2} V^T r)^T(\Lambda^{-1/2} V^T r) = \big\|\Lambda^{-1/2} V^T r\big\|^2 $$ as the squared length of your vector $r$ after rescaling by the "standard deviations" along the principal axes, i.e. after correcting for cross-correlations.
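A short numerical check of that identity (again a sketch assuming NumPy, with a synthetic $\Sigma$ and $r$):

```python
# Minimal sketch (assumes NumPy; Sigma and r are synthetic examples).
# Verifies r^T Sigma^{-1} r == || Lambda^{-1/2} V^T r ||^2, the squared length
# of r after whitening (rotate, then divide by the per-axis "standard deviations").
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 1e-3 * np.eye(3)      # symmetric positive definite

eigvals, V = np.linalg.eigh(Sigma)
r = rng.normal(size=3)

quad_form = r @ np.linalg.solve(Sigma, r)           # r^T Sigma^{-1} r
whitened = (V.T @ r) / np.sqrt(eigvals)             # Lambda^{-1/2} V^T r
print(np.allclose(quad_form, whitened @ whitened))  # True
```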

Hope that helps.
