[Math] Intuition for Products of Vectors and Matrices: $x^TAx$

linear-algebra, matrices, soft-question

When I took linear algebra, I had no trouble with the mechanical multiplication of matrices. Given the time to write things out and mumble a bit about $i$th and $j$th rows, I can do the products no problem. However, as I expand my interests and read more advanced texts, the pause to mumble and scratch is becoming a significant barrier to my comprehension.

So, I ask those more experienced how best to build intuition for matrix multiplication, especially for large or arbitrary matrices. Are there any good tricks or rules of thumb that I've missed? Does it just come with constant exposure/repetition? How would you go about quickly interpreting (for example) the statement $$x^TAx$$ where $A$ is an $n \times n$ matrix and $x$ is an $n\times 1$ matrix?

Best Answer

To add to the earlier response: I interpret your question as asking for heuristics for how to manipulate matrices, not specifically for what matrix multiplication means.

I assume here that vectors are column vectors, so $x^T$ refers to a row vector, and that capital letters denote matrices. When $A=(a_{ij})$, then $A^T=(a_{ji})$, so transposition (the interchange of rows and columns) corresponds to switching the indices. Remembering that, you can easily convert a symbolic matrix product to a sum over indexed expressions, manipulate it, and reconvert to a symbolic matrix product.
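As a minimal sketch of this index-sum reading (the matrix and vector here are made up purely for illustration), one can check numerically that the symbolic product $x^TAx$ agrees with the indexed expression $\sum_{i,j} x_i a_{ij} x_j$:

```python
import numpy as np

# Illustrative data only: a small random matrix A and vector x.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Symbolic product form: x^T A x
quadratic_form = x @ A @ x

# Indexed-sum form: sum over i, j of x_i * a_ij * x_j
indexed_sum = sum(x[i] * A[i, j] * x[j] for i in range(n) for j in range(n))

print(np.isclose(quadratic_form, indexed_sum))  # True
```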

One useful trick is pre- and post-multiplication by diagonal matrices: premultiplication corresponds to operations on the rows, while postmultiplication corresponds to operations on the columns. That is, letting $D$ be a diagonal matrix, in $DA$ each row of $A$ is multiplied by the corresponding diagonal element of $D$, while in $AD$ each column of $A$ is multiplied by the corresponding diagonal element.
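A quick numerical sketch of this row/column scaling rule (the matrices are arbitrary examples, not anything from the question):

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3).astype(float)
d = np.array([1.0, 10.0, 100.0])
D = np.diag(d)

print(D @ A)   # row i of A multiplied by d[i]
print(A @ D)   # column j of A multiplied by d[j]

# Equivalent broadcast forms, often cheaper than building D explicitly:
assert np.allclose(D @ A, d[:, None] * A)
assert np.allclose(A @ D, A * d[None, :])
```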

Now an example to show how to use these manipulative tricks. Suppose $X$ is an $n\times n$ matrix such that there exists a basis for $\mathbb R^n$ consisting of eigenvectors of $X$ (we assume all elements are real here). That is, the eigenvalue/eigenvector equation $Xx=\lambda x$ has $n$ linearly independent solutions, call them (or some choice of them if they are not unique) $x_1, \dots, x_n$, with corresponding eigenvalues $\lambda_i$, the elements of the diagonal matrix $\Lambda$. Write $$ X x_i = \lambda_i x_i $$ Now let $P$ be a matrix with the $x_i$ as columns. How can we write the equations above as one matrix equation? Since the constants $\lambda_i$ multiply columns, we know that in the matrix representation the diagonal matrix $\Lambda$ must postmultiply $P$. That is, we get $$ X P = P \Lambda $$ Premultiplying on both sides by the inverse of $P$, we get $$ P^{-1} X P = \Lambda $$ That is, we see that $X$ is similar to the diagonal matrix consisting of its eigenvalues.
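For a concrete check of $XP = P\Lambda$ and $P^{-1}XP = \Lambda$, here is a small sketch using numpy's eigendecomposition; I take a symmetric matrix so that a real eigenvector basis is guaranteed, but the particular matrix is just an illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
X = B + B.T                      # symmetric, hence diagonalizable with real spectrum

eigvals, P = np.linalg.eigh(X)   # columns of P are the eigenvectors x_i
Lam = np.diag(eigvals)

assert np.allclose(X @ P, P @ Lam)                 # X P = P Λ
assert np.allclose(np.linalg.inv(P) @ X @ P, Lam)  # P^{-1} X P = Λ
```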

One more example: If $S$ is a sample covariance matrix, how can we convert it to a sample correlation matrix? The correlation between variables $i$ and $j$ is the covariance divided by the product of the standard deviations of variable $i$ and variable $j$: $$ \text{cor}(X_i,X_j) = \frac{\text{cov}(X_i, X_j)} {\sqrt{\text{var}(X_i) \text{var}(X_j) }} $$

Looking at this with matrix eyes, we are dividing the $(i,j)$-element of the matrix $S$ by the square roots of the $i$th and $j$th diagonal elements! We are dividing each row of $S$ and each column of $S$ by the same diagonal elements, so it can be expressed as pre- and post-multiplication by the (same) diagonal matrix, the one holding the square roots of the diagonal elements of $S$. We have found: $$ R = D^{-1/2} S D^{-1/2} $$ where $R$ is the sample correlation matrix, and $D$ is a diagonal matrix holding the diagonal elements of $S$.
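A minimal sketch of $R = D^{-1/2} S D^{-1/2}$, checked against numpy's built-in correlation; the data matrix here is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal((100, 3))          # 100 observations, 3 variables

S = np.cov(data, rowvar=False)                # sample covariance matrix
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))

R = D_inv_sqrt @ S @ D_inv_sqrt               # pre- and post-multiply by D^{-1/2}
assert np.allclose(R, np.corrcoef(data, rowvar=False))
```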

There are lots of applications of these kinds of tricks, and I find them so useful that textbooks should include them. One other example: let $P$ be a permutation matrix, that is, an $n\times n$ matrix representing a permutation of $n$ symbols. Such a matrix has one 1 and $n-1$ zeros in each row and each column, and can be obtained by permuting (in the same way!) the rows and columns of an identity matrix. Now $AP$ (since it is a post-multiplication) permutes the columns of $A$, while $PA$ permutes the rows of $A$.
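A short sketch of the permutation rule, with an arbitrary example permutation; note that if $P$ reorders rows by a permutation, post-multiplication reorders columns by its inverse:

```python
import numpy as np

perm = [2, 0, 1]            # an example permutation of 3 symbols
I = np.eye(3)
P = I[perm, :]              # permute the rows of the identity matrix

A = np.arange(9).reshape(3, 3)
print(P @ A)                # rows of A reordered: rows 2, 0, 1
print(A @ P)                # columns of A reordered (by the inverse permutation)
```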
