Product of block matrices


Let $A$ be an $(n\times k)$-matrix, $B$ an $(n\times d)$-matrix, and $M=[A \quad B]$ the resulting block matrix (also called the augmented matrix).
Doing the calculations, I obtained that
$$M'M=
\left ( \begin{array}{cc}
A'A & A'B\\
B'A & B'B
\end{array}\right ),
$$

where $'$ stands for transposition.

Do you have a good justification for this result, rather than observing it by brute-force computation?

Best Answer

Block-matrix multiplication is a very useful tool which, unfortunately, is rarely explained well. Here's the way I think about it.

Recall that if we choose a basis for a vector space $V$, any linear transformation has an associated matrix (whose entries are numbers) with respect to this choice of basis. Similarly, if we break a vector space into a direct sum $V = V_1 \oplus \cdots \oplus V_n$, then any linear transformation has an associated block matrix with respect to this decomposition into a direct sum.

Definitions of a direct sum of vector spaces vary, but I mean the following. We say that $V = V_1 \oplus \cdots \oplus V_n$ ($V$ is the direct sum of $V_1,\dots,V_n$) if $V$ is the vector space whose elements are the column-vectors $$ v = (v_1,\dots,v_n) = \pmatrix{v_1 \\ \hline v_2 \\ \hline \vdots \\ \hline v_n}, \quad v_i \in V_i, \ i = 1,\dots,n. $$ The horizontal lines in the second version are simply a notational convenience, used to emphasize the separation between the cells of our column vector. It will often be convenient to refer to an element of the form $(0,\dots,0,v,0,\dots,0)$. To that end, we will use the notation $v \otimes e_i := (0,\dots,0,v,0,\dots,0)$, where $i$ is the index of the non-zero entry. So, we could write $$ (v_1,\dots,v_n) = v_1 \otimes e_1 + v_2 \otimes e_2 + \cdots + v_n \otimes e_n. $$ (This notation is ultimately motivated by the definition of the tensor product, but explaining this is not necessary for the purposes of this discussion.)
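As a concrete numerical sketch of this definition (using numpy; the particular spaces $\Bbb R^2 \oplus \Bbb R^3$ are just an illustrative choice), elements of a direct sum are simply concatenations of the component vectors, and the $v \otimes e_i$ notation corresponds to padding with zeros:

```python
import numpy as np

# A hypothetical decomposition R^5 = R^2 (+) R^3: elements of the direct sum
# are concatenations of the component vectors.
v1 = np.array([1.0, 2.0])        # v1 in V1 = R^2
v2 = np.array([3.0, 4.0, 5.0])   # v2 in V2 = R^3
v = np.concatenate([v1, v2])     # v = (v1, v2) in V = R^5

# v1 ⊗ e1 and v2 ⊗ e2: the given component, with zeros in the other summand
v1_e1 = np.concatenate([v1, np.zeros(3)])
v2_e2 = np.concatenate([np.zeros(2), v2])

# (v1, v2) = v1 ⊗ e1 + v2 ⊗ e2
assert np.allclose(v, v1_e1 + v2_e2)
```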

Now, suppose that $T:V \to W$ is a linear map, with $V = \bigoplus_{i=1}^n V_i$, $W = \bigoplus_{i=1}^m W_i$. We see that for any $j$, the map $v \mapsto T(v \otimes e_j)$ is linear. With that established, we see that there must exist linear maps $T_{1j},\dots,T_{mj}$ such that $$ T(v \otimes e_j) = (T_{1j}(v),T_{2j}(v),\dots,T_{mj}(v)). $$ With the maps $T_{ij}$ defined in this fashion, we see that, by linearity, we have $$ T(v_1,\dots,v_n) = \pmatrix{T_{11}(v_1) + \cdots + T_{1n}(v_n)\\ \vdots \\ T_{m1}(v_1) + \cdots + T_{mn}(v_n)} := \pmatrix{T_{11} & \cdots & T_{1n}\\ \vdots & \ddots & \vdots \\ T_{m1} & \cdots & T_{mn}} \pmatrix{v_1\\ \vdots \\ v_n}. $$ On the right side above, we have introduced a new notation: we have a matrix whose entries are linear transformations. By performing the indicated "matrix multiplication", we obtain the desired result, namely $T(v_1,\dots,v_n)$. With that in mind, we say that the block-matrix representation of $T$ (with respect to the decompositions of $V,W$) is given by $$ T = \pmatrix{ T_{11} & \cdots & T_{1n}\\ \vdots & \ddots & \vdots\\ T_{m1} & \cdots & T_{mn} } = \left( \begin{array}{c|c|c} T_{11} & \cdots & T_{1n}\\ \hline \vdots & \ddots & \vdots\\ \hline T_{m1} & \cdots & T_{mn} \end{array} \right). $$ Now, suppose that we have maps $T:V \to W$ and $S: W \to Z$, where $V,W$ are as above and $Z = Z_1 \oplus \cdots \oplus Z_p$. 
We can now see that for any vector $(v_1,\dots,v_n) \in V$, we have $$ (S \circ T)(v) = S(Tv) = S \left[ \pmatrix{T_{11} & \cdots & T_{1n}\\ \vdots & \ddots & \vdots \\ T_{m1} & \cdots & T_{mn}} \pmatrix{v_1\\ \vdots \\ v_n} \right]\\ = \pmatrix{S_{11} & \cdots & S_{1m}\\ \vdots & \ddots & \vdots \\ S_{p1} & \cdots & S_{pm}} \left[ \pmatrix{T_{11} & \cdots & T_{1n}\\ \vdots & \ddots & \vdots \\ T_{m1} & \cdots & T_{mn}} \pmatrix{v_1\\ \vdots \\ v_n} \right] \\ = \left[\pmatrix{S_{11} & \cdots & S_{1m}\\ \vdots & \ddots & \vdots \\ S_{p1} & \cdots & S_{pm}} \pmatrix{T_{11} & \cdots & T_{1n}\\ \vdots & \ddots & \vdots \\ T_{m1} & \cdots & T_{mn}} \right] \pmatrix{v_1\\ \vdots \\ v_n}, $$ where we define the product of two block matrices in the "expected way". On the other hand, $$ (S \circ T)(v) = \pmatrix{(S \circ T)_{11} & \cdots & (S \circ T)_{1n}\\ \vdots & \ddots & \vdots \\ (S \circ T)_{m1} & \cdots & (S \circ T)_{mn}}\pmatrix{v_1\\ \vdots \\ v_n}. $$ So, it must be the case that multiplying the block-matrix of $S$ and the block-matrix of $T$ gives us the block-matrix of $S \circ T$.
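This "expected way" of multiplying block matrices, $(S \circ T)_{ij} = \sum_k S_{ik} T_{kj}$, can be checked numerically. A minimal sketch with numpy (the block sizes are an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# T: V -> W with V = R^2 (+) R^3, W = R^4 (+) R^1
T11, T12 = rng.standard_normal((4, 2)), rng.standard_normal((4, 3))
T21, T22 = rng.standard_normal((1, 2)), rng.standard_normal((1, 3))
# S: W -> Z with Z = R^2 (+) R^2
S11, S12 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))
S21, S22 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))

# Assemble the full matrices from their blocks
S = np.block([[S11, S12], [S21, S22]])
T = np.block([[T11, T12], [T21, T22]])

# Block-wise product: (S∘T)_{ij} = sum_k S_{ik} @ T_{kj}
ST_blocks = np.block([
    [S11 @ T11 + S12 @ T21, S11 @ T12 + S12 @ T22],
    [S21 @ T11 + S22 @ T21, S21 @ T12 + S22 @ T22],
])

# The block-wise product agrees with the ordinary matrix product
assert np.allclose(S @ T, ST_blocks)
```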


For your particular example, $M$ represents a map from $\Bbb R^{k + d} = \Bbb R^k \oplus \Bbb R^d$ to $\Bbb R^n$. The block-matrix of $M$ is given by $[A\ \ B]$, which is to say that $$ M \pmatrix{v_1\\v_2} = \pmatrix{A & B} \pmatrix{v_1\\v_2} = Av_1 + Bv_2. $$ On the other hand, the block matrix of $M'$ is $\left[\begin{smallmatrix}A'\\B'\end{smallmatrix}\right]$, which is to say that for $w \in \Bbb R^n$ we have $$ M'w = \pmatrix{A'\\B'}w = \pmatrix{A'w\\ B'w}. $$ With that established, the block matrix of $M' \circ M$ is the product of the block matrices, $$ M'M = \pmatrix{A'\\B'} \pmatrix{A & B} = \pmatrix{A'A & A'B\\ B'A & B'B}. $$ Indeed, if you agree with my description of the maps corresponding to $M$ and $M'$, then for $v = (v_1,v_2)$ we have $$ M'M v = M'(Mv) = M' (Av_1 + Bv_2) = \pmatrix{A'(Av_1 + Bv_2)\\ B'(Av_1 + Bv_2)}\\ = \pmatrix{A'A v_1 + A'B v_2\\ B'A v_1 + B'B v_2}. $$
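The specific identity in the question is easy to sanity-check numerically as well; a short sketch with numpy (the dimensions $n, k, d$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d = 5, 2, 3
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, d))

M = np.hstack([A, B])  # the block matrix [A  B]

# The claimed block structure of M'M
blocks = np.block([[A.T @ A, A.T @ B],
                   [B.T @ A, B.T @ B]])

assert np.allclose(M.T @ M, blocks)
```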
