Matrix multiplication as composition

Tags: analytic-geometry, geometry, matrices

I've been trying to brush up on linear algebra, and I've encountered a problem that I haven't been able to resolve on my own.

A couple of days ago I watched a great video: "Matrix multiplication as composition | Chapter 4, Essence of linear algebra". In this video the author writes down an expression that shows how the product of two matrices (representing two different linear transformations) captures the overall effect of applying those two transformations in sequence:

[Image: the matrix-multiplication expression from the video] (1)

One thing I can't wrap my head around is how to prove that the second transformation should be applied to the product of the first transformation and a vector we are transforming.

To figure out what coordinates vector $\bar v$ will have in the original basis after a linear transformation, we use the following formula:

$$
\begin{bmatrix}
v_1 \\
v_2 \\
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} \\
b_{21} & b_{22} \\
\end{bmatrix}
\begin{bmatrix}
v'_1 \\
v'_2 \\
\end{bmatrix},\tag{*}\label{*}
$$

where $\begin{bmatrix}
a_{11} & a_{12} \\
b_{21} & b_{22} \\
\end{bmatrix}$
is the matrix whose columns give the coordinates of the new basis vectors (with respect to the original basis) after the linear transformation is applied, and $\begin{bmatrix}
v'_1 \\
v'_2 \\
\end{bmatrix}$
holds the coordinates of the transformed vector in the new basis.
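For concreteness, formula $\eqref{*}$ can be checked numerically. The sketch below (plain Python, with arbitrary illustrative entries for the matrix and the vector) just carries out the matrix-vector product on the right-hand side:

```python
def mat_vec(m, v):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Arbitrary sample values: columns of A are the images of the basis vectors.
A = [[1.0, 2.0],
     [3.0, 4.0]]
v_prime = [5.0, 6.0]  # coordinates [v'_1, v'_2] in the new basis

v = mat_vec(A, v_prime)
print(v)  # [1*5 + 2*6, 3*5 + 4*6] = [17.0, 39.0]
```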


I've written down a very informal proof of this formula: Matrix vector multiplication.


Turning back to the formula given in the video (1), I want to understand why we multiply the matrix
$\begin{bmatrix}
c_{11} & c_{12} \\
d_{21} & d_{22} \\
\end{bmatrix}$

by the vector
$\begin{bmatrix}
v_1 \\
v_2 \\
\end{bmatrix}$
from $\eqref{*}$:

$$
\begin{bmatrix}
c_{11} & c_{12} \\
d_{21} & d_{22} \\
\end{bmatrix}
\left(\begin{bmatrix}
a_{11} & a_{12} \\
b_{21} & b_{22} \\
\end{bmatrix}
\begin{bmatrix}
v'_1 \\
v'_2 \\
\end{bmatrix}\right)
$$

I have this question because in the formula $\eqref{*}$ we use $\begin{bmatrix}
v'_1 \\
v'_2 \\
\end{bmatrix}$
but not
$\begin{bmatrix}
v_1 \\
v_2 \\
\end{bmatrix}$.

I would appreciate it if you could point me in the right direction. Frankly speaking, I'm totally confused, even though I can see that numerically everything adds up.

Best Answer

An $m \times n$ matrix $A \in M(m \times n, \mathbb{R})$ encodes a linear transformation $L_A \colon \mathbb{R}^n \to \mathbb{R}^m$ by the following rule: $L_Ae_j = a_j$ for $1 \leq j \leq n$, where $a_j \in \mathbb{R}^m$ denotes the $j$th column of $A$ and $e_j \in \mathbb{R}^n$ is the $j$th standard basis vector.

Now given another matrix $B \in M(k \times m, \mathbb{R})$, $B$ encodes a linear transformation $L_B \colon \mathbb{R}^m \to \mathbb{R}^k$ just as $A$ does, i.e., $L_Be_i = b_i$ for $1 \leq i \leq m$. The matrix product $BA \in M(k \times n, \mathbb{R})$ is defined so that $L_{BA} = L_BL_A$, that is, \begin{align} (BA)_j &:= L_BL_Ae_j \\ &= L_Ba_j \\ &= L_B\sum_{i = 1}^{m}a_{ij}e_i \\ &= \sum_{i = 1}^{m}a_{ij}L_Be_i \\ &= \sum_{i = 1}^{m}a_{ij}b_i. \end{align} So the fact that matrix multiplication is composition is really the definition of matrix multiplication.
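To see the definition in action, here is a small numeric sanity check, in plain Python with arbitrary sample matrices, that applying $A$ first and then $B$ gives the same result as applying the single matrix $BA$:

```python
def mat_vec(m, v):
    """Apply the linear map encoded by matrix m (list of rows) to vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def mat_mul(b, a):
    """Product BA: column j of BA is L_B applied to column j of A,
    matching the definition (BA)_j = L_B a_j from the answer."""
    n, m, k = len(a[0]), len(a), len(b)
    return [[sum(b[i][t] * a[t][j] for t in range(m)) for j in range(n)]
            for i in range(k)]

A = [[1, 2], [3, 4]]   # arbitrary sample entries
B = [[0, 1], [1, 1]]
v = [5, 6]

# L_{BA} v == L_B (L_A v): composition equals multiplication by the product.
assert mat_vec(mat_mul(B, A), v) == mat_vec(B, mat_vec(A, v))  # both [39, 56]
```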
