[Math] Why do we define change of basis matrix to be the transpose of the transformation

linear algebra

Example. Let $V$ be a finite-dimensional vector space with two different bases

$S = \{ u_1,u_2 \} = \{ (1,2),(3,5) \}$ and $S' = \{ v_1, v_2 \} = \{ (1,-1), (1,-2) \} $

You can check that $v_1 = -8u_1 + 3u_2$ and $v_2 = -11u_1+4u_2$, and that $P = \begin{bmatrix}
-8 &-11 \\
3& 4
\end{bmatrix}$ is the change of basis matrix, where the columns are the coordinates of $v_1, v_2$ with respect to $S$.

But why can't we define the change of basis matrix by rows? It works nicely that

$\begin{bmatrix}
v_1 \\
v_2
\end{bmatrix} = \begin{bmatrix}
-8 &3 \\
-11& 4
\end{bmatrix}\begin{bmatrix}
u_1 \\
u_2
\end{bmatrix}$

So to move from the old basis $S$, you apply the matrix $\begin{bmatrix}
-8 &3 \\
-11& 4
\end{bmatrix}$ to get the new basis. Why do we have to transpose? If you transpose, how do you even use this change of basis matrix? Why can't I use this definition of change of basis?

Best Answer

This is a good question. It might be comforting to know that there are always some arbitrary choices involved in representing vectors of an abstract vector space as arrays of numbers, representing linear maps as double arrays of numbers (matrices), and deciding what one means by changing a basis. The most important thing is to set up a notation that minimizes the number of arbitrary choices and is self-consistent.

Let me try to explain the motivation behind the most popular notation and then reconsider your example. Fix a vector space $V$ and let $\mathcal{B} = (v_1, \dots, v_n)$ be some basis of $V$. The basis $\mathcal{B}$ allows us to identify a vector $v \in V$ with a list of scalars by representing $v$ (uniquely) as $v = a_1 v_1 + \dots + a_n v_n$ and identifying $v$ with the list $(a_1,\dots,a_n)$, which is called the coordinates of $v$ with respect to $\mathcal{B}$. The convention is that we treat this list as a column vector and write

$$ [v]_{\mathcal{B}} := \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. $$

Given a linear map $T \colon V \rightarrow W$, a basis $\mathcal{B} = (v_1,\dots,v_n)$ of $V$ and a basis $\mathcal{C} = (w_1,\dots,w_m)$ of $W$, we can represent each vector $T(v_j)$ as a linear combination $T(v_j) = \sum_{i=1}^m a_{ij} w_i$. The convention is that we treat the array $A = (a_{ij})$ as a double array (which we call a matrix) for which $i$ is the row index and $j$ is the column index. The matrix $A \in M_{m \times n}(\mathbb{F})$ is denoted by $A = [T]^{\mathcal{B}}_{\mathcal{C}}$ and is called the matrix representing $T$ with respect to the basis $\mathcal{B}$ (of the domain) and the basis $\mathcal{C}$ (of the codomain). This convention has the slightly annoying feature (especially to beginners) that a linear map from an $n$-dimensional space to an $m$-dimensional space is represented by an $m \times n$ matrix (so the dimensions are "reversed"), but its most important advantage is that it identifies matrix multiplication with composition/evaluation. Namely, we have the following formulas:

$$ [T(v)]_{\mathcal{C}} = [T]^{\mathcal{B}}_{\mathcal{C}} \cdot [v]_{\mathcal{B}}, \,\,\, [T \circ S]^{\mathcal{B}}_{\mathcal{D}} = [T]^{\mathcal{C}}_{\mathcal{D}} \cdot [S]_{\mathcal{C}}^{\mathcal{B}} $$

where $\cdot$ is matrix multiplication.
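To make the formula $[T(v)]_{\mathcal{C}} = [T]^{\mathcal{B}}_{\mathcal{C}} \cdot [v]_{\mathcal{B}}$ concrete, here is a small sketch (my own toy example, not from the question) using $T = d/dx$ from polynomials of degree $\le 2$ to degree $\le 1$, with bases $\mathcal{B} = (1, x, x^2)$ and $\mathcal{C} = (1, x)$:

```python
# Toy check of [T(v)]_C = [T]_C^B · [v]_B for T = d/dx,
# with B = (1, x, x^2) and C = (1, x). Plain Python, no libraries.

def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# T(1) = 0, T(x) = 1, T(x^2) = 2x, so the columns of [T] are the
# C-coordinates (0,0), (1,0), (0,2) -- a 2x3 matrix, dimensions "reversed".
T = [[0, 1, 0],
     [0, 0, 2]]

v_B = [3, 2, 5]          # v = 3 + 2x + 5x^2 in coordinates w.r.t. B
print(matvec(T, v_B))    # [2, 10], i.e. T(v) = 2 + 10x
```

Note how a map from a 3-dimensional space to a 2-dimensional one is represented by a $2 \times 3$ matrix, exactly as the convention above dictates.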

Finally, let us discuss the change of basis matrices. If $\mathcal{B} = (u_1,\dots,u_n)$ and $\mathcal{B}' = (v_1,\dots,v_n)$ are two bases of $V$, the change of basis matrix "from $\mathcal{B}'$ to $\mathcal{B}$" is the matrix $P = [\operatorname{id}]_{\mathcal{B}}^{\mathcal{B'}}$ where $\operatorname{id} \colon V \rightarrow V$ is the identity transformation. Using the properties above, we see that we have

$$ P[v]_{\mathcal{B}'} = [\operatorname{id}]_{\mathcal{B}}^{\mathcal{B'}} [v]_{\mathcal{B}'} = [v]_{\mathcal{B}}. $$

Thus, given the coordinates of a vector $v \in V$ in the "new basis" $\mathcal{B}'$, the matrix $P$ allows us to compute the coordinates of $v$ in the "old basis" $\mathcal{B}$ by matrix multiplication. The decision of which basis to call "the old one" and which "the new one" is not entirely standard and depends on whether you prefer to change basis vectors or coordinates. In physics, this is related to the "passive vs. active" point of view of linear transformations.
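As a sketch of how $P = [\operatorname{id}]_{\mathcal{B}}^{\mathcal{B}'}$ is actually computed, column $j$ of $P$ holds the $\mathcal{B}$-coordinates of $v_j$, found by solving a small linear system. Here this is done for the bases from the question, using Cramer's rule for $2 \times 2$ systems (plain Python, no libraries):

```python
# Compute P = [id]_B^{B'} for B = (u1, u2) = ((1,2), (3,5)) and
# B' = (v1, v2) = ((1,-1), (1,-2)). Column j of P is the
# B-coordinate vector of v_j.

def solve2(u1, u2, v):
    """Solve a*u1 + b*u2 = v for (a, b) via Cramer's rule."""
    det = u1[0] * u2[1] - u2[0] * u1[1]
    a = (v[0] * u2[1] - u2[0] * v[1]) / det
    b = (u1[0] * v[1] - v[0] * u1[1]) / det
    return a, b

u1, u2 = (1, 2), (3, 5)
cols = [solve2(u1, u2, v) for v in [(1, -1), (1, -2)]]
P = [[cols[0][0], cols[1][0]],
     [cols[0][1], cols[1][1]]]
print(P)   # [[-8.0, -11.0], [3.0, 4.0]]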


Finally, let me reconsider your example. We have $V = \mathbb{R}^2$, $\mathcal{B} = (u_1 = (1,2),u_2 = (3,5))$ and $\mathcal{B}' = (v_1 = (1,-1), v_2 = (1,-2))$. When representing elements in the basis $\mathcal{B}$, I'll use the letter $a$, and when writing elements in the basis $\mathcal{B}'$, I'll use the letter $b$ for the coefficients. That is,

$$ v = a_1 u_1 + a_2 u_2 = b_1 v_1 + b_2 v_2. $$

The matrix $P$ has the feature that

$$ P \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} $$

and so it tells you how to transform the coordinates of an arbitrary vector in the basis $\mathcal{B}'$ to its coordinates in the basis $\mathcal{B}$. For example, if

$$ v = 1 \cdot v_1 + 1 \cdot v_2 = 1 \cdot (-8u_1 + 3u_2) + 1 \cdot (-11u_1 + 4u_2) = -19 u_1 + 7 u_2 $$

we have

$$ [v]_{\mathcal{B}} = \begin{pmatrix} -19 \\ 7 \end{pmatrix}, \,\,\, [v]_{\mathcal{B}'} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \,\,\, \begin{pmatrix} -8 & -11 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = P \, [v]_{\mathcal{B}'} = \begin{pmatrix} -19 \\ 7 \end{pmatrix} = [v]_{\mathcal{B}}. $$
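The whole worked example can be checked numerically: applying $P$ to $[v]_{\mathcal{B}'} = (1,1)$ should give $[v]_{\mathcal{B}} = (-19, 7)$, and both coordinate vectors should describe the same point of $\mathbb{R}^2$. A quick sketch in plain Python:

```python
# Verify: P [v]_{B'} = [v]_B, and both coordinate vectors
# name the same point of R^2.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

P = [[-8, -11],
     [3, 4]]
v_Bp = [1, 1]
v_B = matvec(P, v_Bp)
print(v_B)                      # [-19, 7]

u1, u2 = (1, 2), (3, 5)
v1, v2 = (1, -1), (1, -2)
in_B  = [v_B[0] * u1[i] + v_B[1] * u2[i] for i in range(2)]   # -19 u1 + 7 u2
in_Bp = [v_Bp[0] * v1[i] + v_Bp[1] * v2[i] for i in range(2)] # 1 v1 + 1 v2
print(in_B, in_Bp)              # both [2, -3]
```

Both expansions land on the same vector $(2,-3)$, confirming that $P$ translates $\mathcal{B}'$-coordinates into $\mathcal{B}$-coordinates.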
