Why isn’t $[P]_{C \leftarrow B}[T(x)]_{B}$ equal to $[T(x)]_C$ ? $P$ is the change of basis matrix from $B$ to $C$ and $T$ is a linear transformation

change-of-basis, linear algebra, linear-transformations, matrices

Why isn't $[P]_{C \leftarrow B}[T(x)]_{B}$ equal to $[T(x)]_C$? Here $P$ is the change-of-basis matrix from $B$ to $C$ (both bases of the same vector space) and $T$ is a linear transformation.

It would make a lot of intuitive sense if left-multiplying the matrix of a transformation $T$ with respect to $B$ by the change-of-basis matrix gave you the matrix of $T$ with respect to $C$. Yet this does not seem to be the case.

For example: $B$ and $C$ are both bases for the space of 2nd degree polynomials ($\mathscr{P}_2$).

$B$ = {$1,x,x^2$} and $C$ = {$1, x+2, (x^2+2)^2$}

$T(x)$ is a linear transformation from $\mathscr{P}_2$ to $\mathscr{P}_2$ such that $T(p(x))$ = $p(x+2)$.

So the matrix of $T$ with respect to $B$ ($[T(x)]_B$) is:

$$ \left[
\begin{array}{ccc}
1&1&1\\
0&2&4\\
0&0&4\\
\end{array}
\right] $$

And $[T(x)]_C$ is:

$$ \left[
\begin{array}{ccc}
1&4&1\\
0&1&8\\
0&0&16\\
\end{array}
\right] $$

(I found these by applying the transformation to the 3 vectors in the basis of the space, and then finding the coordinate vectors of those vectors with respect to the standard basis and using those as the columns of the matrix. In this case $B$ happens to be the standard basis)

The change of basis matrix ($[P]_{C \leftarrow B}$) is:

$$ \left[
\begin{array}{ccc}
1&-2&4\\
0&1&-4\\
0&0&1\\
\end{array}
\right] $$

$[P]_{C \leftarrow B}[T(x)]_B$ =

$$ \left[
\begin{array}{ccc}
1&-3&9\\
0&2&-12\\
0&0&4\\
\end{array}
\right] $$

$\neq$ $[T(x)]_C$, since $[T(x)]_C$ is the identity matrix.

Why is this? Shouldn't multiplying a matrix of a transformation with respect to one basis by the change of basis matrix for that basis bring it to the other basis? Why does it not? What's the difference between the matrix of the transformation and any other matrix?

Best Answer

The notation $[T(x)]_{\mathcal B}$ obscures the fact that there are actually two bases in play: the “input” basis, i.e., the basis $\mathcal B$ chosen for the domain of $T$, and the “output” basis, i.e., the basis $\mathcal B'$ chosen for $T$’s codomain. In multiplying on the left by a change-of-basis matrix, you’ve changed the output basis, but not the input basis.

A notation that makes this explicit is $[T]_{\mathcal B'}^{\mathcal B}$. Writing the input basis as a superscript and output as a subscript allows for a simple mnemonic device: if you denote the coordinate tuple relative to $\mathcal B$ of a vector $\mathbf v$ by $[\mathbf v]_{\mathcal B}$, then in the product $[T]_{\mathcal B'}^{\mathcal B}[\mathbf v]_{\mathcal B}$, you can think of the two $\mathcal B$’s as formally canceling when they’re adjacent. Similarly, when multiplying two matrices, the superscript of the left-hand term “cancels” against the subscript of the right-hand term. In this notation, the $[T(x)]_{\mathcal B}$ in your question would be written $[T]_{\mathcal B}^{\mathcal B}$, with the same basis for input and output.†

This notation also lets you check that the bases match up correctly: $[T]_{\mathcal B'}^{\mathcal B}[\mathbf v]_{\mathcal B}$ makes sense because the matrix expects as input coordinates relative to $\mathcal B$, but $[T]_{\mathcal B}^{\mathcal B'}[\mathbf v]_{\mathcal B}$ doesn’t because the input must be expressed relative to $\mathcal B'$, but $[\mathbf v]_{\mathcal B}$ is in a different basis.

Using this notation, a change-of-basis matrix is the matrix of the identity map with appropriate input and output bases: instead of $P_{\mathcal C\leftarrow\mathcal B}$ one would write $[\operatorname{id}]_{\mathcal C}^{\mathcal B}$. To change the basis of the coordinate tuple of a vector, then, we just left-multiply by a change-of-basis matrix: $$[\operatorname{id}]_{\mathcal C}^{\mathcal B} [\mathbf v]_{\mathcal B} = [\mathbf v]_{\mathcal C}.$$ A vector only has an output, if you will.
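To see the vector rule in action, here is a minimal Python sketch. It assumes the bases from the question, with the third element of $\mathcal C$ read as $(x+2)^2$ (the correction discussed further down), and it takes the question's matrix $P$ as $[\operatorname{id}]_{\mathcal C}^{\mathcal B}$. The helper names `mat_vec`, `eval_in_B` and `eval_in_C` are made up for this illustration.

```python
# [id]_C^B for B = {1, x, x^2} and C = {1, x+2, (x+2)^2}.
# Assumption: the third element of C is (x+2)^2, not (x^2+2)^2.
ID_C_FROM_B = [
    [1, -2, 4],
    [0, 1, -4],
    [0, 0, 1],
]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a coordinate tuple."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def eval_in_B(coords, x):
    """Evaluate the polynomial with the given B-coordinates at x."""
    c0, c1, c2 = coords
    return c0 + c1 * x + c2 * x**2


def eval_in_C(coords, x):
    """Evaluate the polynomial with the given C-coordinates at x."""
    c0, c1, c2 = coords
    return c0 + c1 * (x + 2) + c2 * (x + 2) ** 2


v_B = [3, 2, 1]                   # p(x) = 3 + 2x + x^2, relative to B
v_C = mat_vec(ID_C_FROM_B, v_B)   # the same polynomial, relative to C

# Both coordinate tuples should describe the same polynomial.
same_polynomial = all(eval_in_B(v_B, x) == eval_in_C(v_C, x) for x in range(-3, 4))
print(v_C, same_polynomial)
```

Only one multiplication is needed here because a coordinate tuple carries a single basis label.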

On the other hand, a linear transformation has both an input and an output, and each must be dealt with separately. By only left-multiplying by a change-of-basis matrix, as you might do for a vector, you only change the output basis: $$[\operatorname{id}]_{\mathcal C}^{\mathcal B} [T]_{\mathcal B}^{\mathcal B} = [T]_{\mathcal C}^{\mathcal B}.$$ With this matrix, the results of applying $T$ to a vector are expressed relative to the new basis $\mathcal C$, but the inputs to $T$ must still be expressed relative to the original basis $\mathcal B$. In order to change the input basis, you must also right-multiply by an appropriate change-of-basis matrix: $$[\operatorname{id}]_{\mathcal C}^{\mathcal B} [T]_{\mathcal B}^{\mathcal B} [\operatorname{id}]_{\mathcal B}^{\mathcal C} = [T]_{\mathcal C}^{\mathcal C},$$ or, in your notation, $$[T]_{\mathcal C} = [P]_{\mathcal C\leftarrow\mathcal B} [T]_{\mathcal B} [P]_{\mathcal B\leftarrow\mathcal C}.$$

Another way to look at it is that you want to input and produce coordinate tuples relative to the basis $\mathcal C$, but you’ve only got a matrix that eats and spits out coordinate tuples relative to $\mathcal B$. In order to use this matrix, you first have to convert the input from $\mathcal C$ to $\mathcal B$, after which you can multiply by the matrix, and then convert the result from $\mathcal B$ to $\mathcal C$.
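The convert, apply, convert-back recipe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $\mathcal C = \{1, x+2, (x+2)^2\}$, the matrix `T_B` is the corrected $[T]_{\mathcal B}^{\mathcal B}$ for $T(p(x)) = p(x+2)$ worked out in the example that follows, and the function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a coordinate tuple."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Assumed bases: B = {1, x, x^2}, C = {1, x+2, (x+2)^2}.
ID_B_FROM_C = [[1, 2, 4], [0, 1, 4], [0, 0, 1]]    # [id]_B^C
T_B = [[1, 2, 4], [0, 1, 4], [0, 0, 1]]            # [T]_B^B for p(x) -> p(x+2)
ID_C_FROM_B = [[1, -2, 4], [0, 1, -4], [0, 0, 1]]  # [id]_C^B


def apply_T_in_C(v_C):
    """Apply T to C-coordinates using only the B-matrix of T."""
    v_B = mat_vec(ID_B_FROM_C, v_C)   # step 1: convert the input from C to B
    w_B = mat_vec(T_B, v_B)           # step 2: apply T in B-coordinates
    return mat_vec(ID_C_FROM_B, w_B)  # step 3: convert the result from B to C


def eval_in_C(coords, x):
    """Evaluate the polynomial with the given C-coordinates at x."""
    c0, c1, c2 = coords
    return c0 + c1 * (x + 2) + c2 * (x + 2) ** 2


v_C = [3, -2, 1]          # C-coordinates of p(x) = 3 + 2x + x^2
w_C = apply_T_in_C(v_C)   # C-coordinates of T(p)(x) = p(x + 2)

# Check against the definition of T by evaluating both polynomials.
matches_definition = all(eval_in_C(w_C, x) == eval_in_C(v_C, x + 2) for x in range(-3, 4))
print(w_C, matches_definition)
```

Reading the three steps from right to left gives exactly the product $[\operatorname{id}]_{\mathcal C}^{\mathcal B} [T]_{\mathcal B}^{\mathcal B} [\operatorname{id}]_{\mathcal B}^{\mathcal C}$.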

Turning now to your example, first of all, $\mathcal C$ is not a basis for $\mathscr P_2$: its third element is a fourth-degree polynomial, so isn’t even an element of the space. I’ll assume that you meant $(x+2)^2$. We then need to correct the matrix that you computed for $[T]_{\mathcal B}^{\mathcal B}$. You appear to have gotten the order of the coordinates mixed up. $T(x)=x+2$, but $[x+2]_{\mathcal B}=[2,1,0]^T$, not $[1,2,0]^T$, and similarly for $T(x^2)$. So, $$[T]_{\mathcal B}^{\mathcal B} = \begin{bmatrix}1&2&4\\0&1&4\\0&0&1\end{bmatrix}.$$ You’ve made the same mistake in the third column of the matrix of $T$ relative to $\mathcal C$, but there’s an even more fundamental error there: the results of applying $T$ to the elements of $\mathcal C$ are expressed relative to the standard basis, so you’ve actually constructed $[T]_{\mathcal B}^{\mathcal C}$ instead of $[T]_{\mathcal C}^{\mathcal C}$. The matrix product you then compute is $$[\operatorname{id}]_{\mathcal C}^{\mathcal B} [T]_{\mathcal B}^{\mathcal B} = [T]_{\mathcal C}^{\mathcal B} \ne [T]_{\mathcal B}^{\mathcal C}.$$
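If you want to double-check the corrected $[T]_{\mathcal B}^{\mathcal B}$, one option is to build it column by column from the binomial expansion of $(x+2)^k$. The sketch below does that; `shift_matrix` is a hypothetical helper written for this answer, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def shift_matrix(degree, shift):
    """Matrix of p(x) -> p(x + shift) relative to {1, x, ..., x^degree}.

    Column k holds the coordinates of (x + shift)^k, whose x^j coefficient
    is C(k, j) * shift^(k - j) by the binomial theorem.
    """
    size = degree + 1
    matrix = [[0] * size for _ in range(size)]
    for k in range(size):
        for j in range(k + 1):
            # Row index = power of x (coordinate order 1, x, x^2, ...).
            matrix[j][k] = comb(k, j) * shift ** (k - j)
    return matrix


T_B = shift_matrix(2, 2)
for row in T_B:
    print(row)
```

The row index is the power of $x$, so the constant term of each image lands in the first row, which is the ordering that got mixed up in the question.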

To obtain $[T]_{\mathcal C}^{\mathcal C}$ you instead have to express the images of the basis vectors relative to $\mathcal C$, or, since you’ve already worked out what they are in the standard basis, apply a change of basis to the output side of the matrix you computed. I.e., $$[T]_{\mathcal C}^{\mathcal C} = [\operatorname{id}]_{\mathcal C}^{\mathcal B} [T]_{\mathcal B}^{\mathcal C} = \begin{bmatrix}1&2&4\\0&1&4\\0&0&1\end{bmatrix},$$ which, not coincidentally, is equal to both $[T]_{\mathcal B}^{\mathcal B}$ and $[\operatorname{id}]_{\mathcal B}^{\mathcal C}$. Because of all this, it should be clear without actually performing the multiplications that for this example $$[T]_{\mathcal C}^{\mathcal C} = [\operatorname{id}]_{\mathcal C}^{\mathcal B} [T]_{\mathcal B}^{\mathcal B} [\operatorname{id}]_{\mathcal B}^{\mathcal C}$$ as explained above.
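If you would rather perform the multiplications anyway, the following Python sketch checks the three products discussed in this answer. It assumes $\mathcal C = \{1, x+2, (x+2)^2\}$ and uses the corrected matrices. `T_B_FROM_C` is my own computation of $[T]_{\mathcal B}^{\mathcal C}$, with columns given by the $\mathcal B$-coordinates of $1$, $x+4$ and $(x+4)^2$, and `mat_mul` is a hypothetical helper.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


ID_C_FROM_B = [[1, -2, 4], [0, 1, -4], [0, 0, 1]]  # [id]_C^B (the question's P)
ID_B_FROM_C = [[1, 2, 4], [0, 1, 4], [0, 0, 1]]    # [id]_B^C
T_B = [[1, 2, 4], [0, 1, 4], [0, 0, 1]]            # [T]_B^B (corrected)
T_B_FROM_C = [[1, 4, 16], [0, 1, 8], [0, 0, 1]]    # [T]_B^C (corrected, assumed)

# Left-multiplying alone changes only the output basis: this is [T]_C^B.
output_only = mat_mul(ID_C_FROM_B, T_B)

# Two routes to [T]_C^C: fix the output side of [T]_B^C, or conjugate [T]_B^B.
via_output_fix = mat_mul(ID_C_FROM_B, T_B_FROM_C)
via_conjugation = mat_mul(mat_mul(ID_C_FROM_B, T_B), ID_B_FROM_C)

print(output_only)
print(via_output_fix)
print(via_conjugation)
```

For this example the output-only product $[T]_{\mathcal C}^{\mathcal B}$ comes out as the identity matrix, because $T$ sends the $k$-th element of $\mathcal B$ to the $k$-th element of $\mathcal C$. Both routes to $[T]_{\mathcal C}^{\mathcal C}$ agree with the matrix displayed above.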


† Writing $T(x)$ for the transformation isn’t quite correct, either: the name of the transformation is $T$; $T(x)$ is its value when applied to $x$, i.e., $T(x)$ is an element of $T$’s range—a vector.