Understanding Linear Transformations and Their Notation

change-of-basis · linear-algebra · linear-transformations · notation · vector-spaces

I am reviewing linear transformations, and although I understand most of how they work and what they do, there is one problem that keeps circling in my head. I'm not sure whether this is just a pedantic issue of notation or a conceptual misunderstanding on my part. I am studying from Sergei Treil's Linear Algebra Done Wrong. On page 69, prefacing change of coordinates, he reviews the matrix of a linear transformation in the general context of different bases and writes:

Let $T: V \rightarrow W$ be a linear transformation, and let $\mathcal{A} = \{\mathbf{a}_1,\mathbf{a}_2,…,\mathbf{a}_n\}$, $\mathcal{B} = \{\mathbf{b}_1,\mathbf{b}_2,…,\mathbf{b}_m\}$ be bases in $V$ and $W$ respectively. The matrix of $T$ in (or with respect to) the bases $\mathcal{A}$ and $\mathcal{B}$ is an $m \times n$ matrix, denoted by $[T]_{\mathcal{B}\mathcal{A}}$, which relates the coordinate vectors $[T\mathbf{v}]_{\mathcal{B}}$ and $[\mathbf{v}]_\mathcal{A}$, $$[T\mathbf{v}]_{\mathcal{B}}= [T]_{\mathcal{B}\mathcal{A}}[\mathbf{v}]_\mathcal{A};$$
…The matrix $[T]_{\mathcal{B}\mathcal{A}}$ is easy to find: its $k$th column is just the coordinate vector $[T\mathbf{a}_k]_\mathcal{B}$ (compare this to finding the matrix of a linear transformation from $\mathbb{F}^n$ to $\mathbb{F}^m$).
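To make this construction concrete for myself, here is a minimal NumPy sketch with made-up numbers (my own device, not the book's: I assume $T$ happens to be given by a matrix $M$ in standard coordinates, so that I can compute $\mathcal{B}$-coordinates by solving a linear system):

```python
import numpy as np

# Made-up example: suppose T: R^2 -> R^2 is given in *standard* coordinates by M.
M = np.array([[1.0, 1.0],
              [1.0, -1.0]])

# Nonstandard bases, stored as columns: A for the domain, B for the codomain.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # a_1 = (1, 0), a_2 = (2, 1)
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])  # b_1 = (1, 1), b_2 = (1, -1)

# The k-th column of [T]_{BA} is [T a_k]_B: solve B c = M a_k for c.
T_BA = np.column_stack([np.linalg.solve(B, M @ A[:, k]) for k in range(2)])

# Check the defining identity [T v]_B = [T]_{BA} [v]_A for some v.
v = np.array([3.0, -2.0])
v_A = np.linalg.solve(A, v)        # coordinates of v with respect to A
Tv_B = np.linalg.solve(B, M @ v)   # coordinates of T v with respect to B
assert np.allclose(Tv_B, T_BA @ v_A)
```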

Let me assume that $\mathcal{A}$ and $\mathcal{B}$ are non-standard bases. Clearly, the input vector must be written with respect to the basis of the domain. Once we have defined how $T$ acts on each basis vector of $\mathcal{A}$ (the domain) in terms of the basis vectors of $\mathcal{B}$ (the codomain), multiplying a vector written with respect to $\mathcal{A}$ by the matrix gives the transformed vector written with respect to $\mathcal{B}$, with each $\mathcal{A}$-coordinate "transformed" by the proper amount. This makes sense to me in every instance except the very first step, when we see how $T$ acts on the basis vectors of the domain in order to define the matrix.

When Sergei writes $[T\mathbf{a}_k]_\mathcal{B}$ (I'm assuming this notation is equivalent to $[T(\mathbf{a}_k)]_\mathcal{B}$), is it the same as $[T]_{\mathcal{B}\mathcal{A}}[\mathbf{a}_k]_\mathcal{A}$? If I interpret "how $T$ acts on each basis vector" as how $T$ acts on each basis vector written with respect to the basis it belongs to, then $[\mathbf{a}_1]_\mathcal{A} = \mathbf{e}_1$, … ,$[\mathbf{a}_n]_\mathcal{A} = \mathbf{e}_n$. I think of this as the basis vectors of the domain playing the role of the standard basis for that domain. So I read $[T\mathbf{a}_k]_\mathcal{B}$ as really meaning $[T[\mathbf{a}_k]_\mathcal{A}]_\mathcal{B} = [T[\mathbf{e}_k]]_\mathcal{B}$, i.e., $[T(\mathbf{e}_k)]_\mathcal{B}$. This reading makes more sense to me because it seems consistent with the requirement that the input be written with respect to the basis of the domain. When I see $T(\mathbf{v})$ without any reference to a basis, I assume that $\mathbf{v}$ is written with respect to $\mathcal{A}$ (the basis of the domain), i.e., that $\mathbf{v}$ is really $[\mathbf{v}]_\mathcal{A}$. I assume this because this is how the transformation is defined. Following this assumption, when you write $T(\mathbf{a}_k)$, I would take $\mathbf{a}_k = [\mathbf{a}_k]_\mathcal{A}$, which gives $T([\mathbf{a}_k]_\mathcal{A}) = T(\mathbf{e}_k)$. Are these the correct assumptions? The trouble is that this forces $\mathbf{a}_k = \mathbf{e}_k$, which is not generally the case with nonstandard bases.
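As a quick sanity check on the claim that $[\mathbf{a}_k]_\mathcal{A} = \mathbf{e}_k$ even for a nonstandard basis, here is a numerical check reusing my made-up basis from above:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])  # same made-up basis as above, columns a_1, a_2

# Coordinates of a_k with respect to A itself: solve A c = a_k.
for k in range(2):
    print(np.linalg.solve(A, A[:, k]))  # prints e_k: [1. 0.] then [0. 1.]
```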

Following the notation I propose, we define how $T$ acts on the analogous "standard basis" with respect to the basis in the domain, and express its image in terms of the basis vectors of the codomain:

$$ [T[\mathbf{e}_k]]_\mathcal{B} = \alpha_1\mathbf{b}_1 + \alpha_2\mathbf{b}_2 + … + \alpha_m\mathbf{b}_m = \sum_{i=1}^m \alpha_{i}\mathbf{b}_i. $$

Then the coefficients $(\alpha_1,\alpha_2,…,\alpha_m)^T$ represent the coordinates of $T(\mathbf{e}_k)$ with respect to $\mathcal{B}$, exactly as we wanted for a given input coordinate vector with respect to $\mathcal{A}$, and they form the $k$th column of our transformation matrix. Letting $k$ range over $1 \leq k \leq n$ fills in every column of the matrix.
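Numerically, both readings appear to produce the same columns, since feeding $\mathbf{e}_k$ in as $\mathcal{A}$-coordinates means the underlying vector is $\mathbf{a}_k$ itself (again reusing my made-up matrices from above):

```python
import numpy as np

# Reusing the made-up M, A, B from the first sketch.
M = np.array([[1.0, 1.0], [1.0, -1.0]])
A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, -1.0]])
I = np.eye(2)

for k in range(2):
    col_from_ak = np.linalg.solve(B, M @ A[:, k])        # [T a_k]_B
    col_from_ek = np.linalg.solve(B, M @ (A @ I[:, k]))  # e_k fed in as A-coordinates
    assert np.allclose(col_from_ak, col_from_ek)         # same column either way
```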

So to conclude, I want to make sure I'm making the correct inferences/assumptions about the notation. Otherwise I'm conceptually misunderstanding something. If you read this far, thank you for taking the time.

Best Answer

Your confusion appears to be coming from an assumption that $V=\mathbb F^n$, that is, that vectors are $n$-tuples of scalars. The notation might make more sense to you if you choose some other set of objects as your vectors, such as polynomials of degree less than $n$ with real coefficients†, so that the important distinction between vectors and their coordinates is more apparent: the vector $\mathbf v$ is then a polynomial, while its coordinate tuple with respect to some ordered basis $\mathcal B$, denoted by $[\mathbf v]_{\mathcal B}$, is an $n$-tuple of real numbers. This notation highlights and maintains the difference between a vector and its coordinate tuple, even when the vectors are themselves tuples of scalars.††

The application of the linear transformation $T:V\to W$ to $\mathbf v\in V$ is denoted by $T\mathbf v$—it’s common in algebra to use simple juxtaposition and omit the brackets that you’re no doubt used to. Let’s again take $V$ and $W$ to be vector spaces of polynomials. A critical thing to note is that $T$ operates on polynomials and produces polynomials: writing $T[\mathbf v]_{\mathcal B}$ is nonsensical since that means that you’re trying to apply $T$ to an $n$-tuple of real numbers instead. On the other hand, writing $[T]_{\mathcal B\mathcal A}[\mathbf v]_{\mathcal A}$ does make sense. Here, the juxtaposition represents matrix multiplication instead of function application, which is probably another source of confusion. We left-multiply the column vector $[\mathbf v]_{\mathcal A}$ by the matrix $[T]_{\mathcal B\mathcal A}$ to obtain another column vector, which happily is equal to $[T\mathbf v]_{\mathcal B}$, i.e., the coordinate tuple of the polynomial $T\mathbf v$ with respect to $\mathcal B$.

The identity $$[T\mathbf v]_{\mathcal B} = [T]_{\mathcal B\mathcal A}[\mathbf v]_{\mathcal A}$$ basically says that we can arrive at the same result in two different ways. For the left-hand side, we take the result of applying $T$ to the polynomial $\mathbf v$ and compute its coordinates relative to $\mathcal B$, while for the right-hand side, we first compute the coordinates of the polynomial $\mathbf v$ relative to $\mathcal A$ and then multiply that by the matrix that represents $T$ relative to the two bases. To construct this matrix, we apply $T$ to each element of $\mathcal A$ and then compute the coordinates of that polynomial with respect to $\mathcal B$. Expressed in this notation, the $k$th column of $[T]_{\mathcal B\mathcal A}$ is the coordinate tuple $[T\mathbf a_k]_{\mathcal B}$, as is written in the text.
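If it helps to see the two routes side by side, here is a minimal sketch (my own example, not from the text) taking $T$ to be differentiation from polynomials of degree less than $3$ to polynomials of degree less than $2$, with monomial bases $\mathcal A = \{1, x, x^2\}$ and $\mathcal B = \{1, x\}$:

```python
import numpy as np

# T = d/dx. Columns of [T]_{BA} are [T a_k]_B:
#   T(1) = 0 -> (0, 0),  T(x) = 1 -> (1, 0),  T(x^2) = 2x -> (0, 2).
T_BA = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 2.0]])

v_A = np.array([3.0, 2.0, 5.0])  # v = 3 + 2x + 5x^2, expressed in A-coordinates

# Left-hand route: differentiate the polynomial, then read off B-coordinates.
Tv_B = np.array([2.0, 10.0])     # d/dx(3 + 2x + 5x^2) = 2 + 10x

# Right-hand route: multiply the coordinate vector by the matrix.
assert np.allclose(Tv_B, T_BA @ v_A)
```

Note that $T$ itself touches only polynomials; the matrix touches only coordinate tuples, and the identity says the two bookkeeping paths meet.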


† The points I make could also be made by taking elements of $V$ to be row vectors of reals instead of column vectors, but using polynomials makes it much more obvious that these are a different type of object from their coordinate tuples.


†† The distinction between elements of $\mathbb R^n$ and their coordinate tuples will no doubt come up in some exercises, if it hasn’t already. For instance, consider $V=\{(x,y,z)\in\mathbb R^3 \mid x+y+z=0\}$. This is a two-dimensional subspace of $\mathbb R^3$, so the coordinates of any element of $V$ relative to a basis of $V$ are elements of $\mathbb R^2$. Note, too, that there’s no obvious “standard basis” for this space as there is for $\mathbb R^3$. If $W$ is another two-dimensional subspace of $\mathbb R^3$, the matrix that represents a linear transformation from $V$ to $W$ will be $2\times2$, not $3\times3$.
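As a concrete illustration of such coordinates (with my own, arbitrary choice of basis for this plane):

```python
import numpy as np

# Two basis vectors for the plane x + y + z = 0, stored as columns.
P = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])       # columns: (1,-1,0) and (0,1,-1)

v = np.array([2.0, -5.0, 3.0])     # lies in the plane: 2 - 5 + 3 = 0

# Its coordinates relative to this basis live in R^2; solve the
# overdetermined system P c = v (exact here, since v is in the plane).
coords, *_ = np.linalg.lstsq(P, v, rcond=None)
print(coords)  # [ 2. -3.]: indeed 2*(1,-1,0) - 3*(0,1,-1) = (2,-5,3)
```

The vector $v$ is a $3$-tuple, but its coordinate tuple is a $2$-tuple, which is exactly the distinction at issue.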