I am reviewing linear transformations and although I understand most of how they work and what they do, there is one problem that keeps circling in my head. I'm not sure if this is just a pedantic issue of notation, or a conceptual misunderstanding on my part. I am studying from Sergei Treil's Linear Algebra Done Wrong. On page 69 when he's prefacing change of coordinates he reviews the matrix of a linear transformation in the general context of different bases and writes:

Let $T: V \rightarrow W$ be a linear transformation, and let $\mathcal{A} = \{\mathbf{a}_1,\mathbf{a}_2,…,\mathbf{a}_n\}, \mathcal{B} = \{\mathbf{b}_1,\mathbf{b}_2,…,\mathbf{b}_m\}$ be bases in $V$ and $W$ respectively. A matrix of $T$ in (or with respect to) the bases $\mathcal{A}$ and $\mathcal{B}$ is an $m \times n$ matrix, denoted by $[T]_{\mathcal{B}\mathcal{A}}$, which relates the coordinate vectors $[T\mathbf{v}]_{\mathcal{B}}$ and $[\mathbf{v}]_\mathcal{A}$, $$[T\mathbf{v}]_{\mathcal{B}}= [T]_{\mathcal{B}\mathcal{A}}[\mathbf{v}]_\mathcal{A};$$
…The matrix $[T]_{\mathcal{B}\mathcal{A}}$ is easy to find: its $k$th column is just the coordinate vector $[T\mathbf{a}_k]_\mathcal{B}$ (compare this to finding the matrix of a linear transformation from $\mathbb{F}^n$ to $\mathbb{F}^m$).

Let me assume that $\mathcal{A}$ and $\mathcal{B}$ are non-standard bases. Clearly, the input vector must be a vector with respect to the basis in the domain. Then, once we define how $T$ acts on each basis vector of $\mathcal{A}$ (the domain) in terms of the basis vectors in $\mathcal{B}$ (the codomain), multiplying the resulting matrix by a vector written with respect to $\mathcal{A}$ gives the transformed vector written with respect to $\mathcal{B}$, where each coordinate with respect to $\mathcal{A}$ is "transformed" by the proper amount. This makes sense to me in every instance except for when we first see how $T$ acts on the basis vectors in the domain, when we were originally defining it.

When Sergei writes, $[T\mathbf{a}_k]_\mathcal{B}$ (I'm assuming the notation is equivalent to $[T(\mathbf{a}_k)]_\mathcal{B}$), is this notation the same as $[T]_{\mathcal{B}\mathcal{A}}[\mathbf{a}_k]_\mathcal{A}$? If I interpret seeing how $T$ acts on each basis vector as how $T$ acts on each basis vector with respect to that basis it belongs to, you get that $[\mathbf{a}_1]_\mathcal{A} = \mathbf{e}_1$, … ,$[\mathbf{a}_n]_\mathcal{A} = \mathbf{e}_n$. I think of this as viewing how the basis vectors in the domain are analogous to the standard basis with respect to that domain. So I interpret $[T\mathbf{a}_k]_\mathcal{B}$ as really meaning $[T[\mathbf{a}_k]_\mathcal{A}]_\mathcal{B} = [T[\mathbf{e}_k]]_\mathcal{B}$ or $[T(\mathbf{e}_k)]_\mathcal{B}$. This notation makes more sense to me because it seems consistent with the requirement of the input needing to be with respect to the basis in the domain. When I see $T(\mathbf{v})$ without any reference to a basis, I assume that $\mathbf{v}$ is written with respect to $\mathcal{A}$ (the basis in the domain). I.e., that $\mathbf{v}$ is really $[\mathbf{v}]_\mathcal{A}$. I assume this because this is how the transformation is defined. Then following this assumption, when you write $T(\mathbf{a}_k)$, I would assume $\mathbf{a}_k = [\mathbf{a}_k]_\mathcal{A}$, which is the same thing as $T([\mathbf{a}_k]_\mathcal{A}) = T(\mathbf{e}_k)$. Are these the correct assumptions? The issue is then that $\mathbf{a}_k = \mathbf{e}_k$, which is not generally the case with nonstandard bases.

Following the notation I propose, we define how $T$ acts on the analogous "standard basis" with respect to the basis in the domain, and write its transformation in terms of the basis vectors in the codomain:

$$ [T[\mathbf{e}_k]]_\mathcal{B} = \alpha_1\mathbf{b}_1 + \alpha_2\mathbf{b}_2 + … + \alpha_m\mathbf{b}_m = \sum_{i=1}^m \alpha_i\mathbf{b}_i. $$

Then, the coefficients $(\alpha_1,\alpha_2,…,\alpha_m)^T$ represent the coordinates of $T(\mathbf{e}_k)$ with respect to $\mathcal{B}$ as we wanted for a specific input coordinate with respect to $\mathcal{A}$. And the $\alpha$ coefficients represent a column of our transformation matrix. Taking $1 \leq k \leq n$ completes every column of our matrix.

So to conclude, I want to make sure I'm making the correct inferences/assumptions about the notation. Otherwise I'm conceptually misunderstanding something. If you read this far, thank you for taking the time.

Your confusion appears to be coming from an assumption that $V=\mathbb F^n$, that is, that vectors are $n$-tuples of scalars. The notation might make more sense to you if you choose some other set of objects as your vectors, such as polynomials of degree less than $n$ with real coefficients†, so that the important distinction between vectors and their coordinates is more apparent: the vector $\mathbf v$ is then a polynomial, while its coordinate tuple with respect to some ordered basis $\mathcal B$, denoted by $[\mathbf v]_{\mathcal B}$, is an $n$-tuple of real numbers. This notation highlights and maintains the difference between a vector and its coordinate tuple, even when the vectors are themselves tuples of scalars.††

The application of the linear transformation $T:V\to W$ to $\mathbf v\in V$ is denoted by $T\mathbf v$; it’s common in algebra to use simple juxtaposition and omit the brackets that you’re no doubt used to. Let’s again take $V$ and $W$ to be vector spaces of polynomials. A critical thing to note is that $T$ operates on polynomials and produces polynomials: writing $T[\mathbf v]_{\mathcal A}$ is nonsensical, since that would mean applying $T$ to an $n$-tuple of real numbers instead. On the other hand, writing $[T]_{\mathcal B\mathcal A}[\mathbf v]_{\mathcal A}$ does make sense. Here, the juxtaposition represents matrix multiplication instead of function application, which is probably another source of confusion. We left-multiply the column vector $[\mathbf v]_{\mathcal A}$ by the matrix $[T]_{\mathcal B\mathcal A}$ to obtain another column vector, which happily is equal to $[T\mathbf v]_{\mathcal B}$, i.e., the coordinate tuple of the polynomial $T\mathbf v$ with respect to $\mathcal B$.
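
As a small concrete instance (the space, basis and map here are my own choices for illustration, not taken from the text), let $V$ be the polynomials of degree less than $3$ with basis $\mathcal A=\{1,\;1+x,\;1+x+x^2\}$, and let $T$ be differentiation. Then

$$\mathbf a_2 = 1+x,\qquad [\mathbf a_2]_{\mathcal A} = (0,1,0)^T = \mathbf e_2,\qquad T\mathbf a_2 = 1.$$

The first object is a polynomial, the second is an element of $\mathbb R^3$, and $T\mathbf a_2$ is again a polynomial. It is true that $[\mathbf a_2]_{\mathcal A}=\mathbf e_2$, but $\mathbf a_2\ne\mathbf e_2$: they are not even the same kind of object, and "$T\mathbf e_2$" would ask for the derivative of a column of numbers.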

The identity $$[T\mathbf v]_{\mathcal B} = [T]_{\mathcal B\mathcal A}[\mathbf v]_{\mathcal A}$$ basically says that we can arrive at the same result in two different ways. For the left-hand side, we take the result of applying $T$ to the polynomial $\mathbf v$ and compute its coordinates relative to $\mathcal B$, while for the right-hand side, we first compute the coordinates of the polynomial $\mathbf v$ relative to $\mathcal A$ and then multiply that by the matrix that represents $T$ relative to the two bases. To construct this matrix, we apply $T$ to each element of $\mathcal A$ and then compute the coordinates of that polynomial with respect to $\mathcal B$. Expressed in this notation, the $i$th column of $[T]_{\mathcal B\mathcal A}$ is the coordinate tuple $[T\mathbf a_i]_{\mathcal B}$, as is written in the text.
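
To see the two routes agree numerically, here is a minimal sketch. Everything specific in it is an assumption I chose for illustration rather than something from the text: $V$ is the polynomials of degree less than 3 with $\mathcal A=\{1,\,1+x,\,1+x+x^2\}$, $W$ is the polynomials of degree less than 2 with $\mathcal B=\{1+x,\,1-x\}$, and $T$ is differentiation. The helper `coords` is hypothetical; it compares monomial coefficients only as a computational device for finding coordinates.

```python
import numpy as np
from numpy.polynomial import Polynomial

def coords(p, basis):
    """Coordinate tuple [p]_basis of the polynomial p relative to an ordered basis.

    Hypothetical helper: it compares monomial coefficients and solves the
    resulting square linear system. Assumes p lies in the span of `basis`.
    """
    dim = len(basis)

    def padded(q):
        # Monomial coefficients of q, zero-padded to length dim.
        c = np.zeros(dim)
        c[:q.coef.size] = q.coef
        return c

    M = np.column_stack([padded(b) for b in basis])
    return np.linalg.solve(M, padded(p))

# Vectors are polynomial objects, not tuples of numbers.
A = [Polynomial([1]), Polynomial([1, 1]), Polynomial([1, 1, 1])]  # basis of V
B = [Polynomial([1, 1]), Polynomial([1, -1])]                     # basis of W

def T(p):
    """The linear map: differentiation, taking a polynomial to a polynomial."""
    return p.deriv()

# The k-th column of [T]_{BA} is the coordinate tuple [T a_k]_B.
T_BA = np.column_stack([coords(T(a), B) for a in A])

v = 2 * A[0] - A[1] + 3 * A[2]      # the polynomial 4 + 2x + 3x^2
lhs = coords(T(v), B)               # apply T first, then take coordinates
rhs = T_BA @ coords(v, A)           # take coordinates first, then multiply
print(T_BA)
print(lhs, rhs)
```

Note that `T` is only ever applied to polynomial objects, while the matrix `T_BA` is only ever multiplied by coordinate tuples; the two printed tuples should coincide.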

† The points I make could also be made by taking elements of $V$ to be row vectors of reals instead of column vectors, but using polynomials makes it much more obvious that these are a different type of object from their coordinate tuples.

†† The distinction between elements of $\mathbb R^n$ and their coordinate tuples will no doubt come up in some exercises, if it hasn’t already. For instance, consider $V=\{(x,y,z)\in\mathbb R^3 \mid x+y+z=0\}$. This is a two-dimensional subspace of $\mathbb R^3$, so the coordinates of any element of $V$ relative to a basis of $V$ are elements of $\mathbb R^2$. Note, too, that there’s no obvious “standard basis” for this space as there is for $\mathbb R^3$. If $W$ is another two-dimensional subspace of $\mathbb R^3$, the matrix that represents a linear transformation from $V$ to $W$ will be $2\times2$, not $3\times3$.
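
A minimal sketch of that last point, under assumptions of my own choosing: $V$ is the plane $x+y+z=0$ through the origin, $W$ is the plane $z=0$, the bases are arbitrary, and $T$ simply drops the $z$-component. The helper `coords` is hypothetical and uses least squares to read off coordinates in a two-dimensional subspace.

```python
import numpy as np

def coords(u, basis):
    """Coordinates of u in R^3 relative to a basis of a 2-D subspace.

    `basis` is a 3x2 array whose columns are the basis vectors. Least squares
    recovers the exact coordinates because u is assumed to lie in their span.
    """
    c, *_ = np.linalg.lstsq(basis, u, rcond=None)
    return c

# Basis of V = {x + y + z = 0} (columns); an arbitrary choice, not a standard one.
A = np.array([[1.0, 1.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
# Basis of W = {z = 0} (columns), also an arbitrary choice.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])

def T(u):
    """Illustrative linear map V -> W: drop the z-component."""
    return np.array([u[0], u[1], 0.0])

# Columns are [T a_k]_B, so the matrix is 2x2 even though vectors live in R^3.
T_BA = np.column_stack([coords(T(A[:, k]), B) for k in range(2)])

u = 3 * A[:, 0] + 2 * A[:, 1]       # an element of V, here (5, -3, -2)
print(T_BA.shape)
print(coords(T(u), B), T_BA @ coords(u, A))
```

Here the vectors have three entries but their coordinate tuples have two, which is why the matrix of $T$ comes out $2\times2$.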