[Math] Proper way to find change-of-basis matrix

linear-algebra, linear-transformations, matrices

From S. Lang's Linear Algebra:

In each one of the following cases, find $M^{\beta }_{\beta' }(id)$.
The vector space in each case is $\mathbb{R}^3$.

(a)
$\beta = \{(1, 1, 0), (-1, 1, 1), (0, 1, 2)\}$ ; $\beta' = \{(2, 1, 1), (0, 0, 1), (-1, 1, 1)\}$

Definitions:


$\beta$ and $\beta'$:

For some linear transformation $F$, $\beta$ and $\beta'$ denote a basis of the domain of $F$ and a basis of the codomain of $F$, respectively. In other words, for a linear transformation:

$F: V \rightarrow W$

$\beta$ is a basis of $V$ and $\beta'$ is a basis of $W$.


$M^{\beta }_{\beta' }(id)$:

Generally, for a linear transformation $F$, the book defines $M^{\beta }_{\beta' }(F)$ as the unique matrix $A$ with the following property:

If $X$ is the (column) coordinate vector of an element $v$ of $V$,
relative to the basis $\beta$, then $AX$ is the (column) coordinate
vector of $F(v)$, relative to the basis $\beta'$.

I'm not sure about the exact definition of $M^{\beta }_{\beta' }(id)$, but the book generally uses $id$ for the identity map, hence I'm assuming that $M^{\beta }_{\beta' }(id)$ is the matrix associated with the identity map.
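Writing $X_{\beta}(v)$ for the (column) coordinate vector of $v$ relative to $\beta$, the quoted property can be expressed as a single equation:

$$M^{\beta}_{\beta'}(F)\, X_{\beta}(v) = X_{\beta'}(F(v)) \quad \text{for all } v \in V.$$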

Solution:


There's a very interesting result in book:

$$X_{\beta'}(v) = M^{\beta }_{\beta' }(id)\, X_{\beta}(v)$$ (Equation 1)

Note that the notation $X_{\beta}(v)$ indicates that the coordinate vector $X$ depends on $v$ and on the basis $\beta$.

Hence $v = X \beta$ where $v \in V$, and $w = X \beta'$ where $w \in W$, assuming that $id: V \rightarrow W$.

But considering that the identity map is both surjective (its image equals the codomain) and injective (trivial kernel), I assume we have $V = W$ and hence $id: V \rightarrow V$.


According to Equation 1 and the given bases, I can simply plug in the values:

$$\left( x_1(2, 1, 1), x_2(0, 0, 1), x_3(-1, 1, 1) \right)=
\left(A_1x_1(1, 1, 0), A_2x_2(-1, 1, 1), A_3x_3(0, 1, 2) \right)$$

(where $A = M^{\beta }_{\beta' }(id)$ and $A_1, A_2, A_3$ represent columns of $A$, considering that it is a $3 \times 3$ matrix).

Assuming that $x_1, x_2, x_3$ are scalars, we have:

$$\left( (2x_1, x_1, x_1), (0, 0, x_2), (-x_3, x_3, x_3) \right)=
\left(A_1(x_1, x_1, 0), A_2(-x_2, x_2, x_2), A_3(0, x_3, x_3) \right)$$

This is where it gets a little confusing. If I try to isolate the column vectors of $A$ in this manner:

$$\left( (2x_1 - x_1, x_1 - x_1, x_1 - 0), (0 + x_2, 0 - x_2, x_2 - x_2), (-x_3 - 0, x_3 - x_3, x_3 - x_3) \right)= (A_1, A_2, A_3)$$

Would that be a fundamental error? If not, then we would have:

$$\begin{pmatrix}
x_1 & x_2 & -x_3 \\
0 & -x_2 & 0 \\
0 & 0 & 0
\end{pmatrix}$$

Which doesn't seem like a proper solution.


What mistake did I make? Is there a better solution to this problem? I feel like I'm making a simple fundamental mistake.

Best Answer

I find it hard to decipher your notation, but it appears that you’re making a few fundamental errors.

It’s helpful to think of the notation $M_{\beta'}^\beta$ as specifying the “input” and “output” bases of the matrix $M$: it eats coordinate tuples expressed relative to the ordered basis $\beta$ and spits out coordinate tuples expressed relative to the ordered basis $\beta'$. In particular, applying it to tuples of coordinates expressed in some other basis is nonsensical, as is interpreting its output in terms of some basis other than $\beta'$.

Now, given $\beta=(v_1,v_2,v_3)$, then it’s certainly true that if $X_\beta(v)=(x_1,x_2,x_3)^T$, then $v=x_1v_1+x_2v_2+x_3v_3$, but that’s just the definition of the coordinates of $v$ relative to $\beta$. However, it makes no sense in principle to multiply this sum by the matrix $A=M_{\beta'}^\beta(\operatorname{id})$: $v$ might not even be an element of $\mathbb R^3$ in the first place. In this exercise it is, which I think contributes to your confusion. Even though $v\in\mathbb R^3$, it still makes no sense to multiply it by $A$ because you’re representing the elements of $\beta$ as coordinate vectors relative to the standard basis $\mathcal E$ (or at least some other, unspecified, basis). Using Lang’s notation, that sum gives you $X_{\mathcal E}(v)$, but the product $$M_{\beta'}^\beta(\operatorname{id})X_{\mathcal E}(v)$$ is nonsensical because the bases don’t match.

The next problem is that the same coordinates $(x_1,x_2,x_3)^T$ appear on both sides of the equation that you’ve formed. That’s tantamount to saying that $X_{\beta'}(v)=X_{\beta}(v)$, that is, that the coordinates of an arbitrary vector $v$ are the same in both bases. That’s quite obviously false if $\beta'\ne\beta$. The left-hand side must use the $\beta'$-coordinates of $v$, which are some other three values $(x_1',x_2',x_3')^T$. The number of unknowns is proliferating quickly.

Going back to the definition of $M_{\beta'}^{\beta}$, what we want here is a matrix $A$ such that $$X_{\beta'}(v_i) = AX_{\beta}(v_i)$$ for every element $v_i$ of $\beta$. However, $X_{\beta}(v_i)=e_i$ and $Ae_i=A_i$, from which $A_i=X_{\beta'}(v_i)$, i.e., the columns of $A$ are the elements of $\beta$ expressed relative to the basis $\beta' = (w_1,w_2,w_3)$. For each column, then, you have a system of linear equations. The nine equations can be expressed as the matrix equation $$\pmatrix{X_{\mathcal E}(w_1)&X_{\mathcal E}(w_2)&X_{\mathcal E}(w_3)}A=\pmatrix{X_{\mathcal E}(v_1)&X_{\mathcal E}(v_2)&X_{\mathcal E}(v_3)},$$ therefore $$A = \pmatrix{X_{\mathcal E}(w_1)&X_{\mathcal E}(w_2)&X_{\mathcal E}(w_3)}^{-1}\pmatrix{X_{\mathcal E}(v_1)&X_{\mathcal E}(v_2)&X_{\mathcal E}(v_3)}.$$ Observe, though, that the first matrix in this product is $M_{\beta'}^{\mathcal E}(\operatorname{id})$ and the second is $M_{\mathcal E}^{\beta}(\operatorname{id})$, so we have the useful identity $$M_{\beta'}^\beta(\operatorname{id}) = M_{\beta'}^{\mathcal E}(\operatorname{id})M_{\mathcal E}^{\beta}(\operatorname{id}) = M_{\mathcal E}^{\beta'}(\operatorname{id})^{-1}M_{\mathcal E}^{\beta}(\operatorname{id}).$$ Formally, the upper and lower $\mathcal E$’s in the product “cancel.”
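Concretely, for part (a), reading the given triples as coordinates relative to the standard basis, this matrix equation becomes $$\pmatrix{2&0&-1\\1&0&1\\1&1&1}A=\pmatrix{1&-1&0\\1&1&1\\0&1&2}.$$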

A fairly convenient way to compute this product by hand is to form the augmented matrix $$\left(\begin{array}{c|c}M_{\mathcal E}^{\beta'}(\operatorname{id}) & M_{\mathcal E}^{\beta}(\operatorname{id}) \end{array}\right) = \left(\begin{array}{ccc|ccc}X_{\mathcal E}(w_1)&X_{\mathcal E}(w_2)&X_{\mathcal E}(w_3)&X_{\mathcal E}(v_1)&X_{\mathcal E}(v_2)&X_{\mathcal E}(v_3)\end{array}\right)$$ and apply Gaussian elimination to it to obtain $$\left(\begin{array}{c|c}I_3 & M_{\mathcal E}^{\beta'}(\operatorname{id})^{-1} M_{\mathcal E}^{\beta}(\operatorname{id}) \end{array}\right).$$
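If you want to double-check the hand computation, here is a minimal numerical sketch, assuming NumPy is available, with the two matrices above entered as standard-basis columns:

    import numpy as np

    # Columns are the beta' vectors w_1, w_2, w_3 in standard-basis coordinates.
    W = np.array([[2.0, 0.0, -1.0],
                  [1.0, 0.0,  1.0],
                  [1.0, 1.0,  1.0]])

    # Columns are the beta vectors v_1, v_2, v_3 in standard-basis coordinates.
    V = np.array([[1.0, -1.0, 0.0],
                  [1.0,  1.0, 1.0],
                  [0.0,  1.0, 2.0]])

    # M_{beta'}^{beta}(id) = W^{-1} V, obtained by solving W A = V.
    A = np.linalg.solve(W, V)
    print(A)

    # Sanity check: column i of A holds the beta'-coordinates of v_i,
    # so recombining w_1, w_2, w_3 with those coordinates must give back v_i.
    assert np.allclose(W @ A, V)

Solving $WA=V$ directly, rather than explicitly inverting $W$, mirrors the augmented-matrix elimination described above.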
