Relating “change of coordinates” to change of basis – how to find change in representations of vectors

linear algebra

I've been studying change of basis in $\mathbb{R}^2$ (could be $\mathbb{R}^n$ but sticking to $\mathbb{R}^2$ for simplicity) – how it affects representations of vectors, metrics and endomorphisms. Let's say I start with a basis $\mathcal{B}=\{\vec u_1, \vec u_2\}$, and want to switch to a different basis $\mathcal{A}=\{\vec v_1,\vec v_2\}$. That is, if earlier we were expressing the components of some vector $\vec w$ in the $\mathcal{B}$ basis, we now want to express its components in the $\mathcal{A}$ basis.

For this I can use a change of basis matrix $M_{\mathcal{A}\leftarrow\mathcal{B}}$ whose columns are the representations of $\vec u_1,\vec u_2$ in the $\mathcal{A}$ basis. And then I can relate the representations of $\vec w$ in the two bases by:
$$[\vec w]_{\mathcal{A}} = M_{\mathcal{A}\leftarrow\mathcal{B}}[\vec w]_{\mathcal{B}}$$
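To sanity-check this relation numerically, here is a minimal sketch in plain Python. The two bases and the vector $\vec w$ are made-up example values, and `solve2` and `matvec` are hypothetical helpers written for this illustration, not functions from any library.

```python
# Assumed example bases, written in standard coordinates.
u1, u2 = (1.0, 0.0), (1.0, 1.0)      # basis B
v1, v2 = (2.0, 0.0), (0.0, 0.5)      # basis A

def solve2(c1, c2, b):
    """Solve x*c1 + y*c2 = b for (x, y) using Cramer's rule."""
    det = c1[0] * c2[1] - c2[0] * c1[1]
    x = (b[0] * c2[1] - c2[0] * b[1]) / det
    y = (c1[0] * b[1] - b[0] * c1[1]) / det
    return (x, y)

def matvec(cols, x):
    """Multiply the 2x2 matrix with the given columns by the vector x."""
    return (cols[0][0] * x[0] + cols[1][0] * x[1],
            cols[0][1] * x[0] + cols[1][1] * x[1])

# Columns of M_{A<-B} are the A-coordinates of u1 and u2.
M_A_from_B = (solve2(v1, v2, u1), solve2(v1, v2, u2))

w = (3.0, 2.0)                        # some vector, in standard coordinates
w_B = solve2(u1, u2, w)               # [w]_B
w_A = solve2(v1, v2, w)               # [w]_A, computed directly
w_A_via_M = matvec(M_A_from_B, w_B)   # M_{A<-B} [w]_B

print(w_A, w_A_via_M)                 # the two should agree
```

With these particular values, both routes give the same $\mathcal{A}$-components, as the formula predicts.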

If I have a linear transformation $T$ of the vector space to itself (an endomorphism), then I can relate its representations in the old and new bases like this (let's say $T$ maps $\vec w_1$ to $\vec w_2$; the change of basis matrix is always invertible, with $M_{\mathcal{B}\leftarrow\mathcal{A}}=M_{\mathcal{A}\leftarrow\mathcal{B}}^{-1}$):
$$[\vec w_2]_{\mathcal{A}}=[T]_{\mathcal{A}}[\vec w_1]_{\mathcal{A}}
\\\implies M_{\mathcal{B}\leftarrow\mathcal{A}}[\vec w_2]_{\mathcal{A}}=M_{\mathcal{B}\leftarrow\mathcal{A}}[T]_{\mathcal{A}}M_{\mathcal{A}\leftarrow\mathcal{B}}M_{\mathcal{B}\leftarrow\mathcal{A}}[\vec w_1]_{\mathcal{A}}
\\\implies [\vec w_2]_{\mathcal{B}}=(M_{\mathcal{B}\leftarrow\mathcal{A}}[T]_{\mathcal{A}}M_{\mathcal{A}\leftarrow\mathcal{B}})[\vec w_1]_{\mathcal{B}}
\\\implies [T]_{\mathcal{B}} = M_{\mathcal{B}\leftarrow\mathcal{A}}[T]_{\mathcal{A}}M_{\mathcal{A}\leftarrow\mathcal{B}}$$
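A small numerical check of this similarity relation, as a sketch only: the matrices below are assumed example values (stored as rows), and `matmul`, `inverse` and `apply` are hypothetical helpers defined here for the illustration.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(X):
    """Inverse of an invertible 2x2 matrix stored as tuples of rows."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def apply(X, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return tuple(X[i][0] * v[0] + X[i][1] * v[1] for i in range(2))

# Assumed example data: M_{A<-B} and the matrix of T in basis A.
M_A_from_B = ((0.5, 0.5), (0.0, 2.0))
M_B_from_A = inverse(M_A_from_B)
T_A = ((1.0, 2.0), (3.0, 4.0))

# [T]_B = M_{B<-A} [T]_A M_{A<-B}
T_B = matmul(matmul(M_B_from_A, T_A), M_A_from_B)

# Applying T in B-coordinates and converting the result to A should match
# converting to A first and applying T there.
w1_B = (1.0, 2.0)
lhs = apply(M_A_from_B, apply(T_B, w1_B))
rhs = apply(T_A, apply(M_A_from_B, w1_B))
print(lhs, rhs)
```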

Finally, if $\eta$ is the metric, then invariance of inner product gives us:
$$[\vec w_1]^T_{\mathcal{A}}[\eta]_{\mathcal{A}}[\vec w_2]_{\mathcal{A}}=
[\vec w_1]^T_{\mathcal{B}}[\eta]_{\mathcal{B}}[\vec w_2]_{\mathcal{B}}
\\=[\vec w_1]^T_{\mathcal{A}}M^T_{\mathcal{B}\leftarrow\mathcal{A}}[\eta]_{\mathcal{B}}M_{\mathcal{B}\leftarrow\mathcal{A}}[\vec w_2]_{\mathcal{A}}
\\\implies [\eta]_{\mathcal{A}}=M^T_{\mathcal{B}\leftarrow\mathcal{A}}[\eta]_{\mathcal{B}}M_{\mathcal{B}\leftarrow\mathcal{A}}$$
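The same kind of check works for the metric. This is a minimal sketch with assumed example values for $[\eta]_{\mathcal{B}}$ and $M_{\mathcal{B}\leftarrow\mathcal{A}}$; the helper names are hypothetical.

```python
def transpose(X):
    """Transpose of a 2x2 matrix stored as tuples of rows."""
    return tuple(tuple(X[j][i] for j in range(2)) for i in range(2))

def matmul(X, Y):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(X, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return tuple(X[i][0] * v[0] + X[i][1] * v[1] for i in range(2))

def bilinear(x, G, y):
    """Compute x^T G y."""
    Gy = apply(G, y)
    return x[0] * Gy[0] + x[1] * Gy[1]

# Assumed example data: a symmetric metric in basis B and M_{B<-A}.
eta_B = ((2.0, 1.0), (1.0, 3.0))
M_B_from_A = ((2.0, -0.5), (0.0, 0.5))

# [eta]_A = M_{B<-A}^T [eta]_B M_{B<-A}
eta_A = matmul(matmul(transpose(M_B_from_A), eta_B), M_B_from_A)

# The inner product of two vectors should not depend on the basis used.
w1_A, w2_A = (1.0, -1.0), (0.5, 2.0)
w1_B, w2_B = apply(M_B_from_A, w1_A), apply(M_B_from_A, w2_A)
inner_A = bilinear(w1_A, eta_A, w2_A)
inner_B = bilinear(w1_B, eta_B, w2_B)
print(inner_A, inner_B)
```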

So far so good. I can use the above to find representations in the new coordinate system in the case of some simple coordinate system changes – e.g. if I shift the coordinate system in some direction or if I rotate it by some angle $\phi$.

But I'm at a loss on how to extend this same formalism (of finding representations in the new coordinate system), if we change from Cartesian to polar coordinates. If I try to form a change of basis matrix (assuming that the new system is $(r,\phi)$), I get $[1,0]^T$ and $[1,\pi/2]^T$ as the columns of my CoB matrix, which gives wrong results when I try to use it to get components of a vector in the polar coordinate system.

Next thing I thought was, am I fundamentally confusing a change of coordinates with a change of basis? For that, I tested the above procedure of finding vector component transformation in case of rescaling/rotating only one of the axes – even in that scenario, the above procedure works. This leads me to suspect that the above formulas for transformation of components between bases hold in general for any rectilinear coordinate systems – whether orthogonal or not.

What do I do in case of a Cartesian to polar coordinate system change to find representations of vectors, metric and linear transformations?

Best Answer

Matrices are useful representations of linear maps from one vector space to another (or the same one). But the transformation $\Psi : \mathbb{R}^2 \to \mathbb{R}^2$ from rectangular to polar coordinates, given by $$ \newcommand{cif}{\mathrm{if}\ } \newcommand{cand}{\ \mathrm{and}\ } $$

$$ \Psi(x,y) = \left(\sqrt{x^2+y^2}, \Theta(x,y)\right) $$

$$ \Theta(x,y) = \begin{cases} 0 & \cif x=0 \cand y=0 \\ \arctan \frac{y}{x} & \cif x>0 \cand y \geq 0 \\ \frac{\pi}{2} & \cif x=0 \cand y>0 \\ \pi + \arctan \frac{y}{x} & \cif x<0 \\ \frac{3 \pi}{2} & \cif x=0 \cand y<0 \\ 2\pi + \arctan \frac{y}{x} & \cif x>0 \cand y<0 \\ \end{cases} $$

is not represented by a matrix because it is a non-linear transformation. Also, polar coordinates aren't a vector space: $(c r, c \theta)$ does not have a simple relationship to $(r,\theta)$, and $(r_1+r_2, \theta_1+\theta_2)$ does not have a simple relationship to $(r_1, \theta_1)$ and $(r_2, \theta_2)$. The image of $\Psi$ isn't even all of $\mathbb{R}^2$.
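The failure of linearity is easy to see numerically. In this minimal sketch, `to_polar` is a hypothetical helper that stands in for $\Psi$: it uses `math.atan2` reduced modulo $2\pi$ rather than the piecewise $\arctan$ formula, which should give the same angle in $[0, 2\pi)$ for ordinary inputs. The two test points are the standard basis vectors.

```python
import math

def to_polar(x, y):
    """Psi: rectangular -> polar, with the angle taken in [0, 2*pi)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return (r, theta)

p, q = (1.0, 0.0), (0.0, 1.0)

# Polar coordinates of the sum p + q ...
polar_of_sum = to_polar(p[0] + q[0], p[1] + q[1])

# ... versus the componentwise sum of the polar coordinates of p and q.
rp, tp = to_polar(*p)
rq, tq = to_polar(*q)
sum_of_polars = (rp + rq, tp + tq)

print(polar_of_sum)    # roughly (sqrt(2), pi/4)
print(sum_of_polars)   # roughly (2, pi/2)
```

The polar coordinates of $(1,0)$ and $(0,1)$ are exactly the columns $[1,0]^T$ and $[1,\pi/2]^T$ from the question, and combining them linearly does not reproduce the polar coordinates of the sum.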

So most things wanting vector properties will just need to go back to the original coordinates, and in general there's no guarantee there will be a "nice" way to write them in a new coordinate system.

If $T$ is a linear transformation on $\mathbb{R}^2$ (rectangular), then its action on polar coordinates is:

$$ [T]_\Psi = \Psi \circ T \circ \Psi^{-1} $$

And we know the inverse $\Psi^{-1}$:

$$ \Psi^{-1}(r,\theta) = (r \cos \theta, r \sin \theta) $$

If we write

$$ T = \left(\begin{array}{cc} a & b \\ c & d \end{array}\right) $$

we get

$$ T\Psi^{-1}(r,\theta) = (ar\cos\theta + br\sin\theta, cr\cos\theta + dr\sin\theta) $$

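So $\Psi T \Psi^{-1}(r,\theta)$ can be simplified a little, since $r$ factors out of both components: for $r > 0$ the new radius is $r\sqrt{(a\cos\theta + b\sin\theta)^2 + (c\cos\theta + d\sin\theta)^2}$ and the new angle is $\Theta(a\cos\theta + b\sin\theta,\ c\cos\theta + d\sin\theta)$, which does not depend on $r$. But the result isn't particularly pretty.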
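As a rough illustration, the composition $\Psi \circ T \circ \Psi^{-1}$ can simply be evaluated numerically. This is a sketch under assumptions: `to_polar`, `from_polar` and `T_in_polar` are hypothetical helpers, `math.atan2` reduced modulo $2\pi$ stands in for $\Theta$, and the two matrices are arbitrary examples.

```python
import math

def to_polar(x, y):
    """Psi: rectangular -> polar, with the angle taken in [0, 2*pi)."""
    return (math.hypot(x, y), math.atan2(y, x) % (2 * math.pi))

def from_polar(r, theta):
    """Psi^{-1}: polar -> rectangular."""
    return (r * math.cos(theta), r * math.sin(theta))

def T_in_polar(a, b, c, d, r, theta):
    """Evaluate Psi(T(Psi^{-1}(r, theta))) for T = [[a, b], [c, d]]."""
    x, y = from_polar(r, theta)
    return to_polar(a * x + b * y, c * x + d * y)

# A rotation by pi/2 acts nicely in polar form: it only shifts the angle.
rotated = T_in_polar(0.0, -1.0, 1.0, 0.0, 2.0, math.pi / 6)

# A non-uniform scaling changes both r and theta in a less tidy way.
stretched = T_in_polar(2.0, 0.0, 0.0, 1.0, 1.0, math.pi / 4)

print(rotated)     # roughly (2, 2*pi/3)
print(stretched)   # roughly (sqrt(2.5), atan(0.5))
```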

In a vector space, a norm acts as a distance function between vectors, $d(u,v) = \|u-v\|$, and obeys the triangle inequality $\|u+v\| \leq \|u\| + \|v\|$. This doesn't make as much sense in polar coordinates, where even adding or subtracting two points isn't exactly defined in the first place unless by going back to rectangular. But of course, if we just want to find the rectangular norm of a polar point, that's easy: $\|\Psi^{-1}(r,\theta)\| = r$. For other non-linear maps, it might not be so simple.

Polar coordinates again don't really have an inner product, since its properties related to multiplying by scalars and adding don't make direct sense. But the original rectangular inner product can be found as

$$ \begin{align*} \left< \Psi^{-1}(r_1,\theta_1), \Psi^{-1}(r_2,\theta_2) \right> &= \left< (r_1 \cos \theta_1, r_1 \sin \theta_1), (r_2 \cos \theta_2, r_2 \sin \theta_2) \right> \\ &= r_1 r_2 (\cos \theta_1 \cos \theta_2 + \sin \theta_1 \sin \theta_2) \\ &= r_1 r_2 \cos(\theta_1 - \theta_2) \end{align*} $$

which makes sense from knowing the dot product of two vectors in $\mathbb{R}^n$ is the product of their norms times the cosine of the angle between them.
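A quick numerical confirmation of this identity, as a minimal sketch with arbitrarily chosen polar points (`from_polar` is a hypothetical helper for $\Psi^{-1}$):

```python
import math

def from_polar(r, theta):
    """Psi^{-1}: polar -> rectangular."""
    return (r * math.cos(theta), r * math.sin(theta))

# Two assumed example points in polar coordinates.
r1, t1 = 2.0, 0.3
r2, t2 = 1.5, 2.0

x1, y1 = from_polar(r1, t1)
x2, y2 = from_polar(r2, t2)

dot_rect = x1 * x2 + y1 * y2                # ordinary dot product
dot_polar = r1 * r2 * math.cos(t1 - t2)     # formula in polar form

print(dot_rect, dot_polar)                  # should agree up to rounding
```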

There is one useful related generalization. For a differentiable function $f : \mathbb{R}^m \to \mathbb{R}^n$, we can define the derivative (or Jacobian matrix) $D_f$ as a function into matrices, $D_f : \mathbb{R}^m \to \mathbb{R}^{n \times m}$, whose entry in row $j$ and column $i$ is the partial derivative $\partial f_j(x_1,\ldots,x_m)/\partial x_i$, because it has the property

$$ \forall x \in \mathbb{R}^m, u \in \mathbb{R}^n, v \in \mathbb{R}^m : \lim_{h \to 0} \frac{\langle u, f(x+hv) - f(x) \rangle}{h}\ = \langle u, D_f(x) v \rangle $$

It also follows a chain rule

$$ D_{f \circ g}(x) = D_f(g(x)) D_g(x) $$

For a linear map $T$, the derivative $D_T$ is a constant function whose value everywhere is the same matrix which ordinarily represents $T$. But in general, $D_f$ takes different matrix values at different points.
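Both statements can be checked approximately with finite differences. This is only a sketch: `jacobian` is a hypothetical central-difference helper (not a library function), the matrix `T` and the evaluation points are arbitrary examples, and the agreement is only up to finite-difference error.

```python
import math

def jacobian(f, p, h=1e-6):
    """Central-difference estimate of D_f at p; entry [j][i] ~ d f_j / d x_i."""
    n, m = len(f(p)), len(p)
    J = [[0.0] * m for _ in range(n)]
    for i in range(m):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        f_hi, f_lo = f(hi), f(lo)
        for j in range(n):
            J[j][i] = (f_hi[j] - f_lo[j]) / (2 * h)
    return J

def from_polar(p):
    """Psi^{-1}: polar -> rectangular, taking a pair (r, theta)."""
    r, theta = p
    return (r * math.cos(theta), r * math.sin(theta))

T = ((1.0, 2.0), (3.0, 4.0))   # assumed example matrix

def apply_T(v):
    """The linear map represented by T."""
    return (T[0][0] * v[0] + T[0][1] * v[1], T[1][0] * v[0] + T[1][1] * v[1])

def matmul(X, Y):
    """Product of two 2x2 matrices stored as rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# D_T should be (approximately) the matrix T at every point.
J_T_here = jacobian(apply_T, (0.3, -1.2))
J_T_there = jacobian(apply_T, (5.0, 7.0))

# Chain rule with f = T and g = Psi^{-1}: D_{f o g}(p) = D_f(g(p)) D_g(p).
p = (2.0, 1.0)
lhs = jacobian(lambda q: apply_T(from_polar(q)), p)
rhs = matmul(jacobian(apply_T, from_polar(p)), jacobian(from_polar, p))
print(lhs)
print(rhs)
```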

This derivative $D_f$ is important in seeing the effects of a change of coordinates on a multiple integral: given a domain $S \subseteq \mathbb{R}^n$, an injective continuously differentiable map $A : S \to \mathbb{R}^n$ and an integrable real-valued function $f : A(S) \to \mathbb{R}$,

$$ \int_{y \in A(S)} f(y)\, dy = \int_{x \in S} f(A(x))\, \big| \det(D_A(x)) \big| \, dx $$

So for a linear map $T$,

$$ \int_{y \in T(S)} f(y)\, dy = |\det T| \int_{x \in S} f(T x)\, dx $$

For the map $\Psi^{-1}$ from polar coordinates to rectangular, we get

$$ D_{\Psi^{-1}}(r, \theta) = \left(\begin{array}{cc} \cos \theta & -r \sin \theta \\ \sin \theta & r \cos \theta \end{array}\right) $$

$$ \Big| \det \!\big( D_{\Psi^{-1}}(r, \theta) \big)\Big| = r \cos^2 \theta + r \sin^2 \theta = r $$

giving the familiar

$$ \int_{(x,y) \in S} f(x,y)\, dx\, dy = \int_{(r,\theta) \in \Psi(S)} f(r \cos\theta, r \sin\theta)\, r\, dr\, d\theta $$
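To see the factor $r$ at work, here is a minimal numerical sketch, assuming $S$ is the unit disk and $f(x,y) = x^2 + y^2$, for which the exact integral is $\pi/2$. The function `polar_integral` is a hypothetical midpoint-rule helper, and the flag `include_r` exists only to show what goes wrong when the Jacobian factor is dropped.

```python
import math

def polar_integral(f, r_max, n_r=200, n_theta=200, include_r=True):
    """Midpoint-rule estimate of the integral of f over a disk of radius
    r_max, computed in polar coordinates."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        weight = r if include_r else 1.0   # the |det D| factor
        for k in range(n_theta):
            theta = (k + 0.5) * dtheta
            total += f(r * math.cos(theta), r * math.sin(theta)) * weight
    return total * dr * dtheta

def f(x, y):
    return x * x + y * y

exact = math.pi / 2
with_r = polar_integral(f, 1.0)
without_r = polar_integral(f, 1.0, include_r=False)

print(exact, with_r, without_r)
```

With the factor $r$ the estimate lands close to $\pi/2$; without it, the sum approximates $\int_0^{2\pi}\!\int_0^1 r^2\, dr\, d\theta = 2\pi/3$ instead.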