I am having trouble understanding the use of homogeneous coordinates when describing transformations in 3D space. From what I have seen, the only difference between a transformation matrix in standard coordinates and one in homogeneous coordinates is that a fourth row, [0 0 0 1], is added. Then, when transforming a point, an additional entry of [1] is appended to the point vector. What is the point of this additional 1? And is it ever a different number? From what I have read, homogeneous coordinates enable perspective transformations to be achieved using matrices and linear algebra, but I don't see the connection….
[Math] Why use homogeneous coordinates
linear algebra, matrices
Related Solutions
Sometimes one has to left-multiply, sometimes one has to right-multiply. It depends on the reference frames involved.
Prerequisites:
You are performing scaling, rotation, and translation. So let us assume we have a point transformation of the general form:
$$\mathtt T = \left[ \begin{array}{cc} s\mathtt R & \mathbf t \\ \mathtt O& 1\end{array} \right]$$ which first rotates a point by $\mathtt R$ , then scales it by $s$ and then adds the translation $\mathbf t$:
$$\mathtt T \cdot \left( \begin{array}{c} \mathbf x \\ 1\end{array} \right) = \left[ \begin{array}{c} s(\mathtt R\cdot \mathbf x) + \mathbf t \\ 1\end{array} \right]$$
(Note that rotation and uniform scaling commute: $s(\mathtt R\cdot \mathbf x)=\mathtt R(s\cdot \mathbf x)$)
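As a quick numerical check of the identity above, here is a minimal sketch in plain Python. The helper names `mat_vec` and `make_transform`, and the particular values of $s$, $\mathtt R$ and $\mathbf t$, are illustrative assumptions and not part of the original answer.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def make_transform(s, rot, t):
    """Build the 4x4 matrix [[s*R, t], [0, 1]] from scale s, 3x3 R and translation t."""
    top = [[s * rot[i][j] for j in range(3)] + [t[i]] for i in range(3)]
    return top + [[0, 0, 0, 1]]

# Illustrative values: a 90-degree rotation about the z-axis, scale 2.
R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
s = 2
t = [10, 20, 30]
x = [1, 2, 3]

T = make_transform(s, R, t)
y = mat_vec(T, x + [1])                                   # T applied to (x, 1)
direct = [s * c + ti for c, ti in zip(mat_vec(R, x), t)]  # s(Rx) + t

print(y)       # [6, 22, 36, 1]
print(direct)  # [6, 22, 36]
```

The first three entries of `y` agree with `direct`, and the trailing 1 is carried through unchanged.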
From now on we will assume that all points $\mathbf y$ are homogeneous points ($\mathbf y= (\mathbf x, 1)^\top $).
Mind the reference frames: In order to make it clear whether you need a left or right multiplication, it is important to highlight in which reference frame your points are!
Let us assume we have points $\mathbf y_a$ in reference frame $a$ and want to transform them into reference frame $b$. Then we compute
$$ \mathbf y_b = \mathtt T_{ba} \mathbf y_a$$ where $\mathtt T_{ba}$ is a transformation to $b$ from $a$. Note that the indices must match!
Now, let us look at a more complicated example. One might be interested in:
$$\mathbf y_a = \mathtt T_{ab}\mathtt T_{bc}\mathtt T_{cd}\mathbf y_d$$
Further, let's assume that we receive the poses in order (First $\mathtt T_{ab}$, then $\mathtt T_{bc}$...).
We would calculate in an algorithm:
$\mathtt T_{ai} := \mathtt T_{ab}$
(thus, $i=b$)
$\mathtt T_{ai} := \mathtt T_{ai}\cdot \mathtt T_{bc}$
(now, $i=c$)
$\mathtt T_{ai} := \mathtt T_{ai}\cdot \mathtt T_{cd}$
($i=d$)
Thus, we right-multiplied, and $\mathtt T_{ai}$ now denotes $\mathtt T_{ad}$, the transformation from $d$ to $a$. Finally, we can transform our points:
$$\mathbf y_a := \mathtt T_{ad} \mathbf y_d $$
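The right-multiplication loop above can be sketched as follows. The three poses are made-up examples (two translations and one rotation), and the helpers `mat_mul`, `mat_vec` and `translation` are hypothetical names introduced for this illustration.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def translation(tx, ty, tz):
    """Homogeneous 4x4 translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

# Example poses, received in the order T_ab, T_bc, T_cd.
T_ab = translation(1, 0, 0)
T_bc = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 90 degrees about z
T_cd = translation(0, 2, 0)

T_ai = T_ab                       # i = b
for T_next in (T_bc, T_cd):       # i = c, then i = d
    T_ai = mat_mul(T_ai, T_next)  # right-multiply by the newly received pose

y_d = [1, 1, 1, 1]
y_a = mat_vec(T_ai, y_d)
step_by_step = mat_vec(T_ab, mat_vec(T_bc, mat_vec(T_cd, y_d)))

print(y_a)           # [-2, 1, 1, 1]
print(step_by_step)  # [-2, 1, 1, 1]
```

Accumulating the product first and applying it once gives the same point as applying the three transformations one after another, innermost first.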
However, if one really wants to left-multiply, this is possible too! Note that $\mathtt T_{ia}=\mathtt T_{ai}^{-1}.$ Thus, we can do:
$\mathtt T_{ia} := \mathtt T_{ab}^{-1}$
($i=b$)
$\mathtt T_{ia} := \mathtt T_{bc}^{-1} \mathtt T_{ia}$
($i=c$)
$\mathtt T_{ia} := \mathtt T_{cd}^{-1}\mathtt T_{ia}$
($i=d$)
Thus, we have $\mathtt T_{ia} = \mathtt T_{da}$, and therefore we can transform the point from $d$ to $a$ using the inverse:
$$\mathbf y_a := \mathtt T_{da}^{-1} \mathbf y_d $$
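The left-multiplying variant can be sketched in the same way. To keep the inverse simple, this example assumes $s = 1$, so each pose is a rigid motion $[\mathtt R\ \mathbf t;\ \mathtt O\ 1]$ with inverse $[\mathtt R^\top\ {-\mathtt R^\top \mathbf t};\ \mathtt O\ 1]$. The poses and the helper names (`rigid_inverse` and so on) are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rigid_inverse(m):
    """Invert [[R, t], [0, 1]] for a rotation R (assumes no scaling, s = 1)."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]   # R transposed
    t = [m[i][3] for i in range(3)]
    new_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [new_t[i]] for i in range(3)] + [[0, 0, 0, 1]]

# Same kind of example poses as before: translation, rotation, translation.
T_ab = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
T_bc = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
T_cd = [[1, 0, 0, 0], [0, 1, 0, 2], [0, 0, 1, 0], [0, 0, 0, 1]]

T_ia = rigid_inverse(T_ab)                       # i = b
for T_next in (T_bc, T_cd):                      # i = c, then i = d
    T_ia = mat_mul(rigid_inverse(T_next), T_ia)  # left-multiply by the inverse

y_d = [1, 1, 1, 1]
y_a = mat_vec(rigid_inverse(T_ia), y_d)          # y_a = T_da^{-1} y_d

print(y_a)  # [-2, 1, 1, 1]
```

With the same poses, this yields the same $\mathbf y_a$ as the right-multiplying loop, at the cost of inverting every incoming pose.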
You cannot represent a translation in $3$-dimensional space with a matrix smaller than $4\times 4$. All $3\times 3$ matrices represent transformations which leave the origin fixed, because multiplying a matrix by the zero vector always yields the zero vector. Homogeneous coordinates might be used as a trick to solve this, but you need to add a vector entry and so switch to a $4\times 4$ matrix.
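A small sketch of this point, with arbitrary example matrices chosen for illustration: any $3\times 3$ matrix sends the origin to itself, whereas a $4\times 4$ homogeneous matrix can move it.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

A = [[2, 0, 0],
     [0, 3, 0],
     [0, 0, 4]]            # an arbitrary 3x3 matrix
origin = [0, 0, 0]
fixed = mat_vec(A, origin)

T = [[1, 0, 0, 5],
     [0, 1, 0, 6],
     [0, 0, 1, 7],
     [0, 0, 0, 1]]         # translation by (5, 6, 7)
moved = mat_vec(T, origin + [1])

print(fixed)  # [0, 0, 0]
print(moved)  # [5, 6, 7, 1]
```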
ADDED LATER (taken from comments)
In comments, the OP asked for a coordinate-free explanation. This is done by distinguishing "linear" vs "affine" or, more generally, "projective" transformations. Given a vector space $V$, one can build its projective completion $\mathbb P(V\oplus \mathbb K)=\left[V\oplus \mathbb K \setminus\{(0,0)\}\right]/ \sim$, where $(v, \lambda)\sim(w, \mu)$ if and only if $ A(v, \lambda)=B(w, \mu)$ for some nonzero scalars $A, B\in\mathbb K$. The space $V$ naturally embeds into its projective completion via the map $v\mapsto (v, 1)$. (This is the coordinate-free realization of the introduction of homogeneous coordinates on $\mathbb K^n$).
This completion allows one to algebraically represent more transformations of $V$ than just the linear ones. The keyword here is projective transformation. Translations are realized as projective transformations via the following trick: \begin{equation} \begin{array}{ccc} T_h(v)=v+h & \leftrightarrow&\begin{bmatrix} I_V & h \\ 0 & 1\end{bmatrix} \begin{bmatrix} v \\ 1\end{bmatrix} = \begin{bmatrix} v+h \\ 1\end{bmatrix} \end{array} \end{equation} The relevance of this to the question is that the translation is realized as a transformation of the projective completion of $V$. Describing it as a matrix needs one more coordinate, hence (if $\dim V=3$) necessarily a $4\times 4$ matrix.
Best Answer
The main reason to extend $\mathbb R^3$ to $\mathbb {RP}^3$ by identifying $(x,y,z)$ with $(x:y:z:1)$ is to allow affine transformations to be described by matrix multiplication, just as linear transformations are. An affine transformation $\mathbb R^3\to\mathbb R^3$ is usually given by $x\mapsto Ax+b$ where $A$ is a $3\times 3$ matrix and $b\in\mathbb R^3$ a column vector. To be precise, $$ \pmatrix{x_1\\x_2\\x_3} \longmapsto \pmatrix{ a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33}} \pmatrix{x_1\\x_2\\x_3}+ \pmatrix{b_1\\b_2\\b_3}. $$ Using the extension to $\mathbb{RP}^3$ we may write this as $$ \left[\pmatrix{x_1\\x_2\\x_3\\x_4}\right] \longmapsto \left[\pmatrix{ a_{11} & a_{12} & a_{13} & b_1\\ a_{21} & a_{22} & a_{23} & b_2\\ a_{31} & a_{32} & a_{33} & b_3\\ 0&0&0&1} \pmatrix{x_1\\x_2\\x_3\\x_4}\right], $$ where $[\cdot]$ denotes the equivalence class with respect to $x\sim\lambda x$ for any $\lambda\neq 0$. Notice that plugging $(x:y:z:1) = [(x,y,z,1)]$ into this gives you exactly the same result as the original affine transformation.
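The following sketch checks this numerically and also touches on the question of whether the last coordinate is ever something other than 1. The matrix $A$, the vector $b$ and the helper `dehomogenize` are example choices made here, not taken from the answer.

```python
from fractions import Fraction

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dehomogenize(p):
    """Map (x1:x2:x3:x4) with x4 != 0 to the point (x1/x4, x2/x4, x3/x4)."""
    return [Fraction(c, p[3]) for c in p[:3]]

A = [[1, 2, 0],
     [0, 1, 0],
     [3, 0, 1]]
b = [4, 5, 6]
M = [A[i] + [b[i]] for i in range(3)] + [[0, 0, 0, 1]]  # the 4x4 matrix [[A, b], [0, 1]]

x = [1, 2, 3]
affine = [ax + bi for ax, bi in zip(mat_vec(A, x), b)]  # Ax + b
homog = mat_vec(M, x + [1])                             # representative (x, 1)
scaled = mat_vec(M, [3 * c for c in x + [1]])           # representative 3 * (x, 1)

print(affine)                # [9, 7, 12]
print(homog)                 # [9, 7, 12, 1]
print(scaled)                # [27, 21, 36, 3]
print(dehomogenize(scaled) == affine)  # True
```

The representative `(3, 6, 9, 3)` has last coordinate 3, yet after dividing by that coordinate it yields the same point, which is what the equivalence $x\sim\lambda x$ expresses.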
Now why would we like affine transformations to be expressible this way? Composing linear maps is easy: Multiply the representing matrices. So by going from $\mathbb R^3$ to $\mathbb{RP}^3$ we can compose even affine maps by just multiplying the representing matrices.
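To spell this out for two affine maps $x\mapsto A_1x+b_1$ and $x\mapsto A_2x+b_2$ (symbols introduced here for illustration), the block product of their representing matrices is
$$ \pmatrix{A_2 & b_2\\ 0 & 1}\pmatrix{A_1 & b_1\\ 0 & 1} = \pmatrix{A_2A_1 & A_2b_1+b_2\\ 0 & 1}, $$
which is exactly the matrix representing the composition $x\mapsto A_2(A_1x+b_1)+b_2 = A_2A_1x + (A_2b_1+b_2)$.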
Another benefit, as mentioned by Gregory Grant, is that we now can do computations with points at infinity. For example putting a light source at $(x:y:z:0)$ for $(x,y,z)\neq 0$, allows us to describe parallel light in the direction $(x,y,z)$ in the same way as we describe radial light from some source located at a point in $\mathbb R^3$.
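One common way to exploit this in lighting code is sketched below. The formula $l - w\,p$ for the vector from a surface point $p$ toward a light $(l : w)$ is a widespread graphics convention assumed here, not something stated in the answer, and the function name `light_vector` is hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def light_vector(light, p):
    """Unnormalized vector from surface point p toward a homogeneous light (lx, ly, lz, w)."""
    lx, ly, lz, w = light
    return [lx - w * p[0], ly - w * p[1], lz - w * p[2]]

point_light = [0, 10, 0, 1]   # ordinary point (0, 10, 0) in R^3
parallel_light = [0, 1, 0, 0]  # point at infinity in direction (0, 1, 0)

near = [1, 0, 0]
far = [5, 0, 0]

print(light_vector(point_light, near))     # [-1, 10, 0]
print(light_vector(point_light, far))      # [-5, 10, 0]
print(light_vector(parallel_light, near))  # [0, 1, 0]
print(light_vector(parallel_light, far))   # [0, 1, 0]

# A translation moves ordinary points but leaves points at infinity unchanged.
T = [[1, 0, 0, 5], [0, 1, 0, 6], [0, 0, 1, 7], [0, 0, 0, 1]]
print(mat_vec(T, point_light))     # [5, 16, 7, 1]
print(mat_vec(T, parallel_light))  # [0, 1, 0, 0]
```

The same function handles both kinds of light: with $w=1$ the direction depends on the surface point, with $w=0$ it is the same everywhere, i.e. parallel light.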