Why do we define an orthogonal matrix $A$ to be one such that $A^TA=I$?

linear algebra

Why do we define an orthogonal matrix $A$ to be one satisfying $A^TA=I$, rather than simply requiring its rows and columns to be orthonormal?

Note: I ask because it seems more natural to first define $A$ as a matrix with orthonormal columns and rows, and then prove $A^TA=I$.

Best Answer

The definitions you mention are actually equivalent and it's quite easy to see why. Let $A = [a_1 \, a_2 \, \cdots \, a_n]$. Observe that the columns of $A$ being orthonormal is equivalent to

$$a_i \cdot a_j = \delta_{ij},$$

where $\delta_{ij}$ is the Kronecker symbol. Now consider the matrix product

$$A^TA = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{bmatrix}[a_1 \, a_2 \, \cdots \, a_n],$$

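whose $(i,j)$-entry is exactly the scalar product $a_i \cdot a_j$. Hence $A^TA = I$ holds precisely when the columns of $A$ are orthonormal. Since $A$ is square, $A^TA = I$ also gives $AA^T = I$, and the same argument applied to $A^T$ shows that the rows of $A$ are orthonormal as well. So the two definitions are equivalent.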
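To see the equivalence numerically, here is a minimal sketch in plain Python. The helper names (`transpose`, `matmul`, `is_orthogonal`, `has_orthonormal_columns`), the tolerance `tol`, and the two test matrices are all my own choices for illustration, not part of any library.

```python
import math

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def is_orthogonal(a, tol=1e-12):
    """Compact definition: A^T A equals the identity (up to rounding)."""
    product = matmul(transpose(a), a)
    return all(
        abs(entry - (1.0 if i == j else 0.0)) <= tol
        for i, row in enumerate(product)
        for j, entry in enumerate(row)
    )

def has_orthonormal_columns(a, tol=1e-12):
    """Wordy definition: a_i . a_j equals the Kronecker delta for all i, j."""
    cols = transpose(a)
    return all(
        abs(sum(x * y for x, y in zip(ci, cj)) - (1.0 if i == j else 0.0)) <= tol
        for i, ci in enumerate(cols)
        for j, cj in enumerate(cols)
    )

# A rotation matrix (orthogonal) and a shear matrix (not orthogonal).
theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
shear = [[1.0, 1.0],
         [0.0, 1.0]]

for name, m in [("rotation", rotation), ("shear", shear)]:
    print(name, is_orthogonal(m), has_orthonormal_columns(m))
```

Both checks agree on each matrix, and `is_orthogonal(transpose(rotation))` also holds, which reflects the statement about the rows.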

Addendum/edit: This does not quite answer the question of why we often prefer one definition over the other. The answer is that the identity $A^TA=I$ is more compact and more useful in computations. Definitions of this kind, which could be stated (perhaps more intuitively) in words, are often given in a compact symbolic form to ease computation and shorten proofs. Here is another example: we can define a stochastic matrix as a matrix whose entries are non-negative and whose columns each sum to 1. However, this is wordy and seems cumbersome to check. We can equivalently define it as follows:

Let $S = [1 \, 1 \, \cdots \, 1]$ be the $1 \times n$ matrix filled with ones. An $n \times n$ matrix $A$ is stochastic if all its entries are non-negative and it satisfies

$$SA = S.$$

This is more compact and versatile than our first definition.
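As a rough illustration of how the compact form translates into a check, here is a small plain-Python sketch. The function name `is_column_stochastic`, the tolerance `tol`, and the sample matrices are hypothetical choices of mine.

```python
def is_column_stochastic(a, tol=1e-12):
    """Check that all entries are non-negative and that S A = S,
    where S is the 1 x n row vector of ones."""
    n = len(a)
    s = [1.0] * n
    # Entry j of the product S A is sum_i S[i] * A[i][j], i.e. the sum of column j.
    sa = [sum(s[i] * a[i][j] for i in range(n)) for j in range(n)]
    non_negative = all(entry >= 0 for row in a for entry in row)
    return non_negative and all(abs(x - y) <= tol for x, y in zip(sa, s))

# Columns sum to 1 and entries are non-negative: stochastic.
p = [[0.5, 0.25],
     [0.5, 0.75]]

# Rows sum to 1 but columns do not: fails S A = S.
q = [[0.5, 0.5],
     [0.25, 0.75]]

# Columns sum to 1 but one entry is negative: fails non-negativity.
r = [[1.5, 0.0],
     [-0.5, 1.0]]

print(is_column_stochastic(p), is_column_stochastic(q), is_column_stochastic(r))
```

The single product $SA$ handles every column sum at once, which is the sense in which the symbolic definition is easier to work with.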