Inner product and orthogonality in non-orthonormal basis

change-of-basis, inner-products, orthonormal, vectors

Suppose that $V$ is an inner product space over a field $\mathbb{K}$. The inner product is a map $\langle \cdot,\cdot \rangle : V \times V \to \mathbb{K}$. In the Euclidean space $\mathbb{R}^n$ the inner product is the dot product, defined as

$$\langle \mathbf{u},\mathbf{v} \rangle = \left\langle \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix},\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \right\rangle = u_1v_1 + u_2v_2 + \ldots + u_nv_n$$

and is equal to the geometric expression $\langle \mathbf{u},\mathbf{v} \rangle = \lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert \cos(\theta)$, where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$. The cosine of the angle can then be defined using the Cauchy-Schwarz inequality $\lvert \langle \mathbf{u},\mathbf{v} \rangle \rvert \leqslant \lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert$, since we have

$$\frac{\lvert \langle \mathbf{u},\mathbf{v} \rangle \rvert}{\lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert} \leqslant 1$$

so the quotient lies in $[-1,1]$ and the cosine of the angle is then defined (without the absolute value, so that obtuse angles are captured as well) as

$$\cos(\theta) = \frac{\langle \mathbf{u},\mathbf{v} \rangle}{\lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert}$$
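This angle formula is easy to check numerically. Below is a minimal pure-Python sketch (the function names `dot`, `norm`, and `angle` are my own, not from any library) that computes $\cos(\theta)$ exactly as above:

```python
import math

def dot(u, v):
    """Standard inner (dot) product of two coordinate vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean norm induced by the inner product: sqrt(<u, u>)."""
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle theta between u and v, from cos(theta) = <u,v> / (|u||v|)."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))

# The angle between (1, 0) and (1, 1) is pi/4,
# and between (1, 0) and (0, 1) it is pi/2.
theta = angle([1.0, 0.0], [1.0, 1.0])
```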

Then $\mathbf{u},\mathbf{v} \in V$ are said to be orthogonal if and only if $\langle \mathbf{u},\mathbf{v} \rangle = 0$. We write $\mathbf{u} \perp \mathbf{v}$ to mean that $\mathbf{u},\mathbf{v}$ are orthogonal.

Now let two vectors $\mathbf{x},\mathbf{y}$ be represented in the standard basis $\{\hat{\mathbf{e}}_1,\hat{\mathbf{e}}_2,\ldots,\hat{\mathbf{e}}_n\}$ with coordinates $\mathbf{x} = \sum_{i=1}^n a_i\hat{\mathbf{e}}_i$ and $\mathbf{y} = \sum_{i=1}^n b_i\hat{\mathbf{e}}_i$. The inner product is then given by

$$\langle \mathbf{x},\mathbf{y} \rangle = \left\langle \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \right\rangle = \sum_{i = 1}^n a_ib_i$$

The inner product itself is independent of the choice of basis within $\mathbb{R}^n$, but the coordinate formula above is not: in a non-orthonormal basis you could have two vectors that appear pairwise perpendicular yet whose coordinates with respect to that basis give a sum $\sum a_ib_i$ different from zero.

However, if the inner product between two vectors is zero, then whatever the basis is, the two vectors appear pairwise perpendicular with respect to any orthonormal basis of $\mathbb{R}^n$, and they are called orthogonal precisely because their inner product is zero.

Two vectors $\mathbf{x},\mathbf{y}$ are orthogonal if and only if they appear pairwise perpendicular with respect to an orthonormal basis. The immediate consequence is that their inner product is zero, and we write $\langle \mathbf{x},\mathbf{y} \rangle = 0$.

When defining orthogonality, the definition looks backward: we define orthogonality as a consequence of the inner product being zero, while ignoring the geometric picture of pairwise perpendicularity that is fundamentally what we all imagine orthogonality to be. What motivates this abstract definition?

Best Answer

I think you may be putting too much constraint on what the inner product looks like. While the Euclidean dot product can be computed as $\langle x,y\rangle = x^TAy$ with $A = I$, inner products in general look like $\langle x,y\rangle = x^TAy$ where $A$ is symmetric positive-definite. If you fix the Euclidean dot product in one basis and then change the basis, two vectors that were orthogonal remain orthogonal with respect to that inner product, but the matrix $A$ representing it changes. In other words, the dot product is the special case $A = I$, and changing the basis changes this matrix.
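A minimal sketch of this in pure Python (the helper names `gram` and `inner` are my own): if $B$ is the matrix whose columns are the new basis vectors, then the Gram matrix $A = B^TB$ is exactly the matrix that makes $x^TAy$ on the new coordinates agree with the old dot product. Using the same non-orthonormal basis $\mathbf{b}_1 = (1,0)$, $\mathbf{b}_2 = (1,1)$ as above:

```python
# Matrix B whose columns are the non-orthonormal basis vectors
# b1 = (1, 0) and b2 = (1, 1), stored row-major:
B = [[1.0, 1.0],
     [0.0, 1.0]]

def gram(B):
    """Gram matrix A = B^T B, so that A[i][j] = <b_i, b_j>."""
    n = len(B[0])
    return [[sum(B[k][i] * B[k][j] for k in range(len(B)))
             for j in range(n)] for i in range(n)]

def inner(x, A, y):
    """Inner product x^T A y of coordinate vectors x and y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

A = gram(B)  # [[1, 1], [1, 2]]: symmetric positive-definite, not I

# Coordinates of the perpendicular vectors (1, 0) and (0, 1) with
# respect to {b1, b2} are (1, 0) and (-1, 1); using x^T A y instead
# of the naive dot product recovers their orthogonality:
value = inner([1.0, 0.0], A, [-1.0, 1.0])  # 0.0
```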

I’d say that the motivation for defining orthogonality in terms of the inner product is to uncover the algebraic/computational ramifications of the geometric picture. As far as computability, it’s very difficult to verify if two vectors are orthogonal via a picture (i.e. how do you know the picture is exact?) whereas the artifact of a zero inner product is quite easy to verify. It’s often a general goal in all areas of mathematics to abstract the computational device that gives rise to a qualitative feature.

The inner product definition of orthogonality also generalizes to other areas of mathematics. For instance in analysis a common inner product that’s taken between two functions defined on the interval $[0,1]$ is

$$ \langle f,g\rangle = \int_0^1 f(x)g(x)dx. $$

This gives us a way to adapt linear algebraic concepts to functions in a way that the geometric picture does not. Most of the time functions that are orthogonal with respect to this inner product don’t “look” perpendicular in the same way that vectors do.
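To make this concrete, here is a small pure-Python sketch (the midpoint-rule approximation and the choice of example functions are my own) checking that the shifted Legendre polynomials $p_0(x) = 1$ and $p_1(x) = 2x - 1$ are orthogonal under this integral inner product:

```python
def inner(f, g, n=100000):
    """Approximate <f, g> = integral of f(x)*g(x) over [0, 1]
    using a midpoint Riemann sum with n subintervals."""
    h = 1.0 / n
    return sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(n)) * h

# Shifted Legendre polynomials, orthogonal on [0, 1]:
p0 = lambda x: 1.0
p1 = lambda x: 2.0 * x - 1.0

# <p0, p1> = integral of (2x - 1) over [0, 1] = 0, even though the
# graphs of p0 and p1 don't "look" perpendicular in any visual sense.
val = inner(p0, p1)
```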