[Math] Why do we want to have orthogonal bases in decompositions

big-picture, fa.functional-analysis, linear algebra, na.numerical-analysis

In the decompositions I have encountered so far, we always had an orthogonal set of basis vectors. For example, in the Singular Value Decomposition we had orthogonal left and right singular vectors, and in the [discrete] cosine transform (or [discrete] Fourier transform) we again had orthogonal bases.

To describe any vector $x \in \mathbb{C}^N$, we need $N$ linearly independent basis vectors, but independent doesn't necessarily mean orthogonal. My reasons for selecting orthogonal vectors are as follows:

  • The solution for $x$ is not unique if the basis vectors are not orthogonal.
  • It is easy to find the solution numerically by projecting $x$ onto each basis vector, and this solution doesn't depend on the order of the basis vectors. Otherwise, it depends on the order.
  • If we are talking about some set of vectors, they might be correlated in the original space but uncorrelated in the transformed space, which might be important when analyzing the data, in dimensionality reduction, or in compression.

I'm trying to understand the big picture. Do you think that I am right about these? Do you have any suggestions? What is the main reason for selecting orthogonal bases?

Best Answer

Your first point, non-uniqueness, is definitely false. One of the basic facts in linear algebra is precisely that, for any fixed set of basis vectors (we don't even have to work in a vector space endowed with an inner product, so orthogonality doesn't come in at all), a given vector has a unique decomposition.

For if that were not true, let the basis vectors be $e_1, \ldots, e_n$; then you would have two sets of numbers $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ such that

$$ a_1 e_1 + \cdots + a_n e_n = x = b_1 e_1 + \cdots + b_n e_n \implies (a_1 - b_1) e_1 + \cdots + (a_n - b_n) e_n = 0$$

If the sets $a_*$ and $b_*$ are not identical, this implies that $e_1, \ldots, e_n$ are linearly dependent, contradicting the assumption that they form a basis.

The second point, however, is a biggie. Without an inner product you cannot define an orthogonal projection. Now, generally, this is not too much of a problem. Given the basis vectors $e_1,\ldots, e_n$, finding the coordinates $a_1,\ldots, a_n$ of a given vector $x$ in this basis is just solving a linear system of equations, which is actually not too hard numerically in finite-dimensional systems. In infinite-dimensional systems, this business of inverting the transformation matrix gets slightly tricky.
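
To make the computational difference concrete, here is a minimal NumPy sketch (my illustration, not part of the original answer): with a general basis you solve a linear system for the coordinates, while with an orthonormal basis each coordinate is a single inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)

# General (non-orthogonal) basis: the columns of E are the basis vectors.
# Extracting coordinates means solving the full linear system E @ a = x.
E = rng.standard_normal((n, n))          # almost surely invertible
a = np.linalg.solve(E, x)
print(np.allclose(E @ a, x))             # True: the decomposition exists and is unique

# Orthonormal basis: the columns of Q. Each coordinate is just the inner
# product <q_i, x>; there is no system to solve, and no coordinate depends
# on the other basis vectors or on their order.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
b = Q.T @ x
print(np.allclose(Q @ b, x))             # True
```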

The key is to note that without using the orthogonal projection, you cannot answer the question "what is the length of $x$ in the direction of $e_1$?" without knowing the entire set of basis vectors. (Remember that without the orthogonal projection, you need to solve a linear system of equations to extract the coordinates; if you only specify one of the basis vectors, you do not have a complete system of equations, and the solution is underdetermined. I suspect this is what you had in the back of your mind for the first point.) This is actually a very fundamental fact in geometry, regarding coordinate systems. (I once heard this described as the "second fundamental mistake in multivariable calculus" made by many students.)

Using the orthogonal projection/the inner product, you can answer the question, as long as you allow only orthogonal completions of the basis starting from the known vector. This fact is immensely useful when dealing with infinite-dimensional systems.
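
A small NumPy sketch of this point (again my illustration, assuming $e_1$ is a unit vector): the coordinate of $x$ along $e_1$ changes when the remaining, non-orthogonal basis vectors change, but any orthogonal completion yields the same number, namely the inner product $\langle e_1, x \rangle$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x = rng.standard_normal(n)
e1 = np.array([1.0, 0.0, 0.0])

def coord_along_e1(rest):
    """Coordinate of x along e1 in the basis whose columns are e1 and `rest`."""
    E = np.column_stack([e1, rest])
    return np.linalg.solve(E, x)[0]

# Two different non-orthogonal completions of {e1}: the e1-coordinate changes.
print(coord_along_e1(rng.standard_normal((n, n - 1))))
print(coord_along_e1(rng.standard_normal((n, n - 1))))

# Any orthogonal completion gives the same answer: the inner product <e1, x>.
Q, _ = np.linalg.qr(np.column_stack([e1, rng.standard_normal((n, n - 1))]))
print(coord_along_e1(Q[:, 1:]), e1 @ x)  # these two numbers agree
```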

I also don't quite like your formulation of the third point. A vector is a vector is a vector. It is independent of the coordinate representation. So I'd expect that if two vectors are correlated (assuming correlation is an intrinsic property), they had better stay correlated regardless of the choice of basis. What is more reasonable to say is that two vectors may be uncorrelated in reality, but not obviously so when presented in one particular coordinate system, whereas the lack of correlation is immediately obvious when the vectors are written in another basis. But this observation has rather little to do with orthogonality. It only has some relation to orthogonality if you define "correlation" by some inner product (say, in some presentations of Quantum Mechanics). But then you are just saying that the orthogonality of two vectors is not necessarily obvious, except when it is.


My personal philosophy is more one of practicality: the various properties of orthogonal bases won't make solving the problem harder. So unless you are in a situation where those properties don't make solving the problem easier, and some other basis does (like what Ryan Budney described), there's no harm in preferring an orthogonal basis. Furthermore, as Dick Palais observed above, one case where an orthogonal basis really falls out naturally is the spectral theorem. The spectral theorem is, in some sense, the correct version of your point 3: in certain situations, there is a set of basis vectors that are mathematically special, and this set happens to always be orthogonal.
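
In the finite-dimensional case you can see the spectral theorem directly in code; this is a minimal NumPy sketch of my own, not something from the original discussion: a symmetric (self-adjoint) matrix has an orthonormal basis of eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
S = (A + A.T) / 2                              # a symmetric (self-adjoint) matrix

# The spectral theorem: a self-adjoint matrix has an orthonormal eigenbasis.
w, V = np.linalg.eigh(S)
print(np.allclose(V.T @ V, np.eye(5)))         # the eigenvectors are orthonormal
print(np.allclose(V @ np.diag(w) @ V.T, S))    # S = V diag(w) V^T
```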


Edit: A little more about correlation. This is what I like to tell students studying linear algebra. A vector is a vector. It is an object, not a bunch of numbers. When I hand you a rectangular box and ask you how tall the box is, the answer depends on which side is "up". This is similar to how you should think of the coordinate values of a vector in a basis: they are obtained by a bunch of measurements. (In a non-orthogonal system, however, picking which side is "up" and measuring the height in that direction requires knowing all the basis vectors. See my earlier point.)

The point is that to study science quantitatively, and to perform numerical analysis, you can only work with numbers, not physical objects. So you have to work with measurements. And in your case, the correlation you are speaking of is correlation between the measurements of (I suppose) different "properties" of some object. Since what and how you measure depends on which basis you choose, the correlation between the data will also depend on which basis you choose. If you pick properties of an object that are correlated, then the data obtained from the measurements will also be correlated.

The PCA you speak of is a way to disentangle that. It may be difficult to determine whether two properties of an object are correlated; maybe the presence of a correlation is what you want to detect. PCA tells you that if there were in fact two independent properties of an object, but you chose a bad set of measurements, so that the properties you measure do not "line up" with the independent properties (each measured property contains a little bit of each), you can figure that out with a suitable transformation of the data at the end of the day. So you don't need to worry about choosing the best set of properties to measure.
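
Here is a hedged NumPy sketch of that last point (my illustration, with a made-up mixing matrix): measurements that mix two independent properties come out correlated, and PCA, an orthogonal change of basis given by the eigenvectors of the covariance matrix, recovers uncorrelated coordinates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two independent "true" properties, recorded through measurements that mix
# them, so the measured columns are correlated. (The mixing matrix is made up
# purely for illustration.)
latent = rng.standard_normal((1000, 2)) * np.array([3.0, 0.5])
mixing = np.array([[1.0, 0.8],
                   [0.2, 1.0]])
data = latent @ mixing.T

# PCA: diagonalize the covariance of the centered data. The eigenvectors form
# an orthonormal basis in which the transformed coordinates are uncorrelated.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(centered) - 1)
w, V = np.linalg.eigh(cov)
scores = centered @ V

print(np.corrcoef(data, rowvar=False)[0, 1])    # noticeably nonzero
print(np.corrcoef(scores, rowvar=False)[0, 1])  # essentially zero
```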
