[Math] Relation between condition number and linear dependence of column vectors

linear-algebra, matrices, vector-spaces

There are several interpretations of the condition number of a matrix: the ratio of the largest to the smallest singular value, the amount of error amplification, etc.

In my opinion, another interpretation should be how far the column vectors (i.e., the image/range space) are from being perfectly orthogonal. Perfectly orthogonal column vectors would mean a condition number of 1, and linearly dependent ones would result in very high condition numbers.

Is this interpretation correct?

Is there a similar interpretation for the row vectors?

In any case, I tried to verify this idea with simple 2×2 matrices, and in most cases it does hold (the closer the normalized dot product between the column vectors is to 0, i.e. the closer the angle is to 90 degrees, the lower the condition number). But not always.

For example:

$$
A_1 = \begin{pmatrix}
0 & 4 \\ 15 & -2
\end{pmatrix} ,\quad
A_2 = \begin{pmatrix}
15 & -2 \\ -8 & 11
\end{pmatrix}
$$

$\mathrm{cond}(A_1) = 3.82$; the normalized dot product of its column vectors is $-0.44$ and of its row vectors $-0.13$. But $\mathrm{cond}(A_2) = 2.35$, with a column dot product of $-0.62$ and a row dot product of $-0.69$.
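These numbers can be reproduced with a few lines of NumPy (the names `A1` and `A2` refer to the matrices defined above):

```python
import numpy as np

A1 = np.array([[0.0, 4.0], [15.0, -2.0]])
A2 = np.array([[15.0, -2.0], [-8.0, 11.0]])

def cosine(u, v):
    # normalized dot product (cosine of the angle) between two vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

for name, A in [("A1", A1), ("A2", A2)]:
    print(name,
          "cond:", np.linalg.cond(A, 2),
          "columns:", cosine(A[:, 0], A[:, 1]),
          "rows:", cosine(A[0, :], A[1, :]))
# Output matches the numbers quoted above (up to rounding).
```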

So $A_2$ has the lower condition number but the vectors seem to be less "orthogonal" than $A_1$.

Why is this the case?

Best Answer

It depends on how you measure orthogonality. Normally, orthogonality is measured in terms of some "loss of orthogonality", that is, some quantity penalizing the fact that $A$ is not orthogonal (i.e., does not have orthonormal columns).

One such measure could be to take a norm of $Q-A$, where $Q$ is the orthogonal matrix closest to $A$ (in some norm) such that the column spaces of $Q$ and $A$ are identical. Assume that $A=USV^T$ is the SVD of $A$. For the 2-norm, it can be shown that such a matrix is given by $Q=UV^T$, which is actually the orthogonal factor of the polar decomposition of $A$. To make a condition number appear, assume that $\|A\|_2=1$ (if we want to measure the orthogonality of $A$, we can at least scale it to unit norm), so that the largest singular value of $A$ is 1. Then, using the fact that the 2-norm is orthogonally invariant and that the 2-norm of a diagonal matrix equals its maximal diagonal entry in absolute value, we have
$$
\frac{\|Q-A\|_2}{\|A\|_2}=\frac{\|UV^T(I-VSV^T)\|_2}{\|S\|_2}=\frac{\|I-S\|_2}{\|S\|_2}=1-\sigma_{\min}=1-\frac{1}{\mathrm{cond}(A)}.
$$
If $A$ were square, we would get the same quantity for the rows of $A$, although if $A$ were not of full rank, the matrix $Q$ (with the properties mentioned above) would generally be different for rows and columns (it is the same matrix in the full-rank case).
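A minimal numerical sketch of this identity, assuming NumPy: $Q$ is formed from the SVD as $UV^T$, and $A$ is scaled to unit 2-norm first.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A /= np.linalg.norm(A, 2)        # scale so that ||A||_2 = 1

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                       # orthogonal polar factor of A

lhs = np.linalg.norm(Q - A, 2) / np.linalg.norm(A, 2)
rhs = 1.0 - 1.0 / np.linalg.cond(A, 2)
print(lhs, rhs)                  # the two values agree
```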

Another useful interpretation is that the reciprocal of $\mathrm{cond}(A)$ is the relative distance to the nearest matrix of lower rank, that is,
$$
\frac{1}{\mathrm{cond}(A)}=\min_B\left\{\frac{\|B-A\|_2}{\|A\|_2}:\;\mathrm{rank}(B)<\mathrm{rank}(A)\right\}.
$$
This is closely related to low-rank approximation.
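As a sketch (again assuming NumPy): by the Eckart–Young theorem, the nearest matrix of lower rank is obtained by zeroing the smallest singular value, so the minimum above evaluates to $\sigma_{\min}/\sigma_{\max}=1/\mathrm{cond}(A)$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
s_low = s.copy()
s_low[-1] = 0.0                          # drop the smallest singular value
B = U @ np.diag(s_low) @ Vt              # nearest matrix of lower rank

rel_dist = np.linalg.norm(B - A, 2) / np.linalg.norm(A, 2)
print(rel_dist, 1.0 / np.linalg.cond(A, 2))   # both equal sigma_min / sigma_max
```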
