Explicit calculation of condition number of a square matrix

condition-number, linear-algebra, matrices, normed-spaces, robotics

According to Wikipedia, the condition number is defined as follows:

Assume the linear system of equation $$Ax = b.$$
Let $e$ be the error in $b$; then the error in the solution $A^{-1}b$ is $A^{-1}e$. The ratio between the relative error of the solution and the relative error of the right-hand side is $$\frac{\frac{\left|\left|A^{-1}e\right|\right|}{\left|\left|A^{-1}b\right|\right|}}{\frac{\left|\left|e\right|\right|}{\left|\left|b\right|\right|}} = \frac{\left|\left|A^{-1}e\right|\right|}{\left|\left|e\right|\right|} \frac{\left|\left|b\right|\right|}{\left|\left|A^{-1}b\right|\right|}\text{ , where}$$

$\left|\left|\cdot \right|\right|$ presumably denotes an arbitrary norm.

We now want to know the maximum value this error amplification can attain, and define this as the condition number $\kappa(A)$:

\begin{align}\kappa(A) &= \max_{e,b\ne 0}\left\{\frac{\left|\left|A^{-1}e\right|\right|}{\left|\left|e\right|\right|} \frac{\left|\left|b\right|\right|}{\left|\left|A^{-1}b\right|\right|} \right\} \\ &= \max_{e\ne 0}\left\{\frac{\left|\left|A^{-1}e\right|\right|}{\left|\left|e\right|\right|}\right\}\max_{b\ne 0}\left\{ \frac{\left|\left|b\right|\right|}{\left|\left|A^{-1}b\right|\right|} \right\} \\
&= \max_{e\ne 0}\left\{\frac{\left|\left|A^{-1}e\right|\right|}{\left|\left|e\right|\right|}\right\}\max_{x\ne 0}\left\{ \frac{\left|\left|Ax\right|\right|}{\left|\left|x\right|\right|} \right\} \\
&= \left|\left| A^{-1}\right|\right|\left|\left|A^{\vphantom{-1}} \right|\right|.\end{align}

This last step follows directly from the definition of the operator norm
$$\left|\left|A^{\vphantom{-1}} \right|\right| = \max_{x\ne 0}\left\{\frac{\left|\left|Ax\right|\right|}{\left|\left|x\right|\right|}\right\}.$$

So far so good; every step of this derivation makes logical sense. However, how does one go about actually calculating the operator norm $\left|\left|A\right|\right|$?

The Frobenius norm
$$\left|\left| A \right|\right|_F =\sqrt{\sum_{i=1}^m\sum_{j=1}^n\left|a_{ij}\right|^2} = \sqrt{\text{tr}\left(A^TA\right)}$$
is not an operator norm, and using it yields an upper bound for $\kappa(A)$ rather than the value itself:
$$\kappa(A)\le \left|\left| A^{-1}\right|\right|_F\left|\left|A^{\vphantom{-1}} \right|\right|_F.$$
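As a quick numerical sanity check of this bound (a sketch using NumPy; the matrix here is an arbitrary example, not from any particular source):

```python
import numpy as np

# Arbitrary invertible example matrix.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
A_inv = np.linalg.inv(A)

# Exact condition number in the spectral norm
# (ratio of largest to smallest singular value).
kappa_2 = np.linalg.cond(A, 2)

# Product of Frobenius norms: an upper bound on kappa_2, not its value.
kappa_F = np.linalg.norm(A, 'fro') * np.linalg.norm(A_inv, 'fro')

print(kappa_2, kappa_F)
assert kappa_2 <= kappa_F + 1e-12
```

For this matrix the spectral condition number is about $2.62$, while the Frobenius product is $3$, illustrating that the two generally differ.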
Nonetheless, this definition is often used in robotics to calculate $\kappa(A)$. See, for example, this excerpt from C. Gosselin and J. Angeles:

*(excerpt image not reproduced)*

It is even claimed there that the (normalized) Frobenius norm is equivalent to the operator norm!

What am I missing here? Is this some sort of special case? Any help would be greatly appreciated, thanks in advance!

Best Answer

Not really an answer, but too long for a comment. Really a collection of comments...

First, there are many ways of defining condition numbers. All boil down to computing or estimating the worst-case sensitivity of a solution to a perturbation in the data. A good reference is "Matrix Computations" by Golub & Van Loan, Section 2.5, "The Sensitivity of Square Linear Systems".

As you noted above, if $Ax=b$ and $Ax'=b+h$, then ${\|x'-x\| \over \|x\|} \le \kappa(A) {\| h\| \over \|b\| }$, and this relationship holds for a norm and its corresponding operator norm.
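This perturbation bound can be verified numerically (a sketch; the matrix, right-hand side, and perturbation are random examples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)
h = 1e-6 * rng.standard_normal(4)   # small perturbation of the right-hand side

x  = np.linalg.solve(A, b)
xp = np.linalg.solve(A, b + h)

# Relative error of the solution vs. kappa(A) times the relative
# perturbation of the data (spectral norm throughout).
lhs = np.linalg.norm(xp - x) / np.linalg.norm(x)
rhs = np.linalg.cond(A, 2) * np.linalg.norm(h) / np.linalg.norm(b)
assert lhs <= rhs * (1 + 1e-8)
```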

The point is to have some estimate of the sensitivity of relative errors.

For a given norm $\|\cdot\|_*$, a standard way of defining the $*$-condition number is $\kappa_*(A) = \|A\|_* \|A^{-1}\|_*$. This definition can be used with any norm, but the direct relevance to a particular problem (such as the one above) usually requires that the norms be consistent (that is, the matrix norm is the induced norm).

On a finite-dimensional space (matrices here) any two norms are equivalent; that is, for two norms $\|\cdot\|_a, \|\cdot \|_b$ there are constants $a,b>0$ such that $a \|A\|_a \le \|A\|_b \le b \|A\|_a$. Hence we have $a^2\kappa_a(A) \le \kappa_b(A) \le b^2 \kappa_a(A)$, so the condition numbers are also equivalent in this sense.

Just for the record, we have $\|A\|_2 \le \|A\|_F \le \sqrt{n}\|A\|_2$, where $\|\cdot\|_2$ is the spectral norm. Note that $\|A\|_2 = \sqrt{\lambda_\max(A^TA)}$.
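Both inequalities are easy to check numerically (a sketch on a random example matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))

spec = np.linalg.norm(A, 2)      # spectral norm = largest singular value
frob = np.linalg.norm(A, 'fro')  # Frobenius norm

# ||A||_2 <= ||A||_F <= sqrt(n) ||A||_2
assert spec <= frob <= np.sqrt(n) * spec

# ||A||_2 = sqrt(lambda_max(A^T A))
assert np.isclose(spec, np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))
```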

Note that the norms $\|A\|_s = \sqrt{\operatorname{tr}(A^TWA)}$ and $\|A\|_2 = \max_{\|x\|_2 \le 1} \|Ax\|_2 = \sqrt{\lambda_\max(A^TA)}$ are equivalent but they are not equal (I'm not sure what frame invariance means, I presume it means rotation invariant, but that is irrelevant here).

So, presumably you are wondering why not just deal with the spectral norm (norm induced by the Euclidean distance)?

The issue is that computing the condition number with the spectral norm is relatively expensive (in addition to computing the inverse), so a cheaper alternative is used when it is adequate. Generally it is not required to compute the condition number exactly, so a cheaper approximation suffices. (For example, I have seen code where the condition was estimated by computing $\max_k(\|Ax_k\|)$ for a small number of vectors.)
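A minimal sketch of that kind of cheap probing estimate: each random unit vector gives a lower bound on the spectral norm, and the maximum over a few probes is often a usable estimate (the matrix and probe count here are arbitrary choices, not from any specific code).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n))

# Probe with a few random unit vectors: max ||A x_k|| is a cheap
# lower bound on the spectral norm ||A||_2.
est = 0.0
for _ in range(10):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    est = max(est, np.linalg.norm(A @ x))

exact = np.linalg.norm(A, 2)
assert est <= exact + 1e-12   # every probe underestimates ||A||_2
```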

If one is performing computations and the luxury of a condition estimate is available then it gives some 'quantitative' measure of how much relative errors in the input data are amplified and hence a degree of confidence in the results.

And as a complete aside, this illustrates why operations with orthogonal matrices are stable numerically, as we have $\kappa(Q) = 1$ in this case. (Again with the spectral norm.)
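A one-line check of this fact (sketch; the orthogonal matrix is generated via a QR factorization of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
# QR factorization of a random matrix yields an orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# All singular values of an orthogonal matrix equal 1, so kappa_2(Q) = 1.
assert np.isclose(np.linalg.cond(Q, 2), 1.0)
```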