Numerical Linear Algebra – Understanding the QR Eigenvalue Finding Algorithm

Tags: algorithms, eigenvalues-eigenvectors, matrices, numerical-linear-algebra

I'm trying to code up a matrix library (purely as a learning exercise). This question is about the math I'm trying to understand in order to implement it. I just want to make sure I have a firm grasp of the underlying algorithm.

For reference, I'm referring mainly to this book for the algorithms and mathematical understanding: http://people.inf.ethz.ch/arbenz/ewp/Lnotes/chapter3.pdf

First, the basic QR algorithm: given an $N \times N$ non-symmetric real matrix $A$, let $QR(A)$ be the QR decomposition of $A$ into an orthogonal matrix $Q$ and an upper triangular matrix $R$.

Then I iterate: factor $A_{k-1} = Q_k R_k$ and set $A_k = R_k Q_k$, repeating until the last element of the first subdiagonal is close (enough) to zero. I take the last element of the main diagonal as an eigenvalue, deflate the matrix by removing its last row and column, and begin the process again, until the entire matrix has been consumed.
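To make sure I'm describing the same thing I coded, here is a minimal, self-contained Java sketch of the unshifted iteration (classical Gram–Schmidt for the QR step; class and helper names are my own, deflation omitted for brevity):

```java
public class QrIterationSketch {
    // Classical Gram-Schmidt QR factorization: a = q * r (assumes full rank).
    static double[][][] qr(double[][] a) {
        int n = a.length;
        double[][] q = new double[n][n], r = new double[n][n];
        for (int j = 0; j < n; j++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = a[i][j];
            for (int k = 0; k < j; k++) {
                for (int i = 0; i < n; i++) r[k][j] += q[i][k] * a[i][j];
                for (int i = 0; i < n; i++) v[i] -= r[k][j] * q[i][k];
            }
            for (int i = 0; i < n; i++) r[j][j] += v[i] * v[i];
            r[j][j] = Math.sqrt(r[j][j]);
            for (int i = 0; i < n; i++) q[i][j] = v[i] / r[j][j];
        }
        return new double[][][] { q, r };
    }

    static double[][] mul(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Unshifted QR iteration: A_k = R_k * Q_k until the bottom
    // subdiagonal entry is close enough to zero.
    static double[][] iterate(double[][] a, double tol, int maxIter) {
        int n = a.length;
        for (int it = 0; it < maxIter && Math.abs(a[n - 1][n - 2]) > tol; it++) {
            double[][][] f = qr(a);
            a = mul(f[1], f[0]); // R * Q
        }
        return a;
    }

    public static void main(String[] args) {
        // Symmetric test matrix with known eigenvalues 3 and 1.
        double[][] a = { { 2, 1 }, { 1, 2 } };
        a = iterate(a, 1e-10, 500);
        System.out.println(a[0][0] + " " + a[1][1]); // ~3.0 and ~1.0
    }
}
```

On this symmetric example the subdiagonal entry shrinks by a factor of roughly $|\lambda_2/\lambda_1| = 1/3$ per step, which also illustrates why convergence is slow when eigenvalues are close in magnitude.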

To handle complex eigenvalues, I've assumed that if this element does not converge to zero within a specified iteration_max, then the trailing $2 \times 2$ submatrix contains two eigenvalues (which may still be real, I think, so I check for that). I apply the quadratic formula to that submatrix's characteristic polynomial to get its eigenvalues, deflate the $A$ matrix by two rows and columns, and iterate.
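The quadratic-formula step for a trailing $2 \times 2$ block can be sketched like this (my own names; it returns real and imaginary parts so the real-root check falls out of the sign of the discriminant):

```java
public class TwoByTwoEigen {
    // Eigenvalues of the 2x2 block {{a, b}, {c, d}} from its characteristic
    // polynomial lambda^2 - (a+d)*lambda + (a*d - b*c).
    // Returns {re1, im1, re2, im2}.
    static double[] eigen2x2(double a, double b, double c, double d) {
        double tr = a + d, det = a * d - b * c;
        double disc = tr * tr - 4 * det;
        if (disc >= 0) { // two real eigenvalues
            double s = Math.sqrt(disc);
            return new double[] { (tr + s) / 2, 0, (tr - s) / 2, 0 };
        }
        double im = Math.sqrt(-disc) / 2; // complex conjugate pair
        return new double[] { tr / 2, im, tr / 2, -im };
    }

    public static void main(String[] args) {
        // Rotation-like block: eigenvalues are +i and -i.
        double[] e = eigen2x2(0, -1, 1, 0);
        System.out.println(e[0] + "+" + e[1] + "i, " + e[2] + "+" + e[3] + "i");
    }
}
```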

Everything before this point seems to work when I implement it in code.

Now, the problem is that the convergence is much too slow. I have to set the maximum iterations severely high in order to handle some of my test cases. So I went and read up on advanced QR tricks.

One is to reduce the input matrix to upper Hessenberg form, which I believe I've done correctly via the transformation $H = \text{Hessenberg}(A)$. However, for one of my test cases this causes the required number of iterations to actually increase, which makes me scratch my head.
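For reference, the Hessenberg reduction I have in mind uses one Householder reflector per column, applied as an orthogonal similarity (this is my own sketch, not the code from the repository):

```java
public class HessenbergSketch {
    // Reduce A to upper Hessenberg form H = P^T A P via Householder
    // reflectors, so H has the same eigenvalues as A.
    static double[][] hessenberg(double[][] a) {
        int n = a.length;
        double[][] h = new double[n][n];
        for (int i = 0; i < n; i++) h[i] = a[i].clone();
        for (int k = 0; k < n - 2; k++) {
            // Householder vector v annihilating column k below the subdiagonal.
            double norm = 0;
            for (int i = k + 1; i < n; i++) norm += h[i][k] * h[i][k];
            norm = Math.sqrt(norm);
            if (norm == 0) continue;
            double alpha = h[k + 1][k] > 0 ? -norm : norm;
            double[] v = new double[n];
            for (int i = k + 1; i < n; i++) v[i] = h[i][k];
            v[k + 1] -= alpha;
            double vnorm = 0;
            for (int i = k + 1; i < n; i++) vnorm += v[i] * v[i];
            vnorm = Math.sqrt(vnorm);
            if (vnorm == 0) continue;
            for (int i = k + 1; i < n; i++) v[i] /= vnorm;
            // Apply P = I - 2*v*v^T from the left: h -= 2*v*(v^T h).
            for (int j = 0; j < n; j++) {
                double dot = 0;
                for (int i = k + 1; i < n; i++) dot += v[i] * h[i][j];
                for (int i = k + 1; i < n; i++) h[i][j] -= 2 * dot * v[i];
            }
            // ...and from the right: h -= 2*(h*v)*v^T, keeping similarity.
            for (int i = 0; i < n; i++) {
                double dot = 0;
                for (int j = k + 1; j < n; j++) dot += h[i][j] * v[j];
                for (int j = k + 1; j < n; j++) h[i][j] -= 2 * dot * v[j];
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[][] a = { { -2.476, -2.814, 4.29, -3.649 },
                         { 2.839, -2.859, 1.623, -2.926 },
                         { -0.392, -3.206, -0.401, -2.174 },
                         { 2.241, -4.435, -3.963, 4.102 } };
        double[][] h = hessenberg(a);
        // Entries below the first subdiagonal are (numerically) zero,
        // and the trace is preserved by the similarity transform.
        System.out.println(h[2][0] + " " + h[3][0] + " " + h[3][1]);
        System.out.println(h[0][0] + h[1][1] + h[2][2] + h[3][3]); // ~ -1.634
    }
}
```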

More importantly, I tried to implement shifting, and this broke my algorithm. So I need to make sure I understand that.

Let the shift $S_k$ be the last element on the main diagonal of $H_k$, and let $I$ be an appropriately sized identity matrix.

Then I compute $QR(H_k - S_k I)$, and set $H_{k+1} = R_k Q_k + S_k I$.

This causes my matrix to rapidly converge. Unfortunately, it also effectively zeroes the last element in the main diagonal of H in my test case, causing it to incorrectly report an eigenvalue near zero.
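On a matrix with real, well-separated eigenvalues the single-shift step does behave as advertised. A minimal Java sketch of one step and the surrounding loop (my own names, Gram–Schmidt QR; not the repository's code):

```java
public class ShiftedQrSketch {
    // Classical Gram-Schmidt QR factorization: a = q * r (assumes full rank).
    static double[][][] qr(double[][] a) {
        int n = a.length;
        double[][] q = new double[n][n], r = new double[n][n];
        for (int j = 0; j < n; j++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = a[i][j];
            for (int k = 0; k < j; k++) {
                for (int i = 0; i < n; i++) r[k][j] += q[i][k] * a[i][j];
                for (int i = 0; i < n; i++) v[i] -= r[k][j] * q[i][k];
            }
            for (int i = 0; i < n; i++) r[j][j] += v[i] * v[i];
            r[j][j] = Math.sqrt(r[j][j]);
            for (int i = 0; i < n; i++) q[i][j] = v[i] / r[j][j];
        }
        return new double[][][] { q, r };
    }

    static double[][] mul(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // One single-shift step with s = h[n-1][n-1]:
    // factor H - s*I = Q*R, return R*Q + s*I.
    static double[][] shiftedStep(double[][] h) {
        int n = h.length;
        double s = h[n - 1][n - 1];
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) m[i] = h[i].clone();
        for (int i = 0; i < n; i++) m[i][i] -= s;
        double[][][] f = qr(m);
        double[][] next = mul(f[1], f[0]);
        for (int i = 0; i < n; i++) next[i][i] += s;
        return next;
    }

    public static void main(String[] args) {
        // Symmetric 2x2 with eigenvalues (7 +/- sqrt(5)) / 2.
        double[][] h = { { 4, 1 }, { 1, 3 } };
        for (int it = 0; it < 50 && Math.abs(h[1][0]) > 1e-12; it++)
            h = shiftedStep(h);
        System.out.println(h[1][1]); // ~2.381966 = (7 - sqrt(5)) / 2
    }
}
```

Note that the shift is always real here; that turns out to be exactly the limitation the accepted answer points at.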

What am I doing wrong?

Edit: (please don't laugh at my little $4 \times 4$ matrix)
Test matrix:

double[][] Matrix = {
        {-2.476, -2.814, 4.29, -3.649},
        {2.839, -2.859, 1.623, -2.926},
        {-0.392, -3.206, -0.401, -2.174},
        {2.241, -4.435, -3.963, 4.102}};

Edit: link to a repository with the code. The relevant function is large, or I would embed it; it's located in SquareMatrix.eigenvalues().
https://github.com/rwthompsonii/matrix-java

Best Answer

The problem is that a real upper Hessenberg matrix can have complex eigenvalues, which your code seemingly allows for. However, if you always choose your shift to be the last diagonal element above the converged part, it will always be real, since the QR decomposition of a real shifted upper Hessenberg matrix is still real. A sequence of real single shifts therefore cannot drive convergence toward a complex conjugate pair. If you look at section 3.5 of the notes you linked:

> Another problem occurs if real Hessenberg matrices have complex eigenvalues. We know that for reasonable convergence rates the shifts must be complex.

Hence the need for the double shift algorithm.
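The double shift uses both eigenvalues of the trailing $2 \times 2$ block as shifts, but only through their sum $s$ and product $t$, which are real even when the pair is complex. A sketch of one explicit double-shift step (my own names; in practice the implicit, bulge-chasing variant is used instead, to avoid forming $H^2$ and the near-singular factorizations that come with it):

```java
public class DoubleShiftSketch {
    // Classical Gram-Schmidt QR factorization: a = q * r (assumes full rank).
    static double[][][] qr(double[][] a) {
        int n = a.length;
        double[][] q = new double[n][n], r = new double[n][n];
        for (int j = 0; j < n; j++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = a[i][j];
            for (int k = 0; k < j; k++) {
                for (int i = 0; i < n; i++) r[k][j] += q[i][k] * a[i][j];
                for (int i = 0; i < n; i++) v[i] -= r[k][j] * q[i][k];
            }
            for (int i = 0; i < n; i++) r[j][j] += v[i] * v[i];
            r[j][j] = Math.sqrt(r[j][j]);
            for (int i = 0; i < n; i++) q[i][j] = v[i] / r[j][j];
        }
        return new double[][][] { q, r };
    }

    static double[][] mul(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] transpose(double[][] a) {
        int n = a.length;
        double[][] t = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) t[j][i] = a[i][j];
        return t;
    }

    // One explicit double-shift step: the two shifts are the eigenvalues of
    // the trailing 2x2 block, entering only via their real sum s and product
    // t, so M = H^2 - s*H + t*I is real even for a complex pair. Factor
    // M = Q*R and form H' = Q^T * H * Q, which is again upper Hessenberg.
    static double[][] doubleShiftStep(double[][] h) {
        int n = h.length;
        double s = h[n - 2][n - 2] + h[n - 1][n - 1];        // trace of trailing 2x2
        double t = h[n - 2][n - 2] * h[n - 1][n - 1]
                 - h[n - 2][n - 1] * h[n - 1][n - 2];        // det of trailing 2x2
        double[][] m = mul(h, h);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) m[i][j] -= s * h[i][j];
            m[i][i] += t;
        }
        double[][][] f = qr(m);
        return mul(transpose(f[0]), mul(h, f[0]));
    }

    public static void main(String[] args) {
        // Companion matrix of (x^2 + 1)(x - 2): eigenvalues 2, i, -i.
        double[][] h = { { 2, -1, 2 }, { 1, 0, 0 }, { 0, 1, 0 } };
        double[][] h1 = doubleShiftStep(h);
        // The step is an orthogonal similarity: the trace is preserved and
        // the result stays upper Hessenberg.
        System.out.println(h1[0][0] + h1[1][1] + h1[2][2]); // ~2.0
        System.out.println(h1[2][0]);                       // ~0.0
    }
}
```

The key point for your code is that the whole step stays in real arithmetic, so the complex pair in the trailing block never has to be formed explicitly during the iteration.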