[Math] How do iterative methods applied to the companion matrix of a polynomial $p(\lambda)$ relate to $p$ itself

companion-matriceseigenvalues-eigenvectorsnumerical methodspolynomialsroots

A few days ago, I had a vague question in my mind about "matrix methods" for finding the roots of a polynomial. Now, I can ask at least a semi-precise question, thanks to the post
How to calculate complex roots of a polynomial. I do not have a reference for this, but the idea behind the stated method is as follows: First create the companion matrix $C$ of the given polynomial $p$, whose characteristic equation is $p$ itself. Now, the eigenvalues of $C$ will give the roots of $p$.

Take a favorite iterative procedure for computing eigenvalues; for concreteness, I'll restrict to the QR algorithm, or the power iteration for the dominant eigenvalue. Does applying this procedure to $C$ correspond to a natural root-finding algorithm for $p$?

I thought about this a bit, but do not have much of an idea. Today, I read Arturo Magidin's comment:

Surely such methods can be translated into methods to be applied directly to the polynomial when the matrix is just the companion matrix. Of course, they can be rather involved in description in that case.

I was quite surprised by this, since I was actually expecting a simple and intuitive translation. Is there one?

Best Answer

Take a favorite iterative procedure for computing eigenvalues; for concreteness, I'll restrict to ... the power iteration for the dominant eigenvalue. Does applying this procedure to $C$ correspond to a natural root-finding algorithm for $p$?

I'll restrict myself to power iteration proper as well, lest this answer get too long. Most of the variants of power iteration translate readily to a corresponding operation on polynomials, which I might discuss later in a separate answer if there's interest. (I've already said something on the QR algorithm in the comments.)

I'll consider the following form of the (upper Hessenberg) Frobenius companion matrix for $p(x)=c_n x^n +\cdots +c_1 x+c_0$:

$$\mathbf C=\begin{pmatrix}-\frac{c_{n-1}}{c_n}&\cdots&-\frac{c_1}{c_n}&-\frac{c_0}{c_n}\\1&&&\\&\ddots&&\\&&1&\end{pmatrix}$$

which is similar to another form you might be accustomed to:

$$\mathbf T\mathbf C\mathbf T^{-1}=\begin{pmatrix}&&&-\frac{c_0}{c_n}\\1&&&-\frac{c_1}{c_n}\\&\ddots&&\vdots\\&&1&-\frac{c_{n-1}}{c_n}\end{pmatrix}$$

with the similarity transformation $\mathbf T$ given by the unit upper triangular Toeplitz matrix

$$\mathbf T=\begin{pmatrix}1&\frac{c_{n-1}}{c_n}&\cdots&\frac{c_1}{c_n}\\&\ddots&\ddots&\vdots\\&&\ddots&\frac{c_{n-1}}{c_n}\\&&&1\end{pmatrix}$$

Now, the key is that there is an intimate relationship between polynomials, difference equations, and the Frobenius companion matrix; to wit, the difference equation

$$c_n u_{k+n}+\dots+c_1 u_{k+1}+c_0 u_k=0$$

has $p(x)$ as its characteristic polynomial, and its solutions $u_k$ take the form

$$u_k=w_1 \mu_1^k+\dots+w_n \mu_n^k$$

where the $\mu_j$ satisfy $p(\mu_j)=0$ with $|\mu_1|\geq \cdots \geq |\mu_n|$, and the $w_j$ are arbitrary constants. For the rest of this answer, I assume that there is only one dominant root $\mu_1$ (thus, $\mu_1$ is necessarily real and simple).

As $k\to \infty$, the ratio $\dfrac{u_{k+1}}{u_k}$ eventually tends to $\mu_1$, with the convergence rate depending on how far away the dominant root is from the other roots. Thus, one method to find the dominant root $\mu_1$ of a polynomial $p(x)$ would consist of iterating the recurrence relation for the $u_k$ (with starting values chosen such that $w_1$ in the expression for $u_k$ is nonzero) and forming successive ratios. This method is called the Bernoulli iteration.

The connection with companion matrices can be seen if we note that

$$\begin{pmatrix}u_{k+1}\\u_k\\\vdots\\u_{k-n+2}\end{pmatrix}=\begin{pmatrix}-\frac{c_{n-1}}{c_n}&\cdots&-\frac{c_1}{c_n}&-\frac{c_0}{c_n}\\1&&&\\&\ddots&&\\&&1&\end{pmatrix}\cdot\begin{pmatrix}u_k\\u_{k-1}\\\vdots\\u_{k-n+1}\end{pmatrix}$$

(see for instance this related question); we can thus consider Bernoulli's method as equivalent to repeatedly multiplying some arbitrary starting vector $\mathbf u_0$ with $\mathbf C$ ($\mathbf u_j=\mathbf C\mathbf u_{j-1}$), after which the dominant eigenvalue is estimated as the modified Rayleigh quotient

$$\frac{\mathbf e_1^\top\mathbf C\mathbf u_j}{\mathbf e_1^\top\mathbf u_j}$$

where $\mathbf e_1$ is the first column of the identity.

This leaves the problem of starting up the iteration. The customary way is to derive the starting values from the solutions of the equation

$$\mathbf T\mathbf u_0=\mathbf d$$

where $\mathbf d=(c_1\quad 2c_2\quad \cdots\quad n c_n)^\top$ represents the coefficients of $p^\prime(x)$.


As I mentioned earlier, most of the variants of power iteration, when specialized to the Frobenius companion matrix case, give rise to recursive methods for estimating the roots of a polynomial that directly act on the polynomial's coefficients. In particular, the Jenkins-Traub method, the modern descendant of the Bernoulli iteration, can be thought of either as a recursion on polynomials or on companion matrices. Some of the details are discussed in this paper.


Here's some sundry Mathematica code for playing around with Bernoulli's method:

n = 5; p = Expand[Times @@ (x - Range[n])];

cmat[poly_, x_] := Module[
  {cl = CoefficientList[poly, x], n = Exponent[poly, x]},
      SparseArray[{{1, i_} :> -cl[[n - i + 1]]/Last[cl],
      Band[{2, 1}] -> 1}, {n, n}]]

start = LinearSolve[ToeplitzMatrix[UnitVector[n, 1], 
  (Reverse[Rest[#]/Last[#]]) &[CoefficientList[p, x]]],
   CoefficientList[D[p, x], x]];

With[{prec = 20, its = 20},
  Ratios[Drop[LinearRecurrence[
      -(Reverse[Most[#]/Last[#]]) &[CoefficientList[p, x]], 
        Reverse[N[start, prec]], its + n + 1], n - 1]]]

With[{prec = 20, its = 20}, 
  Table[((UnitVector[n, 1].cmat[p, x].#)/(UnitVector[n, 1].#)) &[
     MatrixPower[cmat[p, x], k, N[start, prec]]], {k, 0, its}]]

You can change p here to any polynomial whose largest (in magnitude) root is real (of course, you must then set n to be the degree of p).

Related Question