[Math] Determining a recurrence relation

algorithms, co.combinatorics, linear-algebra, na.numerical-analysis

I would like to solve the general problem of determining a linear recurrence relation with fewer than $n/2-k$ coefficients (for some reasonable fixed $k$, perhaps 2, to ensure that I'm not overfitting) that fits a given integer sequence of length $n$, or stating that none exists. Actually, I'd like to find the smallest one.

Of course the problem looks easy: just search for a recurrence of order $1, 2, \dots, n/2-k$ until one is found. At each step all that is needed is to solve, for the candidate order $d$, the matrix equation

$$\begin{pmatrix} a_{d-1} & a_{d-2} & \cdots & a_0 \\ a_d & a_{d-1} & \cdots & a_1 \\ \vdots & \vdots & \ddots & \vdots \\ a_{2d-2} & a_{2d-3} & \cdots & a_{d-1} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_d \end{pmatrix} = \begin{pmatrix} a_d \\ a_{d+1} \\ \vdots \\ a_{2d-1} \end{pmatrix}$$

and check the result against the unused members of the sequence. But for reasonably large $n$ this is impractical: the linear algebra is too expensive.
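For concreteness, here is a minimal sketch of this naive search in Python, solving each square system exactly over $\mathbb{Q}$ with `fractions.Fraction`. The function names are mine; for simplicity, a singular system just skips that order, even though an underdetermined system could in principle still admit a recurrence.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Gauss-Jordan elimination over the rationals; returns None if singular."""
    d = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(d):
        pivot = next((r for r in range(col, d) if M[r][col] != 0), None)
        if pivot is None:
            return None  # singular; a fuller version would handle this case
        M[col], M[pivot] = M[pivot], M[col]
        inv = Fraction(1) / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(d):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][d] for i in range(d)]

def find_recurrence_naive(seq, k=2):
    """Search orders d = 1, 2, ...: fit the coefficients on the first 2d terms,
    then verify them against all remaining terms.  Returns (d, coeffs) with
    a_m = coeffs[0]*a_{m-1} + ... + coeffs[d-1]*a_{m-d}, or None."""
    n = len(seq)
    for d in range(1, n // 2 - k + 1):
        A = [[Fraction(seq[d + i - 1 - j]) for j in range(d)] for i in range(d)]
        b = [Fraction(seq[d + i]) for i in range(d)]
        coeffs = solve_exact(A, b)
        if coeffs is None:
            continue
        if all(seq[m] == sum(c * seq[m - 1 - j] for j, c in enumerate(coeffs))
               for m in range(2 * d, n)):
            return d, coeffs
    return None
```

On the Fibonacci numbers this returns order 2 with coefficients $(1, 1)$. The trouble is exactly the cost noted above: exact elimination over $\mathbb{Q}$ suffers coefficient blow-up, so each solve gets expensive as $d$ grows.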

Unfortunately, it's not rare for me to work with a sequence that clearly satisfies a recurrence relation, but one which appears to have large order, so I can't simply assume that failing to find a relation of order below 100 means that none exists. This raises two questions for me:

  1. Is there a fast way to calculate these, compared to the naive approach above?
  2. If one recurrence is known, can this be used to speed the search for recurrences of smaller order (or to prove that none exist)?

One unenlightened approach that comes to mind for #1 is to solve the matrix equation over the real numbers (using floating-point approximations) rather than exactly over $\mathbb{Q}$. This seems reasonable, but it's not obvious how much precision is needed, nor how far off the computed coefficients could be when a solution actually exists (in which case, presumably, the system should be re-solved exactly). Although solving the systems this way gives a serious speedup even at quadruple precision, without appropriate numerical analysis I don't think it's usable. Hopefully there is a better way.
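Purely as an illustration of that idea (not a recommendation), one step of the search might look like this with NumPy; the tolerance `tol` is an unanalyzed guess, which is precisely the gap described above, and a small residual should only promote $d$ to a candidate for exact re-solving.

```python
import numpy as np

def looks_like_order(seq, d, tol=1e-8):
    """Float sketch of one step of the search: least-squares fit of the
    order-d coefficients on the first 2d terms, then a residual check on
    the rest.  Returns the approximate coefficients, or None.
    Caution: float64 silently loses precision on large integer terms."""
    a = np.array(seq, dtype=float)
    A = np.array([[a[d + i - 1 - j] for j in range(d)] for i in range(d)])
    rhs = a[d:2 * d]
    coeffs, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    for m in range(2 * d, len(seq)):
        pred = coeffs @ a[m - 1:m - d - 1:-1]   # predicted a_m
        if abs(a[m] - pred) > tol * max(1.0, abs(a[m])):
            return None
    return coeffs  # candidate only: re-solve exactly over Q before trusting it
```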

On #2, consider a (periodic) sequence with recurrence relation $a_n=a_{n-6}$. Basic algebra suffices to show that any sequence satisfying $a_n=2a_{n-1}-2a_{n-2}+a_{n-3}$ also satisfies $a_n=a_{n-6}$, since the characteristic polynomial $x^3-2x^2+2x-1$ divides $x^6-1$. So some sequences of order 6 can be simplified (nontrivially, that is, not just by removing trailing zeros) to order 3. Does it suffice, for example, to check only orders dividing 6?
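The divisibility underlying that example is easy to verify symbolically; a quick check (SymPy is my choice of tool here, not part of the question):

```python
from sympy import symbols, rem, factor

x = symbols('x')
small = x**3 - 2*x**2 + 2*x - 1   # characteristic polynomial of the order-3 recurrence
big = x**6 - 1                    # characteristic polynomial of a_n = a_{n-6}
print(rem(big, small, x))  # 0, so the order-3 relation implies the period-6 one
print(factor(small))       # (x - 1)*(x**2 - x + 1), both factors of x**6 - 1
```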

Best Answer

This problem comes up often in coding theory, and it can be solved efficiently by the Berlekamp-Massey algorithm (Wikipedia has pseudo-code). This is more or less equivalent to using continued fractions, although many expositions don't present it that way: given a sequence $a_0,a_1,\dots,a_N$, look at the rational function $\sum_{i=0}^N a_i x^{-i}$ and compute its continued fraction expansion, with coefficients that are polynomials in $x$. That just amounts to applying Euclid's algorithm to the polynomials $\sum_{i=0}^{N} a_i x^{N-i}$ and $x^N$.
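Here is a minimal sketch of Berlekamp-Massey over a prime field $\mathbb{F}_p$, the setting in which it is usually stated in coding theory, adapted from the standard pseudo-code; the function name and interface are mine.

```python
def berlekamp_massey(s, p):
    """Minimal connection polynomial of the sequence s over GF(p).
    Returns (L, C) where C = [1, c1, ..., cL] satisfies
    s[n] + c1*s[n-1] + ... + cL*s[n-L] == 0 (mod p) for all n >= L."""
    C, B = [1], [1]          # current and previous connection polynomials
    L, m, b = 0, 1, 1        # m: shifts since B was current; b: its discrepancy
    for n in range(len(s)):
        # discrepancy: how badly C fails to predict s[n]
        d = s[n] % p
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, p - 2, p) % p       # d/b via Fermat inverse
        T = C[:]
        C += [0] * (len(B) + m - len(C))      # C(x) -= (d/b) * x^m * B(x)
        for i, Bv in enumerate(B):
            C[i + m] = (C[i + m] - coef * Bv) % p
        if 2 * L <= n:                        # the LFSR length must grow
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return L, C
```

The returned $L$ is minimal whenever the sequence supplies more than $2L$ terms, which is the caveat in the next paragraph.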

A degree-$d$ linear recurrence must be satisfied by more than $2d$ terms of the sequence to mean anything, and any such recurrence will be detected by this method: specifically, the corresponding rational function, whose denominator has degree $d$, must appear as a convergent of the continued fraction.

In practice, as Charles Matthews suggested, you can generally speed up the arithmetic substantially by working modulo a prime (say, a fairly large random prime). This is particularly an issue when there isn't a low-degree recurrence, since in that case generically you'll get a lot of partial quotients of degree $1$ with rapidly growing denominators. Checking that there's no low-degree recurrence modulo a prime will be much faster, since it will avoid the huge denominators.