[Math] Efficiently factoring polynomials over $\Bbb F_2$

arithmeticcoding-theoryfinite-fields

I am attempting to write some software which is intended to generically answer the question of which Cyclic Redundancy Code (CRC) generating polynomial is used for a given set of sample messages using the same unknown CRC.

For example here are two messages in hexadecimal which use the same CRC:

ee 00 00 00 00 01 20 13 10  (message 1)
ee 00 00 00 00 03 20 a3 23  (message 2)

Since CRC calculations are essentially polynomial division over GF(2), We can consider each CRC calculation to be
$M(x) x^n = Q(x) G(x) + R(x)$
where $M(x)$ is the original message appended with $n$ zeroes, $G(x)$ is the generating polynomial and $R(x)$ is the remainder which is identical to the CRC value attached to the end of the message. In this particular case, the original message is

ee 00 00 00 00 01 20

and the appended $R(x)$ is 13 10

Because these two messages use the same generating polynomial, we know that
$M_1(x) x^n = Q_1(x) G(x) + R_1(x)$
and
$M_2(x) x^n = Q_2(x) G(x) + R_2(x)$. Subtracting one from the other we get:
$$\begin{eqnarray*}
M_1(x) x^n – M_2(x) x^n &=& Q_1(x) G(x) + R_1(x) – Q_2(x) G(x) – R_2(x)\\
(M_1(x) – M_2(x))x^n – R_1(x) + R_2(x) &=& (Q_1(x)-Q_2(x))G(x)\\
\end{eqnarray*}$$
Since addition and subtraction of polynomials over GF(2) are both identical to the exclusive-or operation, this result means that the exclusive-or of the two messages (including the CRCs) is identical to $(Q_1(x)-Q_2(x))G(x)$, so finding the generating polynomial $G(x)$ could be done by factoring this result. In the specific case of the two messages above, we get

02 00 b0 33

Because of the way CRCs are most often calculated, the bits are "swapped" so that this actually corresponds to the polynomial:
$x^{30}+x^{11}+x^{10}+x^8+x^7+x^6+x^3+x^2$. This is trivially factored into $x^2(x^{28}+x^9+x^8+x^6+x^5+x^4+x+1)$, but that's where I get stuck. I know that in this particular case, $G(x) = x^{16}+x^{12}+x^5+1$ (which some may recognize as the standard 16-bit CCITT polynomial).

How do I write efficient code to factor that polynomial over GF(2) without resorting to brute force?

Looking for literature that would help answer that question, I found several papers that looked promising:

However, I am an engineer rather than a mathematician, and I find I am unable to understand what the papers actually mean and therefore unable to use whatever clever algorithms they purport to contain. Short of going back for a degree in field theory, is there some means by which I might learn how to use modern polynomial factoring techniques in application to this problem?

Best Answer

I've had occasion to mention the symbolic math software Sage in a few answers, e.g about irreducibility over the binary field and factoring of rational polynomials. A good way to do off-the-cuff experiments is with an online Sage account, which has evolved into SageMathCloud.

For the example given in the Question, I made this online Sage workfile (create an account and a new project, and it will display an editable empty workfile):

P.<x> = GF(2)[ ]
f = x^30 + x^11 + x^10 + x^8 + x^7 + x^6 + x^3 + x^2
f.factor()

Running the project (leftmost icon) produces this output:

(x + 1) * x^2 * (x^3 + x + 1) * (x^4 + x^3 + x^2 + x + 1) * (x^5 + x^4 + x^3 + x + 1) * (x^15 + x^14 + x^13 + x^12 + x^4 + x^3 + x^2 + x + 1)

or in more concise (MathJax) form:

$$ (x + 1) * x^2 * (x^3 + x + 1) * (x^4 + x^3 + x^2 + x + 1) * (x^5 + x^4 + x^3 + x + 1) * (x^{15} + x^{14} + x^{13} + x^{12} + x^4 + x^3 + x^2 + x + 1) $$

Note that the "standard 16-bit CCITT polynomial" $G(x) = x^{16}+x^{12}+x^5+1$ cited in the Question is actually the product of the first and last of these irreducible factors.

I'm also a fan of Victor Shoup's NTL: A Library for doing Number Theory, "a high-performance, portable C++ library providing data structures and algorithms for manipulating signed, arbitrary length integers, and for vectors, matrices, and polynomials over the integers and over finite fields."

Halfway down this page we are told that Sage will use the NTL library for polynomials over the binary field, e.g. for multiplication and GCD's, at least if the command is appropriately framed:

sage: x = PolynomialRing(GF(2), 'x').gen()
sage: f = (x^3 - x + 1)*(x + x^2); f
x^5 + x^4 + x^3 + x
sage: g = (x^3 - x + 1)*(x + 1)
sage: f.gcd(g)
x^4 + x^3 + x^2 + 1

Developer level information is on this Sage documentation page.

Ultimately you may want to build a pipeline for factoring over the binary field that goes directly against NTL, cutting out the Sage middleman, as the bare C++ implementation should eliminate some overhead. Information about building NTL itself with the GF2X library is here.

I've built Sage from source a couple of times (it's a fully automated but time-consuming process). If I get around to doing a similar build from recent NTL source, I'll report my experiences here.

Finally this 2002 paper by von zur Gathen and Gerhard, "Polynomial Factorization over $\mathbb{F}_2$", may strike you as an accessible account of investigations extending Cantor-Zassenhaus.

Related Question