[Math] Long integer multiplication using FFT in integer rings

algorithmscomputational complexityfourier analysisroots-of-unity

I would like to perform long integer (~= polynomial) multiplication using the FFT or its direct analogue, but never leave integer rings.

Please excuse in advance all my mistakes in formulation and possible misunderstanding of concepts, since I am always having trouble with algebra. I would highly appreciate the simplest yet the fullest answer thereby.

I assume that the numbers I multiply have both the length $2^k$, so no problems with exact halving.

As far as I understand, I don't care about the algebraic structure I'm performing the FFT in, as long as I have found the primitive n-th root of unity $\omega$.

Question 1

Is it true? I was thinking about $GF$, of which I know nothing 🙂 except for the fact that they have primitive elements which cannot be computed quickly. But even if I compute it in advance, for some $GF(p)$ of big cardinality – will this allow me to quickly compute primitive $n-th$ roots of unity for arbitrary $n=2^k$? (perhaps with an obvious restriction $n<N$ for some big $N$)

Question 2

Is it true that if I have a primitive n-th root of unity, all the powers $\omega^0, \omega^1, …, \omega^{n-1}$ ought to be different?

Question 3

Is this enough for the FFT? In Cormen's book the 'divide and conquer' approach relies on the fact that when you perform the division

$A(x) = A_{even}(x^2) + x A_{odd}(x^2)$

and compute the values of these two polynomials at points

$(\omega^0_n)^2, (\omega^1_n)^2, …, (\omega^{n-1}_n)^2$,

the latter list consists of only $n/2$ distinct values, which are all $n/2$-th primitive roots of unity.

Question 4

Which algebraic structure can guarantee me this property?

Question 5

I also wanted to emphasize that I naturally want to multiply arbitrary numbers quickly (perhaps just bounded by some big value $N$) – anything just to keep the overall complexity of the multiplication in $O(n \log n)$.

So I won't be happy if the algebraic structure is such that for each number length $n=2^k$, I will have to search for a primitive n-th root of unity $\omega$, with a quadratic-or-bigger-complexity search algorithm. In other words, I want $\omega$ to be able to be computed 'fast'.

I will be so happy to get your recommendations as well as a good explanation for non-algebra folks… I also understand that all these questions may be dumb and originate from lacking of knowledge, but I hope for your understanding that I'm only a programmer, and perhaps not the best one 🙂 further questions may arise as it gets more clear, because for now, it's a big mess.

Thank you so much in advance!

Best Answer

To keep this answer short I won't go into detailed explanations of the basics of Fourier transforms: you should be able to find plenty of lectures online. However the question about the most generic structure is interesting, so I'll elaborate on it below.

  1. If the field is fixed, it's easy: for $n' \mid n$, $\omega_{n'}=\omega_n^{n/n'}$. If not, I don't think it's possible (for prime fields).
  2. Yes: if $\omega^i=\omega^j$, since $\omega$ is invertible in $R_p$ $\omega^{j-i}=1$ so $n \mid j-i$, which means $i=j$ if $0\le i,j < n$.
  3. They are all $n/2$-th roots of unity, but only odd powers are primitive ones (since for example $(\omega_n^0)^2=1$). $\omega_n^2$ is a primitive one, it follows from $(\omega_n^2)^{k/2}=\omega_n^k$ so since $n$ is the least $k$ such that $\omega_n^k=1$, $n/2$ is the least $k$ such that $(\omega_n^2)^k=1$. These points can all be written in terms of this primitive $n/2$-th root as $(\omega_n^i)^2=(\omega_n^2)^i$.
  4. You only need a commutative ring with an $n$th primitive root of unity, but this doesn't mean the corresponding transform will be invertible so the calculation, while correct, could be useless. See below.
  5. See below.

What is the most generic structure in which I can use a Fourier transform to multiply integers?

Before worrying about the complexity of the Fourier transform, let's make sure we know exactly what we're computing.

You have polynomials $A$ and $B$ with bounded integral coefficients $a_0,\dots a_p$ and $b_0,\dots b_q$ ($|a_i|,|b_i|\le M$, $p+q<n$). Then you want to compute the product $C=AB$, where $c_i=\sum\limits_{j=\max(0,i-q)}^{\min(i,p)} a_j b_{i-j}$ and $|c_i|\le \min(p,q) M^2$ is the best possible bound that does not depend on $i$. So if you want to reduce the problem to multiplying the projections $\bar A,\bar B$ of $A$ and $B$ in $R[X]$ where $R$ is some finite ring, then $R$ needs to have characteristic at least $r=2\min(p,q) M^2+1$, otherwise you can't recover $C$ from its projection $\bar C$ in $R[X]$.

Notice that since $\bar a_i$ is the projection of the integer $a_i$ we are only really using the subring $R'$ of $R$ generated by 1, and $R'=\mathbb Z/n\mathbb Z$ where $n=\mathrm{char}(R)$ is the smallest positive integer such that $\bar n=0$.

By the Chinese remainder theorem $R'$ is the product of $\mathbb Z/p_i^{q_i} \mathbb Z$. A very nice consequence is that you can perform the product independently in each of the $\mathbb Z/p_i^{q_i} \mathbb Z[X]$, and recover the final $C$ with the CRT (in linear time). On the other hand, non-prime finite fields are of no use since the only finite fields that can appear are the prime fields $\mathbb Z/p_i \mathbb Z=GF(p_i)$ (when $q_i=1$).

So we only need to understand what happens in $R_p=\mathbb Z/p^q\mathbb Z$. In order to find a primitive root of order $n=2^k$, we need $2^k \mid \lambda(p^q)$ where $\lambda$ is the Carmichael function. For $p>2$, this means $p=m2^k+1$ for some $m$. Conversely in this case there is a $g$ primitive mod $p^q$ and $\omega_n=g^{\lambda(p^q)/2^k}$ suffices.

The Fourier transform of $A$ is $\mathscr F_{\omega_n} A(X)=\sum_{i=0}^{n-1} A(\omega_n^i) X^i$, which is well-defined but not necessarily invertible. Indeed when $p=2$, the Fourier transform is not injective, as for example $2\omega_n^{n/2}=2\cdot2^{q-1}=0$. This means the Fourier transform cannot work, so from now on let's assume $p>2$. You can check that $\mathscr F_{\omega_n^{-1}}(\mathscr F_{\omega_n} A)=nA$; since $n=2^k$ is coprime to $p$, $\mathscr F_{\omega_n}$ is invertible and the Fourier transform works.

To sum up: the most generic structure is $\prod_{i=1}^l \mathbb Z/p_i^{q_i} \mathbb Z$ where the $p_i$ are odd primes satisfying $n \mid p_i-1$, and where $\prod_{i=1}^l p_i\ge r=(n-1)M^2+1$.

Can this generic structure be implemented efficiently?

Yes! For a given $p$, we can compute the Fourier transform in $\mathbb Z/p^q\mathbb Z$ for any $q$. Choosing $q>1$ forces you to work in a bigger ring than if $q=1$, but in principle it could make sense if finding primitive roots was difficult and you "ran out" of prime numbers. However, as we will see below, this is not a problem so we're not losing anything by restricting to the simpler case $q=1$.

First, pick some bound $M\ge 1$ and $n=2^k$. Then you only need to pick $l$ elements $p_1,\dots p_l$ in the set $\mathcal P_n$ of primes congruent to -1 mod $2^k$ such that $$\prod_{i=1}^l p_i\ge nM^2$$ and know an $n$th primitive root mod $p_i$ for each $i$.

$\mathcal P_n$ is infinite because of Dirichlet's theorem; Chebotarev's theorem further says that the density of $\mathcal P_n$ in the set of all primes is $1/2^{k-1}$. Suppose you also want $p_i\in\mathcal P_n^K=\mathcal P_n\cap \{K/2,\dots,K-1\}$ with $K$ a power of 2 so that you can do the Fourier transform on machine words while utilizing all available bits. Then you can expect $\mathcal P_n^K$ to have roughly $\frac{K}{n\log K/2}$ elements. Fortunately this is a lot of elements: so you can safely consider only one $n_\max$, take $l$ elements from $\mathcal P_{n_\max}^K$ and you will be able to compute Fourier transforms for $n$ up to $n_\max$ and for $M$ up to $M_\max=(K/2)^{l/2}/\sqrt{n_\max}$. You can precompute both the primes and their primitive $n$th root of unity. To answer your question it's possible to find primes and primitive roots in faster than quadratic time, but since you're precomputing them anyway it doesn't matter.

Once $K$ is fixed, the only remaining parameter to be tweaked is $M$. For small values of $n$, it may suffice to choose a small $M\le K/(2n)$ so that $l=1$, but as $n$ grows, higher values of $M$ could be better. If you model the cost of arithmetic operations as a function of bit size you will be able to find the optimal value of $M$.

Related Question