[Math] the difference between Hensel lifting and the Newton-Raphson method

congruenceselementary-number-theorynumerical methods

So in the Newton-Raphson method to iteratively approximate a root of a real polynomial, we start with a crude approximation $x_0 \in \mathbb{R}$ for $f(x)=0$ where $f(x) \in \mathbb{R}[x]$. For the next iterate $x_1$, we put $x_1 = x_0 + \epsilon$, and we want to determine $\epsilon$ to get a better approximation. For this we use a Taylor series and take a linear approximation, and equate $f(x_1)$ to 0 to get a value of $\epsilon$.
$$ f(x_1) = f(x_0 + \epsilon) = f(x_0) + \epsilon f'(x_0) + O(\epsilon^2) \approx f(x_0) + \epsilon f'(x_0)$$
$$ 0 = f(x_0) + \epsilon f'(x_0)$$
$$ \epsilon = – \frac{f(x_0)}{f'(x_0)}$$
$$ x_1 = x_0 – \frac{f(x_0)}{f'(x_0)}$$
Of course, a necessary condition here is that $f'(x_0) \neq 0$.

Now in the case of Hensel lifting of a root modulo $p$ of $f(x) \in \mathbb{Z}[x]$ to a root modulo $p^2$, we do something very similar. If $x_0 \in \mathbb{Z}$ is such that $f(x_0) \equiv 0 \pmod p$ and $f'(x_0) \neq 0 \pmod p$, then again we take $x_1 = x_0 + p\epsilon$, ignore everything but first order terms by going modulo $p^2$ and find $\epsilon$ by equating $f(x_1)$ to zero.
$$ f(x_1) = f(x_0 + p\epsilon) = f(x_0) + p\epsilon f'(x_0) + O(p^2\epsilon^2) $$
$$ 0 = f(x_0) + p\epsilon f'(x_0) \pmod {p^2} $$
$$ \epsilon = – \frac{f(x_0)}{pf'(x_0)}$$
$$ x_1 = x_0 – \frac{f(x_0)}{f'(x_0)}$$

So just as before, we end up ignoring terms beyond the linear term, and the $x_1$ that we get is pretty much the same thing, except that if we do get a fraction, we think of it (the division or inverse) as operating within the ring $\mathbb{Z}/p^2\mathbb{Z}$ and express it as an integer.

So except for the starting value (in Newton-Raphson we start at some (any) crude approximation, while in Hensel lifting we start with a root modulo prime $p$), are the two methods essentially the same? Can't we obtain the lifts modulo higher powers of $p$ simply by looking at the iterates in the Newton-Raphson method, if only we start with an agreeable candidate?

Best Answer

There is a very general viewpoint that relates Newton's method and similar successive approximation schemes such as Hensel's lemma. This is essentially folklore, but has become more accessible recently due to computational applications (e.g. polynomial factorization). To locate pertinent literature you can begin with the papers reviewed below.


von zur Gathen, Joachim (3-TRNT-C) 85j:12012 12J20 65P05
Hensel and Newton methods in valuation rings.
Math. Comp. 42 (1984), no. 166, 637--661.

Hensel's lemma is a fundamental tool in the study of algebraic equations over $p$-adic fields. In the folklore of number theory it has been known for a long time that Hensel's and Newton's method are formally the same (this remark appears in printed form in an article by D. J. Lewis published in a book edited by W. J. LeVeque [Studies in number theory, 25--75, see p. 29, Prentice-Hall, Englewood Cliffs, N.J., 1969; MR 39 #2699]).

A generalized version of Hensel's lemma in suitable valuation rings is contained in N. Bourbaki's book [Elements of mathematics, 23, Commutative algebra (French), see Chapter III, Section 4, Theorem 1 and Theorem 2, Hermann, Paris, 1958; MR 20 #4576].

This paper also deals with the study of Hensel's method in valuation rings and shows that Newton's method is a special case of Hensel's. The presentation emphasizes the algorithmic point of view and is very detailed and clear. The linear and quadratic cases of Hensel's lemma are both given. Newton's method is applied to systems of nonlinear partial differential equations. Then the author presents an algorithm for the computation of a shortest nonzero vector in a non-Archimedean valuation module. These results are applied in the last section, which contains an algorithm for factoring polynomials over a ring with valuations.

          Reviewed by Maurice Mignotte

Ribenboim, Paulo (3-QEN) 87a:12014 12J10 13A18
Equivalent forms of Hensel's lemma.
Exposition. Math. 3 (1985), no. 1, 3--24.

From the introduction: "The celebrated Hensel's lemma, which is the cornerstone of the theory of $p$-adic numbers, has been the object of extensive studies. However, our aim in this paper is not to describe the development of ideas and applications centered around Hensel's lemma, but rather to examine closely the various formulations found in the literature. We place ourselves in the framework of the theory of valued fields and show that Hensel's lemma is logically equivalent to many propositions concerning the number of extensions of the valuation to algebraic extensions, or the lifting of polynomials from the residue field, or the determination of zeroes of a polynomial by a method which dates back to Newton, or even to a geometric formulation concerning the mutual distance between the zeroes of polynomials. These facts are of a `folkloric' nature, yet no complete proof of their equivalence has appeared in any one paper.

"This article is written at the level of research students."

          Reviewed by Antonio Jose Engler

Miola, A. (I-ROME-I); Mora, T. (I-GENO) 90f:68096 68Q40 13A18 13J10 Constructive lifting in graded structures: a unified view of Buchberger and Hensel methods.
Computational aspects of commutative algebra.
J. Symbolic Comput. 6 (1988), no. 2-3, 305--322.

A graded structure is a filtered commutative ring $A$ which is filtered by a totally ordered group and has a graded associated ring and a ring completion. The authors define a process for solving, in the ring completion, a polynomial multivariate equation over $A$ by successive approximations. They discuss under which conditions this process converges.

The main interest of this theory is that Hensel lifting, Buchberger's algorithm for Grobner basis computations in polynomial rings and Hironaka's division process by a standard basis in rings of formal power series are instances of the above process. For example, Hensel lifting is an approximative resolution of $yz-a=0$, which needs some conditions on $a$ and on the initial values of $y$ and $z$ to be convergent.

{For the entire collection see MR 89j:68004}.

          Reviewed by Daniel Lazard