[Math] A heuristic for the density of solutions to Diophantine equations

diophantine-equations nt.number-theory

Let $f\in\mathbb{Z}[X_1,\ldots,X_n]$ be a polynomial defining a Diophantine equation $f(x)=0$ which, for the purposes of this question, I will assume is homogeneous and nonsingular on $\mathbb{R}^n\setminus\{0\}$ (so that $\nabla f\not=0$ there). Supposing that it has infinitely many primitive integer zeros, we can posit that they are smoothly distributed in an asymptotic sense. Writing $V(R)\subseteq R^n$ for the set of primitive solutions to $f(x)=0$ in a ring $R$, the integer solutions $V(\mathbb{Z})$ clearly lie on the manifold $V(\mathbb{R})$. So, I am looking for a density $\rho\colon V(\mathbb{R})\to\mathbb{R}$ with
$$
\vert V(\mathbb{Z})\cap U\vert\sim\int_{V(\mathbb{R})\cap U}\rho(x)\,d\sigma(x),\qquad\qquad{\rm(1)}
$$
for subsets $U\subseteq\mathbb{R}^n$, where $d\sigma$ is the standard surface measure on $V(\mathbb{R})$. This should hold asymptotically as $U$ is scaled up, and for reasonably regular regions $U$.

My question is regarding a simple (but incorrect; see below) heuristic argument for calculating $\rho$. Choose a positive integer $N$ and a real $a\gg N$. For large regions $U$, the set of $x\in U$ with $\vert f(x)\vert < a$ is a shell around $V(\mathbb{R})$ of thickness about $2a\Vert\nabla f\Vert^{-1}$, so it has volume about $2a\int_{V(\mathbb{R})\cap U}\Vert\nabla f\Vert^{-1}\,d\sigma$ and should contain about that number of integer points. The probability of a random $x\in\mathbb{Z}^n$ being relatively prime to $N$ and satisfying $f(x)=0$ (mod $N$) is $N^{-n}\vert V(\mathbb{Z}/N\mathbb{Z})\vert$. Conditional on $\vert f(x)\vert < a$ and $f(x)=0$ (mod $N$), it seems reasonable to suppose that $f(x)=0$ with probability $N/(2a)$, since about $2a/N$ multiples of $N$ lie in $(-a,a)$. Multiplying these terms together gives $N^{-(n-1)}\vert V(\mathbb{Z}/N\mathbb{Z})\vert\int_{V(\mathbb{R})\cap U}\Vert\nabla f\Vert^{-1}\,d\sigma$. Taking the limit as $N$ increases to include all prime powers as factors, we get the following expression for $\rho$.
$$
\begin{align}
&\rho(x)=\Vert\nabla f(x)\Vert^{-1}\prod_p c_p,\qquad\qquad{\rm(2)}\\
&c_p=\lim_{r\to\infty}p^{-r(n-1)}\left\lvert V(\mathbb{Z}/p^r\mathbb{Z})\right\rvert.
\end{align}
$$
The product is taken over all primes $p$. This seems like a very neat expression, and it can be checked that it gives the correct result for linear equations. However, it is not correct in general: even for quadratic forms $f$, the expression given by (2) is wrong. I do not have a good feeling for where exactly this heuristic goes astray, or whether it is possible to fix it. Maybe this approach, and the reason that it does not quite work, is well known. This is not an area in which I am any kind of expert, so maybe others on MathOverflow would be able to help?
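For small $p$ and $r$, the local factors $c_p$ in (2) can be explored by direct enumeration. The following is only a minimal Python sketch under stated assumptions: the helper name `local_density` is made up for illustration, "primitive" is taken to mean that not every coordinate is divisible by $p$, and the function returns the normalised count $p^{-r(n-1)}\vert V(\mathbb{Z}/p^r\mathbb{Z})\vert$ at a fixed $r$ rather than the limit, so it can only suggest the value of $c_p$.

```python
from fractions import Fraction
from itertools import product


def local_density(f, n, p, r):
    """Return p^(-r(n-1)) * #{primitive x in (Z/p^r)^n : f(x) = 0 mod p^r}.

    Assumption: 'primitive' means at least one coordinate is not
    divisible by p.  Brute force, so only usable for small p^r and n.
    """
    q = p ** r
    count = 0
    for x in product(range(q), repeat=n):
        if all(c % p == 0 for c in x):
            continue  # skip non-primitive points
        if f(*x) % q == 0:
            count += 1
    return Fraction(count, q ** (n - 1))


def pythagorean_triple(x, y, z):
    return x * x + y * y - z * z


def pythagorean_quadruple(w, x, y, z):
    return w * w + x * x + y * y - z * z


# Normalised counts for a few small prime powers.
for p, r in [(2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (5, 1)]:
    print(p, r, local_density(pythagorean_triple, 3, p, r))
```

Exact rational arithmetic (`Fraction`) is used so that the output can be compared directly with closed forms such as $1-p^{-2}$. The cost grows like $p^{rn}$, so this is a sanity check rather than a practical way of computing the product.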

For example, consider $f=x^2+y^2-z^2$, so that we are looking for primitive Pythagorean triples. Euclid's parameterization $(x,y,z)=(a^2-b^2,2ab,a^2+b^2)$ can be used to show that $\rho=\sqrt{2}\pi^{-2}\vert z\vert^{-1}$. However, on $V(\mathbb{R})$ we have $\Vert\nabla f\Vert = 2\sqrt{2}\vert z\vert$ and you can calculate $c_2=1$ and $c_p=1-p^{-2}$ for odd prime $p$. Using (2) would lead to $\rho=2\sqrt{2}\pi^{-2}\vert z\vert^{-1}$, which is out by exactly a factor of 2. If we look at Pythagorean quadruples $f=w^2+x^2+y^2-z^2$ instead, then we can calculate $c_p=(1-p^{-1})(1+2p^{-1}1_{\{p\equiv1{\rm\ mod\ }4\}}+p^{-2})$ for odd primes $p$, so the product in (2) is not unconditionally convergent.
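The factor of 2 for Pythagorean triples can be checked numerically. The sketch below is an illustration under assumptions, not a proof: it takes the region $U=\{\vert z\vert\le Z\}$ and uses the surface element $d\sigma=\sqrt{2}\,\vert z\vert\,dz\,d\theta$ on the cone (a short calculation not spelled out above), so that integrating $\rho=\sqrt{2}\pi^{-2}\vert z\vert^{-1}$ over $U$ gives $8Z/\pi$, while the value from (2) gives $16Z/\pi$. The function name `count_primitive_solutions` is made up for illustration.

```python
from math import gcd, isqrt, pi


def count_primitive_solutions(Z):
    """Count integer (x, y, z) with x^2 + y^2 = z^2, gcd(x, y, z) = 1
    and 0 < |z| <= Z.  All signs and both orderings of the legs count."""
    count = 0
    for x in range(-Z, Z + 1):
        for y in range(-Z, Z + 1):
            s = x * x + y * y
            z = isqrt(s)
            if z == 0 or z > Z or z * z != s:
                continue  # x^2 + y^2 is not the square of an allowed z
            if gcd(gcd(x, y), z) == 1:
                count += 2  # both z and -z are solutions
    return count


Z = 200
observed = count_primitive_solutions(Z)
print("observed count:       ", observed)
print("prediction 8Z/pi:     ", 8 * Z / pi)
print("prediction from (2):  ", 16 * Z / pi)
```

The count includes the degenerate solutions such as $(\pm1,0,\pm1)$, which contribute only a bounded amount and do not affect the comparison for large $Z$.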

Is there a known, or even just conjectural, expression for the asymptotic density $\rho$? And is it possible to explain precisely how the heuristic used to derive (2) fails?

It would be great if my expression (2) above could be fixed. Heuristics like the one used here are often very useful for understanding what the integer solutions to Diophantine equations look like, and it is a bit worrying that it gives the wrong answer in this case. It is also consistent with the idea that, to find rational solutions to an equation, you should first check for solutions in the completions of $\mathbb{Q}$, according to the Hasse principle. Also, it so nearly works (only being a factor of 2 out for Pythagorean triples), and gives perfectly sensible-looking results in so many cases, that I am loath to give up and just accept that it doesn't work without a good reason as to why. For example, it does seem perfectly consistent with Faltings's theorem (as given in my answer to a previous MO question) and with the Birch and Swinnerton-Dyer conjecture. In the case where $f$ is a cubic describing an elliptic curve, $c_p=(1-p^{-1})N_p/p$ for all but finitely many primes $p$, where $N_p$ is the number of $\mathbb{F}_p$-points on the elliptic curve reduced modulo $p$. Then, up to finitely many terms, $\prod_pc_p$ coincides with the Euler product at $s=1$ of $(L(s)\zeta(s))^{-1}$, where $L$ is the L-function of the curve. According to the Birch and Swinnerton-Dyer conjecture, I would expect this to be zero, finite and nonzero, or infinite when the curve has rank $r=0$, $r=1$ and $r>1$ respectively. Putting this back into (2) is consistent with $\vert V(\mathbb{Z})\cap B_R\vert$ growing at rate $(\log R)^r$, which you would expect for an elliptic curve of rank $r$.
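The identity $c_p=(1-p^{-1})N_p/p$ can be checked by brute force on a single curve. The sketch below is an illustration only: the curve $y^2=x^3-x$, its homogenisation $y^2z-x^3+xz^2$, and all function names are choices made here for the example and do not come from the argument above. It counts at level $r=1$ only, assuming (via Hensel lifting at primes of good reduction) that the normalised count does not change for larger $r$.

```python
from fractions import Fraction
from itertools import product


def cubic(x, y, z):
    # Hypothetical example: homogenisation of y^2 = x^3 - x.
    return y * y * z - x ** 3 + x * z * z


def projective_points(p):
    """N_p: affine points (z = 1) plus the single point at infinity [0:1:0]."""
    affine = sum(
        1 for x in range(p) for y in range(p) if cubic(x, y, 1) % p == 0
    )
    return affine + 1


def c_p_bruteforce(p):
    """p^(-2) times the number of nonzero (x, y, z) mod p on the cubic."""
    count = sum(
        1
        for v in product(range(p), repeat=3)
        if any(v) and cubic(*v) % p == 0
    )
    return Fraction(count, p * p)


def c_p_formula(p):
    return (1 - Fraction(1, p)) * Fraction(projective_points(p), p)


# Compare the two expressions at a few odd primes.
for p in [3, 5, 7, 11, 13]:
    print(p, projective_points(p), c_p_bruteforce(p), c_p_formula(p))
```

The agreement reflects the fact that each projective point corresponds to exactly $p-1$ nonzero affine solutions of the homogeneous cubic modulo $p$.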

Best Answer

You are on the way to redeveloping the singular series, which does indeed give the correct asymptotic for integral solutions to many flavors of Diophantine equation. The key words here are "Hardy-Littlewood method" or "circle method," which you can read about in any text on analytic number theory, such as the book of Iwaniec and Kowalski.

Loosely speaking: when the number of variables is very large relative to the degree of the equation, the singular series is known to give you the right asymptotic. When the number of variables is only somewhat large relative to the degree of the equation, it is expected to give the right asymptotic, but there are no proofs outside very special cases.
