[Math] The behavior of a certain greedy algorithm for Erdős Discrepancy Problem

co.combinatoricsnt.number-theorypolymath5

Let $N$ be a positive integer.
We want to find a completely multiplicative functions $f(n)$ with values $\pm 1$ for $n \le N$ such that the discrepancy
$$D=\max_{n \le N} |\{\sum_{i=1}^nf(i)\}|$$
is as small as possible. This is Erdős Discrepancy problem for multiplicative functions.

Consider the following greedy algorithm:

After you assigned the values $f(2),f(3),\dots f(p_i)$ for the first $i$ primes assign the value $f(p_{i+1})$ so as to minimize the maximum discrepancy $|\{\sum_{i=1}^nf(i)\}|$ in every partial sum where unassigned entries of $f$ get the value zero.

Question: How does this greedy algorithm perform?

Experimental or heuristic answers as well as rigorous proofs are welcomed.

For more background and related questions see this post .

Variation

Consider the same greedy algorithm when you impose the condition that $f(m)=0$ unless $m$ is square free. (If $m$ is not square free $f$ is multiplicative and has values $\pm1$.)

Question: How does our greedy algorithm performs on the square-free version?

Namely, we would like to understand the behavior of the discrepancy of the function obtained by our greedy algorithm. While for EDP there are known examples with $\log N$ discrepancy, this is not known for the square-free version.

Update:

The very nice answer by rlo suggests that the greedy algorithm gives discrepancy close to $n^{1/3}$ or so, and rlo expects it also for the square free variation. Can an upper bound of $N^{1/2-\epsilon}$ be proved? What about a lower bound of $N^{\epsilon}$. Another interesting question is if you can improve the greedy algorithm to get lower discrepancy. Our greedy ignore 0's in intervals. A greedy algorithm that ignore intervals with 0's was considered in polymath5 and to the best of my memory achieve discrepancy $n^{1/2}$. Maybe a clever interpolation between these two variants will do a better job than both?

Further meditation and a new variant

It seems that in our greedy algorithm the decisions we make for small primes are fairly irrelevant. A way to check it:

Run the algorithm for N and test what is the discrepancy for an interval [1,T] where T is,
say, $\sqrt N$. I would expect the answer to be roughly $\sqrt T$.

So now we can think about the following variation:

Let $a>1$ be a real number. We run the greedy algorithm above but our decision for $f(p)$ is based only on intervals $[1,n]$ where $n \le p^a$. (Of course we consider only $n \le N$.

Questions: Can this variant lead to lower discrepancy?

What is the optimal value of $a$?

Best Answer

Update 2: Original answer below. I've put together graphs showing more than just champions, using every $N\leq 10^4$ and also every $N\equiv 0\pmod{100}$ up to $10^5$. This is for the original version, not the variant, but I'd expect that to be essentially the same. I am starting to be somewhat skeptical of my $1/3$ estimate. I'll collect some data further out, but it'll be less complete since it's more costly to collect.

Here's the graphs of $D(N)$ versus $N$. The added curves are $N^{1/3}$ and $\log N$. enter image description here enter image description here

Here's the graph of $\log D/\log N$ versus $N$. The horizontal line is at $1/3$. (Note the different scales.) enter image description here enter image description here

Original answer: I have some basic numerical observations. I hacked together some code in c++ to work on this, and would be happy to collect more focused data or to share the code. Also, whenever there was a choice of whether to assign the value $+1$ or $-1$ to $f(p)$, I chose $-1$ for consistency of output. This yields a well-defined function $f_N$ for each $N$. Let $D(N)$ denote the discrepancy of $f_N$ up to $N$.

(I've also looked at choosing $f(p)=+1$ if it's undetermined, and at $f(p)=\pm 1$ according to whether $p\equiv 1,3\pmod{4}$. In these cases, the data is essentially the same as below.)

  1. $D(N)$ is roughly increasing, but is not monotonic. Its champion values for $N\leq10000$ are, in the form $(N,D(N))$, $(1,1)$,$(10,2)$,$(24,3)$,$(70,5)$,$(91,6)$,$(391,7)$,$(553,8)$,$(668,9)$,$(961,10)$,$(1235,11)$,$(1265,13)$,$(2561,14)$,$(2604,17)$,$(6275,18)$,$(6276,19),\dots$. This growth is more than logarithmic, and is probably polynomial -- $\log D/\log N$ hovers pretty close to $1/3$ for each of these points, so that may be the answer for the $\Omega$ result.

  2. It appears that the functions $f_N$ may converge as $N\to\infty$, but I'm not sure of this and need more data. Let $l(N)$ denote the least prime $p$ for which the value of $f_N(p)$ is undetermined. Certainly $l(N)\geq 5$ once $N\geq 4$, and while there is a great deal of fluctuation, it appears that maybe $l(N)\geq 7$ once $N\geq40500$; I will be seeing if this is (numerically) true.

I hope to update this answer once I have more data.

Update: For the variation where we only look at the values of $f_N$ on squarefree integers, the behavior appears to be the same.

$(N,D(N),\log D/\log N)$:

  • $(1,1,\text{NaN})$
  • $(30,2,0.2037950471)$
  • $(42,3,0.2939297479)$
  • $(77,4,0.3191428313)$
  • $(190,5,0.3067334722)$
  • $(238,6,0.3274252273)$
  • $(319,8,0.3606890916)$
  • $(939,9,0.3210056698)$
  • $(1358,10,0.3191931033)$
  • $(1461,11,0.3290703914)$
  • $(2185,13,0.3335707591)$
  • $(2769,14,0.3329519195)$
  • $(3354,15,0.3335896252)$
  • $(3689,17,0.3449622741)$
Related Question