[Math] The probability for a sequence to have small partial sums

co.combinatoricspolymath5pr.probability

The question

Let $a_1,a_2,\dots,a_n$ be a sequence whose entries are +1 or -1. Let t be a parameter. My question is to give an estimate for the number of such sequences so that

$|a_1+a_2+\dots a_k| \le t$, for every $k$, $1 \le k\le n$.

(In other words, the probability that a random sequence will satisfy the above relation.)

I am especially interested in this probability when t is small. Either a constant, or slowly growing (say, it behaves like (log n)^s for some real number s, or slower).

variations

1) I would also like to know what is the situation if you demand that the average value of |a_1+a_2+\dots a_k| is smaller than t, rather than the maximum value.

2) If there are more delicate estimates for the case that t itself is a function of k e.g. t itself grows as (log n)^s I would be very interested as well.

Motivation

This question is relevant to the recent collective effort (polymath5) regarding the Erdos Discrepancy Problem (EDP). It particular it is relevant to a probabilistic heuristic regarding what the answer to EDP, and to several related questions, should be.

It is also relevant to certain probabilistic approaches towards construction of sequences with low discrepancy.

Expectation

I would expect that the answers to the questions above are known. But they are not known to me. It is easy to be convinced, for example, that when t is bounded the number of such sequences is $c_t^{-n}$, for $c_t<2$ but I would like to know the dependence of c_t on t.

Best Answer

For $t$ fixed, the count is proportional to $\lambda^n$, where $\lambda = 2 \cos \frac\pi{2t+2}$ is the principal eigenvalue of the adjacency matrix of the path with $2t+1$ vertices. The all-positive (Perron-Frobenius) eigenvector corresponding to $\lambda$ is

$$\bigg(\sin \frac{\pi}{2t+2}, \sin \frac{2\pi}{2t+2},\sin \frac{2\pi}{2t+2},\dots,sin \frac{(2t+1)\pi}{2t+2}\bigg).$$

Since $-\lambda$ is also an eigenvalue, the stable behavior of the distribution of endpoints of paths which stay in $[-t,t]$ is an oscillation between the odd entries

$$\bigg(\sin \frac{\pi}{2t+2}, 0,\sin \frac{3\pi}{2t+2},0,\dots,\sin \frac{(2t-1)\pi}{2t+2},0,\sin \frac{(2t+1)\pi}{2t+2}\bigg).$$ and even entries $$\bigg(0,\sin \frac{2\pi}{2t+2}, 0,\sin \frac{4\pi}{2t+2},0,\cdots ,0,\sin \frac{2t\pi}{2t+2},0\bigg).$$

The exact count of paths staying in $[-t,t]$ is a sum of signed binomial coefficients.

The number of paths from $0$ to $i$ is 0 if $n \not \equiv i ~\mod 2$, and $n \choose (n\pm i)/2$ when $n \equiv i ~\mod 2$.

The number of paths which never leave $[-t,t]$ from $0$ to $i \in [-t,t]$ with $n \equiv i ~\mod 2$ is

$$ \sum_{j\in \mathbb Z} (-1)^j {n\choose (n +i)/2 + j(t+1)}$$

by the reflection principle applied to the group of isometries of $\mathbb R$ generated by reflecting about $t+1$ and $-t-1$.

If you sum over all $i \in [-t,t]$, then when $n$ is even, you get a signed sum of binomial coefficients with $t+1$ positive signs in a row alternating with $t+1$ negative signs in a row. If $n$ is odd, then you get $t$ positive signs in a row, skip a term (give it a coefficient of $0$ instead of $\pm 1$), then $t$ negative signs in a row, skip a term, etc.

For example, for $n=100, t=2,$ the number of paths is

$$ ... +{100\choose 43} + {100\choose 44} + {100 \choose 45} - {100 \choose 46} - {100 \choose 47} - {100\choose 48} + {100\choose 49} + {100 \choose 50} + {100\choose 51} - ...$$

For $n=101, t=2,$ the number of paths is

$$ ... +{101\choose 44} + {101\choose 45} - {101\choose 47} - {101 \choose 48} + {101\choose 50} + {101\choose 51} - {101\choose 53} - {101\choose 54} + ...$$

These can be summed using the techniques in the answers to the Binomial distribution parity question.

A lot more can be said when $t$ varies, but the answers are more complicated. For $t$ slowly increasing, as $c\sqrt[3]n$, there is enough time for the distribution to stabilize (for each parity) at a given value of $t$, since the ratio between the magnitudes of the largest two eigenvalues and the magnitudes of the next two is about $1+c/t^2$, and the principal eigenvectors have a small $L^1$ distance for adjacent values of $t$. You should pick up a constant factor for each transition. In other words, the number of paths when you spend at least $n_t \gt c t^2$ steps at a given $t$ should be

$$C \prod_{t \le t_{max}} (2 \cos \frac{\pi}{2t+2})^{n_t}$$

where $C$ is between some functions $f_{lower}(t_{max}) \lt C \lt f_{upper}(t_{max})$ which does not depend on the values of $n_t$. I don't think the $n_t \gt c t^2$ condition is sharp for this behavior. Something like $n_t \gt c t^2/\log t$ should work, too. The geometry of the eigenvectors for adjacent values of $t$ lets you estimate $f_{lower}$ and $f_{upper}$.

For $t$ more rapidly increasing, different behaviors occur. By the law of the iterated logarithm, if $t$ increases as $t(n) = \sqrt {(2-\epsilon) n \log\log n},$ random paths will almost surely violate the constraint. I think there are precise versions of the law of the iterated logarithm which may tell you when a positive proportion of random paths do not violate the constraint. I would guess that if $t(n) = \sqrt{(2+\epsilon) n \log\log n}$ then a positive percentage of random paths won't violate the constraint.

Related Question