[Math] Problem about expectation of maximum partial sum

co.combinatoricspr.probability

Given a number $m$, a random composition (strong) of this number into $n$ positive parts so that we can get $n$ random variable $X_1, X_2,\dots, X_n$ with $$X_1+X_2+\cdots+X_n=m$$

Note that all compositions of $m$ have the same probability.

Let $Y_i = X_i – \mathbb{E}(X_i)$, $S_i = Y_1 + \cdots + Y_i$

I want to calculate $\mathbb{E}(\max_{1 \le i \le n} S_i)$.

Best Answer

[Edited to add Sage plots of the asymptotic distributions for $n=2,3,4,5,6,7$ and $n=10,20,30,40$]

There are $m-1 \choose n-1$ "compositions" of $m$ into $n$ positive parts $X_i$, because they correspond bijectively with choices of $n-1$ partial sums out of $\lbrace 1, 2, \ldots, m-1\rbrace$. By symmetry, each $X_i$ has expectation $m/n$. For large $n$ and much larger $m$, the sequence of normalized partial sums $$ S_i = \sum_{h=1}^i Y_h = \sum_{h=1}^i (X_h - \mathbb{E}(X_h)) = \Bigl(\sum_{h=1}^i X_h \Bigr) - \frac{i}{n} m $$ looks like a scaled "Brownian bridge", i.e. ${\rm BB}(t) = B(t) - B(1)t$ where $B(t)$ is Brownian motion on $[0,1]$. The distribution of $\max_t {\rm BB}(t)$ is known thanks to a reflection trick, and thus so is its expectation. If I did this right, for $\sigma > 0$ the probability that $S_\max := \max_i S(i)$ exceeds $\sigma$ approaches $\exp(-2n(\sigma/m))^2$, which would make $$ \mathbb{E}(S_\max) \sim \int_0^\infty e^{-2n(\sigma/m)^2} d\sigma = \sqrt{\frac\pi{8n}} \cdot m. $$ Since the question at hand asks only for the expectation of $S_\max$, not subtler features of its distribution, the estimate should be reasonably good for moderately large $n$ and $m$. As Aaron Meyerowitz noted, for fixed $n$ we have $$ \mathbb{E}(S_\max) = c_n m + O(1) $$ as $m \rightarrow \infty$; computation of the constants $c_n$ for $n \leq 40$ suggests that $n^{1/2} c_n \rightarrow \sqrt{\pi/8}$ but not very quickly: $n^{1/2} (\sqrt{\pi/8} - n^{1/2} c_n)$ seems to be converging to about $0.66$, and while $\sqrt{\pi/8} = 0.626657\ldots$, the computed $n^{1/2} c_n$ first exceeds $0.5$ only at $n = 27$. The formulas below may give a start towards proving this behavior.

Getting good estimates for finite values of $n$ seems unwieldy because computing the probability that $\max_i S(i) \leq \sigma$ requires counting "compositions" that satisfy $X_1 + \ldots + X_i \leq (i/n) m + \sigma$ for each $\sigma$, which gives rise to an intimidating inclusion-exclusion. Fortunately the answers to this mathoverflow query, which appeared just yesterday, give for any increasing integer sequence $0 < a_1 < a_2 < \ldots < a_N$ a nice determinant formula for the number of $N$-tuples $(x_1,\ldots,x_N)$ of integer sequences satisfying $0 < x_1 < \ldots < x_N$ and $x_i \leq a_i$ for each $i \leq N$. In our setting, we compute the number of "compositions" with $\max_i S(i) \leq \sigma$ by taking $N = n-1$, $x_i = \sum_{h=1}^i X_h$, and each $a_i = \min (m-1, \lfloor (i/n) m + \sigma \rfloor)$. The determinant has order $N$, with $(i,j)$ entry equal $a_i + j - i \choose j - i + 1$ [which is $1$ on the first subdiagonal $i = j+1$, and $0$ on the triangle $i > j+1$ under this subdiagonal]. This makes it feasible to compute exactly the distribution of $S_\max$ for moderately large $n$ and any $m$, and also the asymptotic distribution as $m \rightarrow \infty$.

For given $n$, the asymptotic distribution of $S_\max / m$ is piecewise polynomial, but takes a while to look like the limiting form proportional to $\sigma \exp(-2n(\sigma/m)^2) d\sigma$; notably it is discontinuous at $\sigma = 1/n$ with a substantial jump downwards. Here are Sage plots showing these distributions in red/orange/green/blue/purple/black for $n=2$ through $n=7$:


(source: harvard.edu)

(The total area is $(n-1)/n$, not $1$, because $1/n$ of the measure is concentrated at $\sigma = 0$). Likewise for $n=10,20,30,40$, plotting only $0 < \sigma < 0.6$ because the density is too small to be seen for $\sigma \geq 0.6$:


(source: harvard.edu)

(after much more computing, most of it for the $40 \times 40$ determinants; at $n=40$ the graph finally crests after the downwards plunge at $1/n$).

Finally, a tabulation of the values of $c_n$ for $n = 1, 2, \ldots, 40$. For each $n$ I give $n^n c_n$, which is an integer except for $n=2$ when $c_2 = 1/8$. This value, as well as each of $c_3 = 4/3^3$, $c_4 = 39/4^4$, and $c_5 = 472 / 5^5$, agrees with A.Meyerowitz's calculations.

[1, 0]
[2, 1/2]
[3, 4]
[4, 39]
[5, 472]
[6, 6900]
[7, 118716]
[8, 2354072]
[9, 52911216]
[10, 1330107840]
[11, 36991592500]
[12, 1127914077312]
[13, 37420777559496]
[14, 1342183358856704]
[15, 51756244887797100]
[16, 2135359495833676800]
[17, 93864296121282210784]
[18, 4379542774345464496128]
[19, 216178594161376244063076]
[20, 11255374377126199463936000]
[21, 616463053079082019376239800]
[22, 35432440194603405959506034688]
[23, 2132478311049609494982190450204]
[24, 134116927093400952330702474706944]
[25, 8798258310785305861553627142930000]
[26, 601017143689398400881632598032384000]
[27, 42684847106441394367226307718311565716]
[28, 3147222817221577402622824207375266742272]
[29, 240582153893938356991908848927905622445736]
[30, 19043079550271688145972837306025115648000000]
[31, 1558981284930199828739239320361571788041021900]
[32, 131855889346498739861328689280063889600321421312]
[33, 11509801310013312943392059948175963018705195688896]
[34, 1035923337749909421439098617643978496876513679376384]
[35, 96047358406199526246797502389944873251740366086562500]
[36, 9165799239775410749318809193562746250766254788837376000]
[37, 899564153243436548625817312320806272340082850282897445784]
[38, 90727638906463992065814103522957255273344987915765288009728]
[39, 9396779387234810402125876063643842517670905874506382252419196]
[40, 998740925886849063603687252012149942602287367747692134400000000]
Related Question