Asymptotic Distance Between First Occurrences of Distinct Letters in Multiset Permutations

asymptoticscombinatoricscoupon-collectordiscrete mathematicsprobability

Consider a multiset consisting of $h$ copies of each of $n$ distinct letters. To each permutation, we assign a label, $\vec{T}$, whose components, $t_i$ are the positions (1-indexed) in the permutation of the first occurrence of the $i^{th}$ distinct letter. Further, consider the label $\vec{\Delta}$ defined by $\Delta_1 = t_1$, $\Delta_i = t_i-t_{i-1}$ for $i>1$. For example, take $n=3$, $h=2$, and call our letters $a,b,c$. The permutation $aabcbc$ would be given the labels $\vec{T}=\langle 1, 3, 4 \rangle$ and thus $\vec{\Delta} = \langle 1, 2, 1 \rangle$. My question is about the average $\vec{\Delta}$ label. That is, if we considered all of the distinct (there will be repeats) $\vec{\Delta}$'s over all possible permutations of our multiset, added them component by component and divided by the total number of distinct $\vec{\Delta}$'s. Note it is not really fair to call this an average because a true average would weight each $\vec{\Delta}$ by the number of times it is repeated over all permutations, and divide by the total number of permutations, but for lack of better language, we will continue to refer to it as an "average". Specifically, if we fix $n$ and let $h\rightarrow \infty$ what is the asymptotic average $\vec{\Delta}$. Numerical calculations lead me to believe that such an answer exists. For $n=3$, and for large $h$ (I've calculated as large as $h=300$) the average $\vec{\overline{\Delta}}\approx\langle 1, .45h, .78h\rangle$, and for $n=4$, $\vec{\overline{\Delta}}\approx\langle 1, .43h, .67h, .95h\rangle$.

This question is, while not obviously, related to the Coupon Collector's problem. I was wondering what mathematical machinery would be useful for dealing with this problem. My hope is some closed form expression for the coefficients of $h$ in the $\vec{\overline{\Delta}}$'s. Immediately generating functions came to mind, but I'm unsure how to apply them to these odd $\vec{\Delta}$ sequences. Any guidance is greatly appreciated.

Best Answer

This would indeed be related to the coupon collector’s problem if you were interested in the actual average, but finding the “unweighted” average you want is a simpler combinatorial problem.

First, note that the transformation from $\vec T$ to $\vec\Delta$ is linear and bijective. Thus, distinct values of $\vec T$ correspond to distinct values of $\vec\Delta$, and the transformation commutes with the average; so we can average $\vec T$ and only then transform to $\vec\Delta$.

Clearly $t_1$ is always $1$. We have $2\le t_2\le h+1$, since the second distinct letter must appear at the latest after all $h$ copies of the first letter. Likewise, $t_2+1\le t_3\le2h+1$, since the third distinct letter must appear at the latest after all $2h$ copies of the first two distinct letters. Thus, the number of distinct values of $\vec T$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}1 =\frac12 h(3h-1)\;, $$

the sum of $t_2$ over all distinct values of $\vec T$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}t_2 =\frac23h(h^2+3h-1)\;, $$

and the sum of $t_3$ over all distinct values of $\vec T$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}t_3 =\frac16h(11h^2+12h-5)\;. $$

So asymptotically for $h\to\infty$, we have $\vec{\overline T} \sim\left(1,\frac{\frac23h^3}{\frac32h^2},\frac{\frac{11}6h^3}{\frac32h^2}\right)=\left(1,\frac49h,\frac{11}9h\right)$, and thus $\vec{\overline\Delta}\sim(1,\frac49h,\frac79h)$, in agreement with your results.

We can do the same thing for $n=4$; it's just slightly more tedious. As before, we have $t_3+1\le t_4\le3h+1$, so the number of distinct values of $\vec T$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}\sum_{t_4=t_3+1}^{3h+1}1 =\frac13h(8h^2-6h+1)\;, $$

the sum of $t_2$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}\sum_{t_4=t_3+1}^{3h+1}t_2 =\frac1{24}h(27h^3+74h^2-63h+10)\;, $$

the sum of $t_3$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}\sum_{t_4=t_3+1}^{3h+1}t_3 =\frac1{12}h(35h^3+30h^2-35h+6)\;, $$

and the sum of $t_4$ is

$$ \sum_{t_2=2}^{h+1}\sum_{t_3=t_2+1}^{2h+1}\sum_{t_4=t_3+1}^{3h+1}t_4 =\frac1{24}h(131h^3+22h^2-71h+14)\;. $$

Thus, asymptotically for $h\to\infty$ we have $\vec{\overline T}\sim\left(1,\frac{\frac{27}{24}h^4}{\frac83h^3},\frac{\frac{35}{12}h^4}{\frac83h^3},\frac{\frac{131}{24}h^4}{\frac83h^3}\right)=\left(1,\frac{27}{64}h,\frac{35}{32}h,\frac{131}{64}h\right)$, corresponding to $\vec{\overline\Delta}\sim\left(1,\frac{27}{64}h,\frac{43}{64}h,\frac{61}{64}h\right)$, also in agreement with your results.

Related Question