Maximum distance between samples with equal value in a sequence of i.i.d. discrete samples

probability, probability-theory, sequences-and-series

Let $\left(X_i\right)_{i=1}^n$ be a sequence of i.i.d. samples with finite outcome space $S=\{s_1,\dots,s_k\}$, $s_i \in \mathbb{R}$, and respective probabilities $p_1,\dots,p_k$. Define the maximum distance between two samples with equal value as
\begin{equation}
D_n = \max_{i=1,\dots,n}\,\min_{j>i}\{j-i : X_i=X_j\},
\end{equation}

where we simply take $n-i$ if $X_i$ is the last sample in the sequence with its value. I want to show that
\begin{equation}
\mathbb{P}(D_n \le \varepsilon n) \xrightarrow{n\to\infty}1, \forall \varepsilon >0.
\end{equation}
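Before attempting a proof, the claim can be sanity-checked numerically. The following is a quick sketch (the outcome space and probabilities are arbitrary example values, not part of the question); it computes $D_n$ exactly per the definition above, and the printed ratio $D_n/n$ shrinks as $n$ grows:

```python
import random

def max_gap(xs):
    """Compute D_n: for each index i, the distance to the next occurrence
    of the same value (or n - i, 1-based, if there is none); return the max."""
    n = len(xs)
    next_seen = {}  # value -> index of its next occurrence, scanning right to left
    gaps = []
    for i in range(n - 1, -1, -1):
        j = next_seen.get(xs[i])
        # n - (i + 1) is the "last occurrence" convention (n - i with 1-based i)
        gaps.append(j - i if j is not None else n - (i + 1))
        next_seen[xs[i]] = i
    return max(gaps)

random.seed(0)
values, probs = [1, 2, 3], [0.5, 0.3, 0.2]  # hypothetical example values
for n in (10**2, 10**3, 10**4):
    xs = random.choices(values, weights=probs, k=n)
    print(n, max_gap(xs), max_gap(xs) / n)
```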

My idea of why this holds: the gap until the next sample with value $s_i$ is Geometrically distributed with parameter $p_i$. As the $X_i$ are i.i.d., by the law of large numbers the number of samples in the sequence with value $s_i$ is approximately $p_i n$ for $n$ large. Given $p_i n$ samples with value $s_i$, the maximum of the corresponding Geometric($p_i$) gaps grows like $\frac{\log(p_i n)}{\log(1/(1-p_i))}$. If the maximum for each value were independent of the maxima of the other values, it would follow that
\begin{equation}
D_n \xrightarrow{a.s.}\max_{i=1,\dots,k}\frac{\log(p_i n)}{\log(1/(1-p_i))}.
\end{equation}
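The heuristic about the maximum of Geometric samples can be checked numerically as well. A sketch (the value $p=0.3$ and the sample sizes are arbitrary examples), comparing the empirical maximum of $m$ i.i.d. Geometric($p$) draws with the prediction $\log(m)/\log(1/(1-p))$:

```python
import math
import random

random.seed(1)
p = 0.3  # hypothetical success probability

def geometric(p):
    # Inverse-CDF sample of Geometric(p): number of trials until first success
    return max(1, math.ceil(math.log(1.0 - random.random()) / math.log(1.0 - p)))

for m in (10**3, 10**4, 10**5):
    sample_max = max(geometric(p) for _ in range(m))
    predicted = math.log(m) / math.log(1.0 / (1.0 - p))
    print(m, sample_max, round(predicted, 2))
```

The empirical maximum tracks the $\log m$ prediction closely, which is consistent with the $\log(p_i n)$ growth rate used above.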

As $D_n$ grows only like $\log(n)$, for every $\varepsilon > 0$ we have $D_n < \varepsilon n$ for $n$ large enough, so that
\begin{equation}
\mathbb{P}(D_n \le \varepsilon n) \xrightarrow{n\to\infty}1, \forall \varepsilon >0.
\end{equation}

The problem is that the maxima for the different values are dependent on each other. I am having real trouble finding how to approach this problem and can't find anything on the internet. Any advice on how to solve or approach this would be greatly appreciated, as it would complete the proof of a theorem I am working on.

Best Answer

Let $Y_{j}$ indicate the event that there does not exist $m\in\left\{j+1,\dots,j+\epsilon n\right\}$ such that $X_{j}=X_{m}$. We have:

$$p:=\Pr\left[Y_{j}=1\right]= \sum_{i=1}^{k}p_i\left(1-p_{i}\right)^{\epsilon n}.$$

Let $Y=\sum_{j=1}^{n}Y_{j}$; then $$\mathbb{E}\left[Y\right]=np,$$

which approaches zero whenever $p = o(1/n)$; for a fixed finite outcome space this holds automatically, since each term $(1-p_i)^{\epsilon n}$ decays exponentially in $n$. By Markov's inequality the probability that any such index exists is at most $np$, so $\mathbb{P}(D_n \le \epsilon n) \to 1$. If the probabilities are allowed to vary with $n$, you will need assumptions on them to bound $p$.
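The first-moment bound is easy to evaluate for concrete numbers. A sketch (the probabilities and $\epsilon$ are arbitrary example values) computing the expected number of indices whose value does not reappear within the next $\epsilon n$ samples, i.e. $n\sum_i p_i(1-p_i)^{\epsilon n}$:

```python
def expected_bad_indices(probs, eps, n):
    """Expected number of indices whose value does not reappear
    within the next eps*n samples: n * sum_i p_i * (1 - p_i)**(eps*n)."""
    return n * sum(pi * (1.0 - pi) ** (eps * n) for pi in probs)

probs, eps = [0.5, 0.3, 0.2], 0.01  # hypothetical example values
for n in (10**3, 10**4, 10**5):
    print(n, expected_bad_indices(probs, eps, n))
```

For fixed probabilities the sum decays geometrically in $n$, so the expectation tends to zero even though the factor $n$ grows.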
