Bounding tails of a sum of non-identical random variables

concentration-of-measure, probability, probability-theory

Let $\{X_i\}_{i=1}^n$ be a set of $n$ statistically independent random variables such that $\Pr(X_i=c_1)=\alpha/i$ and $\Pr(X_i=c_2/i)=1-\alpha/i$, where the constants $c_1<0$, $c_2>0$, and $\alpha>0$ are such that each expectation satisfies $\mathbb{E}(X_i) \leq -\beta/i$ for some $\beta>0$. I want to upper bound the probability
$$
\Pr\left(\sum_{i=1}^nX_i>C\right),
$$

for some value of $C>0$. I am also interested in upper bounding
$$
\Pr\left(\max_{1\leq j\leq n}\sum_{i=1}^jX_i>C\right).
$$

Note that $\sum_{i=1}^n\mathbb{E}(X_i) \leq -\beta\sum_{i=1}^n 1/i$, so the expected sum diverges to $-\infty$ at least logarithmically in $n$. Since the random variables are highly non-identical, I am not sure how to derive a meaningful upper bound on these probabilities; I would be happy with any non-trivial bound (i.e., one strictly less than $1$). I tried Hoeffding's inequality, but it does not seem to be the right tool in this scenario.
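For concreteness, here is a small Monte Carlo sketch of the setup (the parameter values are purely illustrative, chosen only so that the assumptions $c_1<0$, $c_2>0$, and $\mathbb{E}(X_i)\leq-\beta/i$ hold):

```python
import numpy as np

# Illustrative parameters (not part of the question): c1*alpha + c2 = -0.5 < 0,
# so E[X_i] = c1*alpha/i + (c2/i)*(1 - alpha/i) <= -0.5/i, i.e. beta = 0.5 works.
c1, c2, alpha = -3.0, 1.0, 0.5
n, C, trials = 500, 2.0, 20_000
rng = np.random.default_rng(0)

i = np.arange(1, n + 1)
U = rng.random((trials, n))
X = np.where(U < alpha / i, c1, c2 / i)   # X_i = c1 w.p. alpha/i, else c2/i
S = np.cumsum(X, axis=1)                  # partial sums S_1, ..., S_n

print("P(S_n > C)            ≈", np.mean(S[:, -1] > C))
print("P(max_{j<=n} S_j > C) ≈", np.mean(S.max(axis=1) > C))
```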

Best Answer

The classical method of exponentiating before applying Markov's inequality gives a bound for the tail of $S_n=\sum_{i=1}^n X_i$ that decays polynomially in $n$. To wit:

According to the question,

$$E[X_n] = c_1 \frac{\alpha}{n} + \frac{c_2}{n} (1-\frac{\alpha}{n}) \sim \frac{c_1 \alpha+c_2}{n}$$

and I will set $\gamma =c_1 \alpha + c_2$, a quantity that is negative by your assumption.

Let $\beta >0$ be a parameter to be chosen later (it is unrelated to the $\beta$ in the question). By Markov's inequality:

$$P[S_n >x] = P[e^{\beta S_n} >e^{\beta x}] \le \frac{E[e^{\beta S_n}]}{e^{\beta x}}$$
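Since the $X_i$ are independent, $E[e^{\beta S_n}]=\prod_{i=1}^n E[e^{\beta X_i}]$, so it suffices to estimate each factor.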

Now,

\begin{align*} E[e^{\beta X_n}]& =e^{\beta c_1} \frac{\alpha}{n} + e^{\beta\frac{c_2}{n}}(1-\frac{\alpha}{n})\\ & = (1 + (e^{\beta c_1}-1)) \frac{\alpha}{n} + (1+ \beta \frac{c_2}{n} + O(\frac{1}{n^2}))(1-\frac{\alpha}{n}) \\ & = 1 + \frac{1}{n} ((e^{\beta c_1}-1) \alpha+ \beta c_2 ) + O(\frac{1}{n^2}) \end{align*}
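(If you want to double-check this expansion, here is a small symbolic sketch, writing $1/n=\varepsilon$; sympy is not part of the argument, just a verification aid.)

```python
import sympy as sp

beta, c1, c2, alpha, eps = sp.symbols('beta c1 c2 alpha epsilon')

# E[exp(beta*X_n)] with 1/n written as eps, expanded around eps = 0.
mgf = sp.exp(beta * c1) * alpha * eps + sp.exp(beta * c2 * eps) * (1 - alpha * eps)
print(sp.series(mgf, eps, 0, 2))
# Expected: 1 + (alpha*(exp(beta*c1) - 1) + beta*c2)*eps + O(eps**2)
```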

Now consider: $$\varphi: \beta \mapsto (e^{\beta c_1}-1) \alpha+ \beta c_2 $$

For $\beta$ small, we have $\varphi(\beta)=\beta (c_1 \alpha + c_2) +O(\beta^2)= \gamma \beta +O(\beta^2)$, and we said $\gamma<0$; so $\varphi$ has a negative derivative at $0$ (and $\varphi(0)=0$), while $\varphi(\beta)\to+\infty$ as $\beta\to\infty$ (since $c_1<0$ and $c_2>0$). Hence its minimum over $\beta>0$ is attained at some $\beta_0>0$ and is strictly negative: call it $$\varphi(\beta_0)<0.$$
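(Side remark, not needed below: under the assumption $\gamma<0$, $\beta_0$ is even explicit. Setting $\varphi'(\beta)=c_1\alpha e^{\beta c_1}+c_2=0$ gives $e^{\beta_0 c_1}=\frac{c_2}{-c_1\alpha}\in(0,1)$, hence
$$\beta_0=\frac{1}{c_1}\log\Big(\frac{c_2}{-c_1\alpha}\Big)>0,\qquad \varphi(\beta_0)=-\alpha-\frac{c_2}{c_1}+\frac{c_2}{c_1}\log\Big(\frac{c_2}{-c_1\alpha}\Big).)$$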

Then for some constant $C<\infty$ (which absorbs the products of the $O(1/n^2)$ remainders; this $C$ is unrelated to the threshold $C$ in the question), using $1+t\le e^{t}$ and $\sum_{i=1}^n 1/i=\log(n)+O(1)$,

$$P[S_n >x] \le \frac{E[e^{\beta_0 S_n}]}{e^{\beta_0 x}} \le C e^{\varphi(\beta_0) \log(n) -\beta_0 x} =C n^{\varphi(\beta_0)} e^{-\beta_0 x},$$

which gives you polynomial decay in $n$ (recall $\varphi(\beta_0)<0$) that should be close to the truth, reasoning heuristically. You also get the exponential decay in $x$ for free.
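(If you want to see the $n$- and $x$-dependence of this bound numerically, here is a minimal sketch; the parameter values are the same illustrative ones as in the simulation above, and scipy is used only to minimize $\varphi$. The constant $C$ is not computed, so only the factor $n^{\varphi(\beta_0)}e^{-\beta_0 x}$ is shown.)

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative parameters with gamma = c1*alpha + c2 = -0.5 < 0.
c1, c2, alpha = -3.0, 1.0, 0.5

phi = lambda b: (np.exp(b * c1) - 1) * alpha + b * c2
res = minimize_scalar(phi, bounds=(0.0, 10.0), method="bounded")
beta0, phi0 = res.x, res.fun
print(f"beta0 ≈ {beta0:.4f}, phi(beta0) ≈ {phi0:.4f}")   # phi(beta0) < 0

# n- and x-dependence of the Chernoff-type bound C * n^phi0 * exp(-beta0 * x):
x = 2.0
for n in (10, 100, 1000, 10_000):
    print(n, n ** phi0 * np.exp(-beta0 * x))
```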

--

As for the max,

$$M_n =\frac{e^{\beta_0 S_n}}{E[e^{\beta_0 S_n}]}$$ defines a positive martingale $(M_n)_{n\ge 0}$: by independence, $E[M_{n+1}\mid X_1,\dots,X_n]=\frac{e^{\beta_0 S_n}\,E[e^{\beta_0 X_{n+1}}]}{E[e^{\beta_0 S_n}]\,E[e^{\beta_0 X_{n+1}}]}=M_n$. By the above, we may write $$E[e^{\beta_0 S_n}]= C_n n^{\varphi(\beta_0)}$$ for a sequence $C_n$ converging to some $C \in (0,\infty)$. Now, Doob's martingale inequality gives:

$$P(\max_{m =0...n} M_m \ge x)\le \frac{E[M_0]}{x}$$

which gives in our case (note $E[M_0]=1$, and restricting the max to $m\ge 1$ only makes the event smaller):

$$P\Big(\max_{1\le m \le n} \frac{e^{\beta_0 S_m}}{C_m m^{\varphi(\beta_0)}} \ge x\Big)\le \frac{1}{x}$$

In particular (this is much weaker!), since $\varphi(\beta_0)<0$ implies $m^{\varphi(\beta_0)}\le 1$ for $m\ge 1$, so that $C_m m^{\varphi(\beta_0)}\le \max_k C_k$,

$$ P\Big(\max_{1\le m \le n} e^{\beta_0 S_m} \ge x \max_k C_k\Big)\le \frac{1}{x}$$

meaning:

$$ P\Big( \max_{1\le m \le n} S_m \ge \frac{\log(x \max_k C_k)}{\beta_0}\Big) \le \frac{1}{x}$$

or

$$ P\Big( \max_{1\le m \le n} S_m \ge x\Big)\le \Big(\max_k C_k\Big) e^{-\beta_0 x}$$

and the right-hand side no longer depends on $n$, so it is indeed a bound on the maximum over the whole path.
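(And a last sanity check by simulation, again with the illustrative parameters used above; $\beta_0$ is taken from the closed form given earlier, and since $\max_k C_k$ is not computed explicitly, the code only checks that $P(\max_m S_m\ge x)\,e^{\beta_0 x}$ stays bounded in $x$, which is what the bound asserts up to the constant.)

```python
import numpy as np

# Illustrative parameters, as above; gamma = c1*alpha + c2 = -0.5 < 0.
c1, c2, alpha = -3.0, 1.0, 0.5
beta0 = np.log(c2 / (-c1 * alpha)) / c1      # closed-form minimizer of phi, > 0

n, trials = 500, 20_000
rng = np.random.default_rng(1)
i = np.arange(1, n + 1)
X = np.where(rng.random((trials, n)) < alpha / i, c1, c2 / i)
M = np.cumsum(X, axis=1).max(axis=1)         # max_{1<=m<=n} S_m for each path

for x in (1.0, 2.0, 3.0, 4.0):
    p = np.mean(M >= x)
    print(f"x={x}:  P(max S_m >= x) ≈ {p:.4f},  P*exp(beta0*x) ≈ {p * np.exp(beta0 * x):.4f}")
```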

--

It should be checked that there is no hidden $n$-dependence in the constant $C$ above: I don't think there is, but I have not checked carefully.