Concentration Bound on Sum of Independent Random Variables

probability-theory, random-variables

[Updated with the solution I have so far]

Suppose I have a collection of independent random variables $X_1, \dots, X_N$, and I want to analyze the weighted sum $Y = X_1 + \lambda X_2 + \dots + \lambda^{N-1}X_N$ for some $\lambda \in (0, 1)$. I know a sub-Gaussian concentration bound for each of the random variables, i.e. $P[|X_i| \geq t] \leq \exp(-\frac{t^2}{2\sigma^2})\ \forall t \geq 0$.
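For concreteness, here is a quick Monte Carlo sketch of the setup. The choice $X_i = |Z_i|$ with $Z_i \sim N(0, \sigma^2)$ is my own assumption for illustration (not part of the question): the half-normal is nonnegative and satisfies the stated tail bound, since $P[|Z| \geq t] \leq \exp(-t^2/2\sigma^2)$ for all $t \geq 0$.

```python
import math
import random

def sample_Y(sigma, lam, N, rng):
    """One realization of Y = X_1 + lam*X_2 + ... + lam^(N-1)*X_N,
    with the (assumed) choice X_i = |Z_i|, Z_i ~ N(0, sigma^2)."""
    return sum(lam**k * abs(rng.gauss(0.0, sigma)) for k in range(N))

sigma, lam, N = 1.0, 0.5, 20
rng = random.Random(0)
samples = [sample_Y(sigma, lam, N, rng) for _ in range(100_000)]

# The weights sum to the geometric series S = (1 - lam^N) / (1 - lam),
# which stays bounded (by 1 / (1 - lam)) as N grows.
S = (1 - lam**N) / (1 - lam)

# For the half-normal, E[Y] = sigma * sqrt(2/pi) * S, so Y is *not* zero-mean.
mean_Y = sum(samples) / len(samples)
```

Even with $N = 20$ the empirical distribution of $Y$ is visibly concentrated near its (nonzero) mean, which is what the sub-Gaussian bound on $Y$ should capture.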

Note that I cannot assume any of the $X_i$ are zero-mean. (Actually, for my specific application, they're strictly nonnegative.)

I'm trying to derive a sub-Gaussian concentration bound for $Y$, ideally one that stays bounded in the limit as $N \to \infty$. I have a solution based on Proposition 2.5.2 in [1] (see below), but it's pretty complicated. Intuitively, I feel like this should be a simple problem, and I'd like to know whether anyone has a better way of doing this (perhaps with a tighter bound).

I did come across this post, which might be relevant. I'm not super well-versed in probability theory, though (most of what I've said so far is stuff I've learned in the past few hours), and I have no idea how to work with the accepted answer.

My answer using Proposition 2.5.2 in [1] (again, this is a pretty complicated solution to what I feel like should be a simple problem, and this is probably a fairly loose bound):

  1. Preliminary facts I'll make use of: $1 + \lambda + \cdots + \lambda^{N-1} = \frac{1 - \lambda^N}{1 - \lambda}$, and, by the Cauchy–Schwarz inequality applied to the weights $\sqrt{\lambda^k} \cdot \sqrt{\lambda^k}\,X_{k+1}$, $Y^2 = (X_1 + \lambda X_2 + \cdots + \lambda^{N-1}X_N)^2 \leq \frac{1 - \lambda^N}{1 - \lambda}(X_1^2 + \lambda X_2^2 + \cdots + \lambda^{N-1}X_N^2)$
  2. From the bound on $P[|X_i| \geq t]$, using $\mathbb{E}[|X_i|^p] = \int_0^\infty P[|X_i|^p \geq u]du$, a couple of changes of variables, and Stirling's approximation, we can derive $\mathbb{E}[|X_i|^p]^{1/p} \leq \sigma e^{1/(2e)}\sqrt{p}$
  3. From the bound on $\mathbb{E}[|X_i|^p]^{1/p}$, using the Taylor series expansion of $\exp(s^2X_i^2)$ and Stirling's approximation, we can derive $\mathbb{E}[\exp(s^2X_i^2)] \leq \exp(4\sigma^2e^{1+(1/e)}s^2)\ \forall s^2 \leq \frac{1}{4\sigma^2e^{1+(1/e)}}$
  4. We can write $\mathbb{E}[\exp(v^2Y^2)] \leq \mathbb{E}[\exp(\frac{1 - \lambda^N}{1 - \lambda}v^2X_1^2)]\mathbb{E}[\exp(\frac{\lambda(1 - \lambda^N)}{1 - \lambda}v^2X_2^2)]\cdots\mathbb{E}[\exp(\frac{\lambda^{N-1}(1 - \lambda^N)}{1 - \lambda}v^2X_N^2)]$ using the preliminary bound on $Y^2$ and the independence of the $X_i$
  5. Using the bound on $\mathbb{E}[\exp(s^2X_i^2)]$, we can derive $\mathbb{E}[\exp(v^2Y^2)] \leq \exp(4\sigma^2e^{1+(1/e)}(\frac{1 - \lambda^N}{1 - \lambda})^2v^2)\ \forall v^2 \leq \frac{1}{4\sigma^2e^{1+(1/e)}}$
  6. It follows immediately that $\mathbb{E}[\exp(Y^2/K^2)] \leq 2\ \forall K^2 \geq \frac{4\sigma^2e^{1+(1/e)}}{\ln(2)}(\frac{1 - \lambda^N}{1 - \lambda})^2$
  7. Finally, setting $K^2 = \frac{4\sigma^2e^{1+(1/e)}}{\ln(2)}(\frac{1 - \lambda^N}{1 - \lambda})^2$ and applying Markov's inequality to $\exp(Y^2/K^2)$, we get $P[|Y| \geq t] \leq 2\exp(-\frac{t^2}{K^2})\ \forall t \geq 0$
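The moment bound in step 2 can be sanity-checked against a concrete sub-Gaussian family. Assuming (my choice, purely for illustration) the half-normal $X_i = |Z|$ with $Z \sim N(0, \sigma^2)$, the absolute moments have the closed form $\mathbb{E}[|Z|^p] = \sigma^p \, 2^{p/2}\,\Gamma(\frac{p+1}{2})/\sqrt{\pi}$, so the inequality $\mathbb{E}[|X_i|^p]^{1/p} \leq \sigma e^{1/(2e)}\sqrt{p}$ can be checked exactly, with no sampling:

```python
import math

def abs_moment_halfnormal(p, sigma):
    """Exact E[|Z|^p] for Z ~ N(0, sigma^2):
    sigma^p * 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)."""
    return sigma**p * 2**(p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

sigma = 1.3  # arbitrary test value

# Pairs (E[|X|^p]^(1/p), sigma * e^(1/(2e)) * sqrt(p)) from step 2's bound.
checks = [(abs_moment_halfnormal(p, sigma) ** (1 / p),
           sigma * math.exp(1 / (2 * math.e)) * math.sqrt(p))
          for p in range(1, 31)]
```

For the Gaussian family the left side grows like $\sigma\sqrt{p/e} \approx 0.61\,\sigma\sqrt{p}$, comfortably below the constant $e^{1/(2e)} \approx 1.20$, consistent with the suspicion that the resulting bound is fairly loose.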

[1] Vershynin, Roman, High-dimensional probability. An introduction with applications in data science, Cambridge Series in Statistical and Probabilistic Mathematics 47. Cambridge: Cambridge University Press (ISBN 978-1-108-41519-4/hbk; 978-1-108-23159-6/ebook). xiv, 284 p. (2018). ZBL1430.60005.

Best Answer

Alright, I came back to this after a while, and PhoemueX's comment on my question totally works. To add more detail: Vershynin's book defines the sub-Gaussian norm as $$\|X\|_{\psi_2} = \inf\{t > 0 : \mathbb{E}[\exp(X^2/t^2)] \leq 2\},$$ and it's left as an exercise to prove that this is a norm on the space of sub-Gaussian random variables. Given that it is a norm, it satisfies the triangle inequality, so we can simply write $$\|Y\|_{\psi_2} \leq \|X_1\|_{\psi_2} + \lambda\|X_2\|_{\psi_2} + \cdots + \lambda^{N-1}\|X_N\|_{\psi_2}.$$ Finally, if $\|X_i\|_{\psi_2} = C\sigma$ for all $i$ and some $C, \sigma > 0$, then we get $$\|Y\|_{\psi_2} \leq \frac{1 - \lambda^N}{1 - \lambda}C\sigma,$$ which, using Proposition 2.5.2 in Vershynin's book, implies that for some absolute constant $c > 0$, $$P[|Y| \geq t] \leq 2\exp(-\frac{ct^2}{\|Y\|_{\psi_2}^2}) \leq 2\exp(-(\frac{1 - \lambda}{1 - \lambda^N})^2\frac{ct^2}{C^2\sigma^2}) < 2\exp(-\frac{c(1 - \lambda)^2t^2}{C^2\sigma^2}),$$ where the final strict inequality uses $1 - \lambda^N < 1$. In particular, the exponent is independent of $N$, so the bound stays controlled as $N \to \infty$.
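To see the $\psi_2$-norm definition in action on a concrete family (again the half-normal $X = |Z|$, $Z \sim N(0, \sigma^2)$, which is my assumption for illustration): since $X^2 = Z^2$, we have $\mathbb{E}[\exp(X^2/t^2)] = (1 - 2\sigma^2/t^2)^{-1/2}$ for $t^2 > 2\sigma^2$, so the infimum in the definition can be found by bisection, and solving $(1 - 2\sigma^2/t^2)^{-1/2} = 2$ by hand gives $\|X\|_{\psi_2} = \sigma\sqrt{8/3}$, i.e. $C = \sqrt{8/3} \approx 1.63$ in the notation above:

```python
import math

def mgf_sq_halfnormal(t, sigma):
    """E[exp(X^2 / t^2)] for X = |Z|, Z ~ N(0, sigma^2); finite iff t^2 > 2 sigma^2."""
    return (1.0 - 2.0 * sigma**2 / t**2) ** -0.5

def psi2_norm(sigma, iters=80):
    """inf { t > 0 : E[exp(X^2/t^2)] <= 2 }, located by bisection.

    The MGF is decreasing in t and blows up as t^2 -> 2 sigma^2 from above,
    so we bracket the root just above sqrt(2)*sigma."""
    lo, hi = math.sqrt(2.0) * sigma * (1 + 1e-12), 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mgf_sq_halfnormal(mid, sigma) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

sigma, lam, N = 1.0, 0.5, 20
norm_X = psi2_norm(sigma)            # analytically sigma * sqrt(8/3)
S = (1 - lam**N) / (1 - lam)         # geometric weight sum
bound_Y = S * norm_X                 # triangle-inequality bound on ||Y||_psi2
```

Note that `bound_Y` is bounded by $\|X\|_{\psi_2}/(1-\lambda)$ uniformly in $N$, which is exactly the behavior the question asked for.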
