Probability – Non-i.i.d. Bernoulli Random Variables

probability, statistics

Let $X_1, X_2, \dots, X_n \sim \operatorname{Bernoulli}(p)$ and $\bar{X}=\frac{1}{n}\sum\limits_{i=1}^nX_i$. Find an upper bound for $\mathbb{P}(|\bar{X}-p| > \epsilon)$ for all $\epsilon > 0$. Note that the random variables are not necessarily independent.

$\textbf{What I've tried:}$

We know that $\mathbb{E}[\bar{X}] = \mathbb{E}[X_1] = p$. So we can use Chebyshev's inequality: $\mathbb{P}(|\bar{X}-p| > \epsilon) \le \mathbb{P}(|\bar{X}-\mathbb{E}[\bar{X}]| \ge \epsilon) \stackrel{\text{Chebyshev}}{\le} \frac{\operatorname{Var}(\bar{X})}{\epsilon^2} = \frac{\operatorname{Var}\left(\sum\limits_{i=1}^n X_i\right)}{n^2\epsilon^2} \le \frac{n \sum\limits_{i=1}^n \operatorname{Var}(X_i)}{n^2\epsilon^2} = \frac{n^2 \operatorname{Var}(X_1)}{n^2\epsilon^2} = \frac{p(1-p)}{\epsilon^2},$ where the last inequality holds without independence because $\operatorname{Var}\left(\sum\limits_{i=1}^n X_i\right) = \sum\limits_{i,j}\operatorname{Cov}(X_i,X_j) \le \sum\limits_{i,j}\frac{\operatorname{Var}(X_i)+\operatorname{Var}(X_j)}{2} = n\sum\limits_{i=1}^n\operatorname{Var}(X_i)$.

So, $\frac{p(1-p)}{\epsilon^2}$ is an upper bound.
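
As a quick numerical sanity check, consider the extreme dependent case $X_1 = X_2 = \dots = X_n$, so that $\bar{X} = X_1$ and $\operatorname{Var}(\bar{X}) = p(1-p)$. A short R simulation (with $p$ and $\epsilon$ chosen arbitrarily for illustration) should stay below this bound:

set.seed(1)
p <- 1/3; eps <- 1/2; n_sims <- 1e5
xbar <- rbinom(n_sims, size = 1, prob = p)   # X_1 = ... = X_n, so Xbar equals X_1
mean(abs(xbar - p) > eps)                    # empirical P(|Xbar - p| > eps), about 1/3
p * (1 - p) / eps^2                          # the claimed upper bound, 8/9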

Is my proof correct? Thanks!

Best Answer

The bound you calculated is indeed an upper bound and your argument looks reasonable, but it is not the tightest possible.

If the $X_i$ can have any dependency, then with large enough $n$ you can construct $\bar X$ to approximate any distribution on $[0,1]$ which has expectation $p$ (one concrete construction is sketched after the list below). There are three ranges of values of $\epsilon$ to consider:

  • $0< \epsilon < \min(p, 1-p)$: you can concentrate the distribution of $\bar X$ outside the interval $(p - \epsilon, p+\epsilon)$ so you can only say (as with any probability) $$\mathbb{P}(|\bar{X}-p| > \epsilon) \le 1$$

  • $\max(p, 1-p) < \epsilon$: in which case $p - \epsilon < 0$ and $p + \epsilon > 1$ and so you can say $$\mathbb{P}(|\bar{X}-p| > \epsilon) =0$$

  • $\min(p, 1-p) < \epsilon < \max(p, 1-p)$: only one tail matters. If $0 < p< \frac12$ then $\mathbb{P}(|\bar{X}-p| > \epsilon) = \mathbb{P}(\bar{X}>p +\epsilon) \le \frac{\mathbb E[\bar X]}{p+\epsilon}= \frac{p}{p+\epsilon}$ using Markov's inequality, and similarly (applying Markov's inequality to $1-\bar X$) if $\frac12 < p < 1$ then $\mathbb{P}(|\bar{X}-p| > \epsilon) = \mathbb{P}(\bar{X}< p -\epsilon) \le \frac{1-p}{1-p+\epsilon}$. You can combine the two cases to say $$\mathbb{P}(|\bar{X}-p| > \epsilon) \le \frac{1}{1+\frac{\epsilon}{\min(p,1-p)}}$$
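
To see that the trivial bound of $1$ in the first range really cannot be improved, here is one such construction (a sketch; there are many ways to do this): draw a latent $U \in \{0,1\}$ with $\mathbb{P}(U=1)=p$, and given $U$ let $X_1,\dots,X_n$ be i.i.d. $\operatorname{Bernoulli}(U)$. Each $X_i$ is marginally $\operatorname{Bernoulli}(p)$, but $\bar X = U$, which always falls outside $(p-\epsilon, p+\epsilon)$ when $\epsilon < \min(p, 1-p)$. In R:

set.seed(42)
p <- 1/3; n <- 1e4; n_sims <- 1e4; eps <- 0.25   # eps < min(p, 1-p)
u    <- rbinom(n_sims, size = 1, prob = p)        # latent U: mass 1-p at 0 and p at 1, mean p
x    <- rbinom(n_sims, size = n, prob = u)        # given U, sum of n i.i.d. Bernoulli(U) draws
xbar <- x / n                                     # Xbar equals U exactly in this construction
mean(abs(xbar - p) > eps)                         # equals 1, so no bound below 1 is possible here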

As an illustration, here are the two upper bounds shown for $p=\frac13$, with yours in red and mine in blue.

(Plot: upper bound on probability as a function of epsilon, for p = 1/3; the question's bound in red and the answer's bound in blue.)


Jessie asked for the code to draw this. Using R:

# Piecewise bound from the answer: 1 for small epsilon, 0 for large epsilon,
# and min(p, 1-p) / (min(p, 1-p) + epsilon) in between
boundA <- function(p, epsilon){
  ifelse(epsilon < p & epsilon < 1-p, 1,
  ifelse(epsilon > p & epsilon > 1-p, 0,
  1 / (1 + epsilon / ifelse(p < 1-p, p, 1-p))))
}
# Chebyshev-style bound from the question
boundB <- function(p, epsilon){
  p*(1-p) / epsilon^2
}

# Plot both bounds for p = 1/3: boundA (blue) and boundB (red)
pstring <- "1/3"
p <- eval(parse(text=pstring))
curve(boundA(p,x), from=0, to=1.15, ylim=c(0,2), col="blue",
      n=50001, xlab="epsilon", ylab="upper bound on probability",
      main=paste(c("p = ", pstring), collapse=""))
curve(boundB(p,x), from=0, to=1.15, col="red", add=TRUE)
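
If you want a legend distinguishing the two curves, adding something like the following after the curve() calls works (the label wording is just a suggestion):

legend("topright", col = c("blue", "red"), lty = 1,
       legend = c("piecewise bound (answer)", "p(1-p)/eps^2 (question)"))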