Probability – Non-i.i.d. Bernoulli Random Variables

probability, statistics

Let $X_1, X_2, \dots, X_n \sim \operatorname{Bernoulli}(p)$ and $\bar{X}=\frac{1}{n}\sum\limits_{i=1}^nX_i$. Find an upper bound for $\mathbb{P}(|\bar{X}-p| > \epsilon)$ for all $\epsilon > 0$. Note that the random variables are not necessarily independent.

$\textbf{What I've tried:}$

We know that $\mathbb{E}[\bar{X}] = \mathbb{E}[X_1] = p$. So we can use Chebyshev's inequality: $\mathbb{P}(|\bar{X}-p| > \epsilon) \le \mathbb{P}(|\bar{X}-\mathbb{E}[\bar{X}]| \ge \epsilon) \stackrel{\text{Chebyshev}}{\le} \frac{\operatorname{Var}(\bar{X})}{\epsilon^2} = \frac{\operatorname{Var}\left(\sum\limits_{i=1}^n X_i\right)}{n^2\epsilon^2} \le \frac{n \sum\limits_{i=1}^n \operatorname{Var}(X_i)}{n^2\epsilon^2} = \frac{n^2 \operatorname{Var}(X_1)}{n^2\epsilon^2} = \frac{p(1-p)}{\epsilon^2},$ where the last inequality holds without independence because $\operatorname{Var}\left(\sum\limits_{i=1}^n X_i\right) = \sum\limits_{i,j}\operatorname{Cov}(X_i,X_j) \le \sum\limits_{i,j}\frac{\operatorname{Var}(X_i)+\operatorname{Var}(X_j)}{2} = n\sum\limits_{i=1}^n\operatorname{Var}(X_i)$.

So, $\frac{p(1-p)}{\epsilon^2}$ is an upper bound.
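
As a quick numerical sanity check, consider the extreme dependent case $X_1 = X_2 = \dots = X_n$, so that $\bar{X} = X_1$ and $\operatorname{Var}(\bar{X}) = p(1-p)$. A short R simulation (with $p$ and $\epsilon$ chosen arbitrarily for illustration) should stay below this bound:

set.seed(1)
p <- 1/3; eps <- 1/2; n_sims <- 1e5
xbar <- rbinom(n_sims, size = 1, prob = p)   # X_1 = ... = X_n, so Xbar equals X_1
mean(abs(xbar - p) > eps)                    # empirical P(|Xbar - p| > eps), about 1/3
p * (1 - p) / eps^2                          # the claimed upper bound, 8/9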

Is my proof correct? Thanks!

Best Answer

The bound you calculated is indeed an upper bound and your argument looks reasonable, but it is not the tightest possible.

If the $X_i$ can have any dependency, then with large enough $n$ you can construct $\bar X$ to approximate any distribution on $[0,1]$ which has expectation $p$ (one concrete construction is sketched after the list below). There are three ranges of values of $\epsilon$ to consider:

  • $0< \epsilon < \min(p, 1-p)$: you can concentrate the distribution of $\bar X$ outside the interval $(p - \epsilon, p+\epsilon)$ so you can only say (as with any probability) $$\mathbb{P}(|\bar{X}-p| > \epsilon) \le 1$$

  • $\max(p, 1-p) < \epsilon$: in which case $p - \epsilon < 0$ and $p + \epsilon > 1$ and so you can say $$\mathbb{P}(|\bar{X}-p| > \epsilon) =0$$

  • $\min(p, 1-p) < \epsilon < \max(p, 1-p)$: only one tail matters. If $0 < p< \frac12$ then $\mathbb{P}(|\bar{X}-p| > \epsilon) = \mathbb{P}(\bar{X}>p +\epsilon) \le \frac{\mathbb E[\bar X]}{p+\epsilon}= \frac{p}{p+\epsilon}$ using Markov's inequality, and similarly (applying Markov's inequality to $1-\bar X$) if $\frac12 < p < 1$ then $\mathbb{P}(|\bar{X}-p| > \epsilon) = \mathbb{P}(\bar{X}< p -\epsilon) \le \frac{1-p}{1-p+\epsilon}$. You can combine the two cases to say $$\mathbb{P}(|\bar{X}-p| > \epsilon) \le \frac{1}{1+\frac{\epsilon}{\min(p,1-p)}}$$
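
To see that the trivial bound of $1$ in the first range really cannot be improved, here is one such construction (a sketch; there are many ways to do this): draw a latent $U \in \{0,1\}$ with $\mathbb{P}(U=1)=p$, and given $U$ let $X_1,\dots,X_n$ be i.i.d. $\operatorname{Bernoulli}(U)$. Each $X_i$ is marginally $\operatorname{Bernoulli}(p)$, but $\bar X = U$, which always falls outside $(p-\epsilon, p+\epsilon)$ when $\epsilon < \min(p, 1-p)$. In R:

set.seed(42)
p <- 1/3; n <- 1e4; n_sims <- 1e4; eps <- 0.25   # eps < min(p, 1-p)
u    <- rbinom(n_sims, size = 1, prob = p)        # latent U: mass 1-p at 0 and p at 1, mean p
x    <- rbinom(n_sims, size = n, prob = u)        # given U, sum of n i.i.d. Bernoulli(U) draws
xbar <- x / n                                     # Xbar equals U exactly in this construction
mean(abs(xbar - p) > eps)                         # equals 1, so no bound below 1 is possible here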

As an illustration, here are the two upper bounds shown for $p=\frac13$, with yours in red and mine in blue.

(Plot: upper bound on probability as a function of epsilon, for p = 1/3; the question's bound in red and the answer's bound in blue.)


Jessie asked for the code to draw this. Using R:

# Piecewise bound from the answer: 1 for small epsilon, 0 for large epsilon,
# and min(p, 1-p) / (min(p, 1-p) + epsilon) in between
boundA <- function(p, epsilon){
  ifelse(epsilon < p & epsilon < 1-p, 1,
  ifelse(epsilon > p & epsilon > 1-p, 0,
  1 / (1 + epsilon / ifelse(p < 1-p, p, 1-p))))
}
# Chebyshev-style bound from the question
boundB <- function(p, epsilon){
  p*(1-p) / epsilon^2
}

# Plot both bounds for p = 1/3: boundA (blue) and boundB (red)
pstring <- "1/3"
p <- eval(parse(text=pstring))
curve(boundA(p,x), from=0, to=1.15, ylim=c(0,2), col="blue",
      n=50001, xlab="epsilon", ylab="upper bound on probability",
      main=paste(c("p = ", pstring), collapse=""))
curve(boundB(p,x), from=0, to=1.15, col="red", add=TRUE)
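
If you want a legend distinguishing the two curves, adding something like the following after the curve() calls works (the label wording is just a suggestion):

legend("topright", col = c("blue", "red"), lty = 1,
       legend = c("piecewise bound (answer)", "p(1-p)/eps^2 (question)"))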