Let $X_1, X_2, \ldots$ be i.i.d. Bernoulli random variables with mean $\mu$, and $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then the Chernoff-Hoeffding bound says that for all $c > 0$,
$$
P\left(\hat{\mu}_n - \mu > c\right) \leq e^{-2 n c^2}.
$$
Now, is there any similar bound for $KL(\hat{\mu}_n||\mu)$ instead of $(\hat{\mu}_n - \mu)$? [NOTE: $KL(\hat{\mu}_n||\mu)$ means the KL-divergence between Bernoulli($\hat{\mu}_n$) and Bernoulli($\mu$).]
One thing I know is that for Bernoulli, $KL(p||q) \geq 2(p-q)^2$, and hence,
$$
P\left(\hat{\mu}_n - \mu > c\right) \leq P\left(KL(\hat{\mu}_n || \mu) > 2c^2\right),
$$
but it doesn't help: combined with Hoeffding this only controls the smaller event on the left, whereas what I want is an upper bound on the probability that $KL(\hat{\mu}_n||\mu)$ is large.
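The Bernoulli inequality $KL(p||q) \geq 2(p-q)^2$ (the binary case of Pinsker's inequality) can be checked numerically. A minimal sketch in Python; the helper `bernoulli_kl` is mine, not from any library:

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with 0*log(0) = 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(p, q) + term(1 - p, 1 - q)

# Check KL(p||q) >= 2*(p - q)^2 over a grid of (p, q) in (0, 1)^2.
grid = [i / 100 for i in range(1, 100)]
violations = [
    (p, q)
    for p in grid
    for q in grid
    if bernoulli_kl(p, q) < 2 * (p - q) ** 2 - 1e-12
]
print(len(violations))  # 0: the inequality holds everywhere on the grid
```

The small tolerance `1e-12` only absorbs floating-point error; the inequality itself is tight exactly at $p = q$ (and asymptotically at $q = 1/2$).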
Thanks.
Best Answer
I don't follow the question completely; in particular, the expression $\newcommand{\KL}{\mathrm{KL}}$ $\KL(\hat \mu \| \mu)$ does not make sense to me because $\mu$ is a real number rather than a distribution. (Edit: The question is now edited.)
Nevertheless I will mention that Sanov's theorem seems to be related to what the OP has in mind. Suppose that the true distribution is $q$ and the empirical distribution of the samples is $\hat p$. Sanov's theorem bounds the probability that $\hat p$ falls in a given set of distributions, with an exponential rate governed by the KL divergence from that set to $q$; in fact, it does a bit more, as we will see. (I am quoting the theorem from the wikipedia page.)
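For completeness, the finite-alphabet statement being referenced reads, up to notation: let $X_1, \ldots, X_n$ be i.i.d. samples from a distribution $q$ on a finite alphabet $\mathcal{X}$, let $\hat p$ be their empirical distribution, and let $A$ be a set of probability distributions on $\mathcal{X}$. Then
$$
P\left(\hat p \in A\right) \leq (n+1)^{|\mathcal{X}|} e^{-n \inf_{p \in A} \KL(p \| q)}.
$$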
The bound the OP asks for then follows as a corollary.

Proof. Apply Sanov's theorem with $A$ defined as the set of distributions $p$ such that $\KL(p \| q) \geqslant \delta$; the infimum of $\KL(p \| q)$ over this set is at least $\delta$, which gives the exponential rate $e^{-n\delta}$.
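In the Bernoulli case ($|\mathcal{X}| = 2$) the corollary reads $P\left(\KL(\hat\mu_n \| \mu) \geq \delta\right) \leq (n+1)^2 e^{-n\delta}$. A Monte Carlo sketch checking this, assuming the standard polynomial prefactor $(n+1)^{|\mathcal{X}|}$; the helpers `bernoulli_kl` and `sanov_check` are my own names:

```python
import math
import random

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with 0*log(0) = 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(p, q) + term(1 - p, 1 - q)

def sanov_check(mu, n, delta, trials=20000, seed=0):
    """Monte Carlo estimate of P(KL(mu_hat || mu) >= delta) vs the Sanov bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mu_hat = sum(rng.random() < mu for _ in range(n)) / n
        if bernoulli_kl(mu_hat, mu) >= delta:
            hits += 1
    empirical = hits / trials
    bound = (n + 1) ** 2 * math.exp(-n * delta)
    return empirical, bound

emp, bound = sanov_check(mu=0.3, n=100, delta=0.1)
print(emp <= bound)  # the empirical tail probability respects the bound
```

Note the $(n+1)^2$ prefactor makes the bound vacuous unless $n\delta \gtrsim 2\log(n+1)$; for large deviations it is the exponent $e^{-n\delta}$ that matters.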