Concentration bound on function of the estimate

concentration-of-measureinformation theoryprobabilityprobability theorystatistics

I know that through concentration inequalities (e.g. Hoeffding, Mcdiarmids, Chernoff) it is possible to have concentration bounds on the empirical mean $\hat{\mu}(N) = \frac{1}{N} \sum_{i = 1}^N X_i$ of $N$ i.i.d. Random Variables $X_1,…,X_N$ of the type
$$
\mathbb{P}\left[\mathbb{E}[X]-\hat{\mu}(N) \geq \varepsilon \right] \leq 1-\delta
$$

I want to find a concentration bound on $f(\hat{\mu}(N))$
$$
\mathbb{P}\left[\mathbb{E}[f(X)]-f(\hat{\mu}(N)) \geq \varepsilon \right] \leq 1-\delta
$$

where $f(\cdot)$ is a non-linear function, possibly with smoothness assumption (e.g. $L$-Lipschitz). Any ideas on how to do it?

Best Answer

Below is a back-of-envelop computation. Someone else should please check that I haven't screwed anything. Also, I'm not certain we need to impose tail bounds on $X$ (e.g sub-Gaussianity) as I've done below...


So lat $X$ be a random vector in $\mathbb R^p$ (the scalar case corresponds to $p=1$) with mean $E[X] = \mu \in \mathbb R^p$ and covariance matrix $\Sigma$ (a $p$-by-$p$ psd matrix). Let $\hat{\mu}_N := (1/N)\sum_{i=1}^NX_i$, where $X_1,\ldots,X_N$ are $N$ iid copies of $X$.

The triangle inequality, one computes

$$ \begin{split} |E[f(X)] - f(\hat{\mu}_N)| &= |E[f(X)] - f(X) + f(X) - f(\mu) + f(\mu) - f(\hat{\mu}_N)|\\ &\le |E[f(X)] - f(X)| + |f(X) - f(\mu)| + |f(\mu)-f(\hat{\mu}_N)| \end{split} \tag{*} $$

Now,

Suppose $X$ is $\sigma$-subGaussian and $f$ is $L$-Lipschitz.

Then, with probability $\ge 1 - \delta$, it holds that $$ \begin{split} (1/L)|f(\mu)-f(\hat{\mu}_N)| &\le \|\mu-\hat{\mu}_n\| \le \sqrt{\frac{\text{Tr}(\Sigma)}{N}} + \sqrt{\frac{2\lambda_\max(\Sigma)\log(1/\delta)}{N}}\\ &= \mathcal O(\sqrt{d\log(1/\delta)/N}). \end{split} $$

Thus by (*), asymptotically in $N$, we have

$$ \begin{split} P(|E[f(X)] - f(\hat{\mu}_N)| > \epsilon) &\le P(|E[f(X)] - f(X)| > \frac{\epsilon}{3}) + P(|f(X) - f(\mu)| > \frac{\epsilon}{3})\\ &\quad\quad + P(|f(\mu)-f(\hat{\mu}_N)| > \frac{\epsilon}{3})\\ &\le A_1e^{-C_1\epsilon^2} + A_2e^{-C_2\epsilon^2} + \mathcal O(1/\sqrt{N}), \end{split} $$

for some positive absolute constants $A_1,A_2,A,C_1,C_2,C$.

The first term in the above bound is by concentration of Lipschitz functions of subGaussian vectors and and the second is because $|f(X) - f(\mu)| \le L\|X-\mu\|$ and using (again!) the subGaussianity of $X$.

Related Question