Empirical Rademacher complexity bound

machine learning, statistics

Consider the hypothesis class
$$\mathcal{H} = \{\mathcal{X}\ni x\mapsto r^2-\|\Phi(x)-\mathbf{c}\|^2:\|\mathbf{c}\|\leq \Lambda,\ 0<r\leq R\}$$

where $\Phi: \mathcal{X} \to \mathbb{H}$ is a feature map associated with some kernel $K$ (that is, $K(x,x')=\langle \Phi(x), \Phi(x')\rangle$). Now let $S = (x_1, \dots, x_m)$ be a sample from $\mathcal{X}$. I would like to prove that the empirical Rademacher complexity can be bounded by
$$\hat{\mathfrak{R}}_S(\mathcal{H})\leq \frac{R^2+\Lambda^2}{\sqrt{m}}+\Lambda\sqrt{\text{Tr}[\mathbf{K}]}$$
where $\mathbf{K} = [K(x_i, x_j)]_{i,j}$ is the kernel matrix constructed from the sample $S$.

The empirical Rademacher complexity is defined as
$$\hat{\mathfrak{R}}_S(\mathcal{H})=\frac{1}{m}\mathbb{E}_{\mathbf{\sigma}}\left[\sup_{h \in \mathcal{H}} \sum_{i=1}^m \sigma_ih(x_i)\right]$$
where the $\sigma_i$ are i.i.d. Rademacher random variables, i.e. $P[\sigma_i = +1]= P[\sigma_i = -1] = 1/2$.
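As a sanity check (not part of the proof), here is a minimal Monte Carlo sketch of this definition for the linear kernel $K(x,x')=\langle x, x'\rangle$, so that $\Phi(x)=x$ and the supremum over $(\mathbf{c}, r)$ has a closed form. All numerical values ($m$, $d$, $\Lambda$, $R$, the number of trials) are illustrative assumptions.

```python
import numpy as np

# Monte Carlo sketch of the empirical Rademacher complexity of H for the
# LINEAR kernel, where Phi(x) = x and sup over (c, r) is computable in
# closed form. Parameter values below are illustrative only.
rng = np.random.default_rng(0)
m, d = 50, 5
Lam, R = 1.0, 1.0
X = rng.normal(size=(m, d))           # sample S = (x_1, ..., x_m)

def sup_h(sigma):
    """sup over ||c|| <= Lam, 0 < r <= R of sum_i sigma_i (r^2 - ||x_i - c||^2).

    Expanding the square, the objective equals
        s*r^2 - s*||c||^2 + 2<c, v> - sum_i sigma_i ||x_i||^2,
    with s = sum_i sigma_i and v = sum_i sigma_i x_i.  The sup over r of
    s*r^2 is R^2 * max(s, 0); the maximizing c points along v with length
    t in [0, Lam] maximizing -s*t^2 + 2*t*||v||.
    """
    s = sigma.sum()
    v = sigma @ X                      # v = sum_i sigma_i x_i
    nv = np.linalg.norm(v)
    const = -sigma @ (X * X).sum(axis=1)
    r_part = R**2 * max(s, 0.0)
    if s > 0:
        t = min(nv / s, Lam)           # concave in t: interior or boundary max
    else:
        t = Lam                        # convex/increasing in t: boundary max
    c_part = -s * t**2 + 2 * t * nv
    return r_part + c_part + const

n_trials = 2000
sigmas = rng.choice([-1.0, 1.0], size=(n_trials, m))
rad_mc = np.mean([sup_h(sig) for sig in sigmas]) / m

K = X @ X.T                            # kernel matrix for the linear kernel
bound = (R**2 + Lam**2) / np.sqrt(m) + Lam * np.sqrt(np.trace(K))
print(rad_mc, bound)
```

On random Gaussian data the Monte Carlo estimate sits far below the bound, which is what one expects: the $\Lambda\sqrt{\text{Tr}[\mathbf{K}]}$ term is not divided by any power of $m$.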

My attempt: We want to calculate
$$\hat{\mathfrak{R}}_S(\mathcal{H})=\frac{1}{m}\mathbb{E}_{\mathbf{\sigma}}\left[\sup_{\substack{\|\mathbf{c}\|\leq \Lambda\\ 0<r\leq R}} \sum_{i=1}^m \sigma_i (r^2-\|\Phi(x_i)-\mathbf{c}\|^2)\right]$$

We can then rewrite
$$r^2-\|\Phi(x_i)-\mathbf{c}\|^2 = r^2 - \|\mathbf{c}\|^2-\|\Phi(x_i)\|^2 + 2\langle\Phi(x_i), \mathbf{c}\rangle$$

The first and second terms yield bounds of $mR^2$ and $m\Lambda^2$ respectively, and the third term vanishes since $\mathbb{E}[\sigma_i]=0$. For the last term, Cauchy-Schwarz for the inner product and Jensen's inequality when taking the expectation give
$$\mathbb{E}_{\mathbf{\sigma}}\left[\sup_{\|\mathbf{c}\|\leq\Lambda}\sum_{i=1}^m \sigma_i\langle\Phi(x_i),\mathbf{c}\rangle\right]\leq \Lambda\,\mathbb{E}_{\mathbf{\sigma}}\left\|\sum_{i=1}^m \sigma_i\Phi(x_i)\right\|\leq \Lambda\sqrt{\mathbb{E}_{\mathbf{\sigma}}\left\|\sum_{i=1}^m \sigma_i\Phi(x_i)\right\|^2}=\Lambda\sqrt{\sum_{i=1}^m\|\Phi(x_i)\|^2}=\Lambda\sqrt{\text{Tr}[\mathbf{K}]}$$
(using $\mathbb{E}[\sigma_i\sigma_j]=\delta_{ij}$), so this term contributes $2\Lambda\sqrt{\text{Tr}[\mathbf{K}]}$. However, putting everything together gives:

$$\hat{\mathfrak{R}}_S(\mathcal{H}) \leq R^2 + \Lambda^2 + \frac{2\Lambda}{m}\sqrt{\text{Tr}[\mathbf{K}]} $$

which differs from what I want to prove.

Any help is appreciated!

Best Answer

You can get an improved bound by proceeding this way. The $-\|\Phi(x_i)\|^2$ term does not depend on $(\mathbf{c}, r)$ and has zero expectation, so it contributes nothing, and

$$\hat{\mathfrak{R}}_S(\mathcal{H}) \leq \frac{1}{m}\mathbb{E}_{\mathbf{\sigma}}\left[\Big|\sum_{i=1}^{m}\sigma_i\Big|\sup_{\mathbf{c}, r}\big|r^{2} - \|\mathbf{c}\|^{2}\big|\right] + \frac{2}{m}\mathbb{E}_{\mathbf{\sigma}}\left[\sup_{\mathbf{c}}\sum_{i=1}^{m}\sigma_i\langle \mathbf{c}, \Phi(x_i)\rangle\right]$$

$$\leq \frac{R^{2}+\Lambda^{2}}{m}\mathbb{E}_{\mathbf{\sigma}}\left[\Big|\sum_{i=1}^{m}\sigma_i\Big|\right] + \frac{2}{m}\mathbb{E}_{\mathbf{\sigma}}\left[\sup_{\mathbf{c}}\sqrt{\sum_{i=1}^{m}\sigma_i^{2}}\sqrt{\sum_{i=1}^{m}\langle \mathbf{c}, \Phi(x_i)\rangle^{2}}\right]$$

$$\leq \frac{R^{2}+\Lambda^{2}}{\sqrt{m}} + \frac{2\Lambda}{\sqrt{m}}\sqrt{\text{Tr}[\mathbf{K}]},$$

where the last step uses $\mathbb{E}\big[|\sum_i\sigma_i|\big]\leq\sqrt{\mathbb{E}\big[(\sum_i\sigma_i)^2\big]}=\sqrt{m}$ (Jensen), $\sum_i\sigma_i^{2}=m$, and $|\langle \mathbf{c}, \Phi(x_i)\rangle|\leq\Lambda\|\Phi(x_i)\|$ (Cauchy-Schwarz).

This bound is somewhat better than the one you seek.
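To quantify "better": a quick numeric comparison (with illustrative assumed values $\Lambda=R=1$ and unit-norm feature vectors, so $\text{Tr}[\mathbf{K}]=m$) shows the bound with the extra $1/\sqrt{m}$ factor on the trace term is smaller whenever $2/\sqrt{m}\leq 1$, i.e. $m\geq 4$:

```python
import numpy as np

# Compare the target bound against the improved bound derived above.
# Assumptions (illustrative only): Lam = R = 1, ||Phi(x_i)|| = 1, so Tr[K] = m.
Lam = R = 1.0
for m in (2, 4, 16, 100):
    tr_K = float(m)                    # Tr[K] = m under the unit-norm assumption
    target = (R**2 + Lam**2) / np.sqrt(m) + Lam * np.sqrt(tr_K)
    improved = (R**2 + Lam**2) / np.sqrt(m) + 2 * Lam * np.sqrt(tr_K) / np.sqrt(m)
    print(m, round(target, 3), round(improved, 3), improved <= target)
```

For $m=2$ the improved bound is the larger of the two; from $m=4$ on it wins, and the gap grows like $\sqrt{m}$ under these assumptions.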
