Probability – Bound Error in Approximating $E_x [H(f(x))]$ with Random Sampling

Tags: approximation-theory, measure-concentration, pr.probability, st.statistics

Let $f:\mathbb R^d \to \mathbb R$ be a "sufficiently smooth" function. For simplicity, we may take $f$ to be an affine function, i.e. $f(x) \equiv b-x^\top w$, for some $(w,b) \in \mathbb R^{d}\times\mathbb R$. Let $\Phi:\mathbb R \to (0,1)$ be the standard Gaussian CDF defined by
$$
\Phi(t):= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^t e^{-s^2/2}\,ds,
$$

and let $\theta:\mathbb R \to \mathbb R$ be the Heaviside unit-step function defined by

$$
\theta (t) = \begin{cases}
0,&\mbox{ if }t < 0,\\
1/2,&\mbox{ if }t = 0,\\
1,&\mbox{ if }t>0.
\end{cases}
$$

Let $X$ be a random variable on $\mathbb R^d$ with a "sufficiently smooth" probability density function $\rho$, and let $X_1,\ldots,X_n$ be $n$ iid copies of $X$. Define a scalar $s \in [0,1]$ by

$$
s:= \mathbb E_X[\theta(f(X))] = \mathbb P(f(X)>0).
$$

Given a bandwidth parameter $h=h_n > 0$, define the random variable $\hat{s}_{n,h} \in (0,1)$ by

$$
\hat{s}_{n,h} := \frac{1}{n}\sum_{i=1}^n\Phi\left(\frac{f(X_i)}{h}\right).
$$
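To make the setup concrete: for a Gaussian $X$ and affine $f$, the exact value $s=\Phi(b/\lVert w\rVert)$ is available in closed form (since $X^\top w \sim N(0,\lVert w\rVert^2)$), so the estimator can be checked numerically. A minimal sketch; the choices of $d$, $w$, $b$, $n$, $h$ below are illustrative, not taken from the question:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Illustrative choices: X ~ N(0, I_d), f(x) = b - x^T w.
d, n, h = 3, 200_000, 0.05
w = np.array([1.0, -2.0, 0.5])
b = 0.3

def Phi(t):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

Phi_v = np.vectorize(Phi)

X = rng.standard_normal((n, d))      # X_1, ..., X_n iid with smooth density rho
fX = b - X @ w                       # f(X_i) = b - X_i^T w
s_hat = float(np.mean(Phi_v(fX / h)))

# Exact value: X^T w ~ N(0, ||w||^2), so s = P(b - X^T w > 0) = Phi(b / ||w||).
s_exact = Phi(b / float(np.linalg.norm(w)))
print(abs(s_hat - s_exact))          # small: statistical error + smoothing bias
```

With these values the discrepancy is on the order of the $1/\sqrt n$ sampling noise, since the smoothing bias at $h=0.05$ is much smaller here.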

Finally, define the random variable $\Delta_{n,h} \in [0, 1)$ by

$$
\Delta_{n,h} := |\hat{s}_{n,h} - s|.
$$

Intuitively, one would expect $\Delta_{n,h} \to 0$ as $n \to \infty$ with $h \to 0$ and $nh^d \to \infty$. My goal is to quantify this convergence by upper-bounding $\Delta_{n,h}$ by a deterministic quantity $\varepsilon_{n,h}$ which tends to zero.

Question 1. Is it possible to give good quantitative upper bounds for $\Delta_{n,h}$, in terms of $n$ and $h$, which are valid with high probability over the $X_i$'s?

Question 2. Is it even true that $\Delta_{n,h} \to 0$?

Best Answer

$\newcommand{\si}{\sigma}$Let \begin{equation*} Y:=f(X),\quad Y_i:=f(X_i), \end{equation*} so that $Y,Y_1,Y_2,\dots$ are iid real-valued random variables (r.v.'s).

Assume that (i) the r.v. $Y$ has a density $p_Y$ continuous at $0$ and (ii) $E|Y|^k<\infty$ for some real $k>0$. Beyond possibly entering indirectly into these two conditions (i) and (ii), the dimension $d$ and any further specifics concerning $X$ and $f$ will play no role in what follows.

Indeed, then \begin{equation*} s=P(Y>0), \end{equation*} \begin{equation*} \hat s_{n,h}=\frac1n\sum_{i=1}^n\Phi(Y_i/h), \end{equation*} \begin{equation*} E\hat s_{n,h}=E\Phi(Y/h)=:\mu_h. \end{equation*}

Let $n\to\infty$ and $h\downarrow0$. Then $\Phi(Y/h)\to1(Y>0)$ in probability and hence, by dominated convergence, \begin{equation*} \begin{aligned} \mu_h=E\Phi(Y/h)&\to p:=P(Y>0),\\ \si^2_h:=Var\,\Phi(Y/h)&\to pq,\\ a_h:=E|\Phi(Y/h)-\mu_h|^3&\to pq(p^2+q^2), \end{aligned} \end{equation*} where $q:=1-p$; note that $p=s$, and without loss of generality $0<p<1$. By the Berry--Esseen inequality, for all real $z\ge0$, \begin{equation*} P\Big(\Big|\frac{\hat s_{n,h}-\mu_h}{\si_h/\sqrt n}\Big|>z\Big) \le2(1-\Phi(z))+\frac{a_h/\si_h^3}{2\sqrt n}, \end{equation*} whence \begin{equation*} |\hat s_{n,h}-\mu_h|=O_P(1/\sqrt n). \tag{1}\label{1} \end{equation*}
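The $O_P(1/\sqrt n)$ fluctuation in \eqref{1} is easy to check numerically. A sketch assuming, for illustration only, $Y=f(X)\sim N(1/2,1)$: then $\mu_h=E\Phi(Y/h)=\Phi\big(\tfrac{1/2}{\sqrt{1+h^2}}\big)$ in closed form (write $\Phi(Y/h)=P(hZ<Y\mid Y)$ for an independent standard Gaussian $Z$), and quadrupling $n$ should roughly halve the root-mean-square of $\hat s_{n,h}-\mu_h$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
h = 0.2  # fixed bandwidth; only the n-dependence is being probed here

def Phi(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

Phi_v = np.vectorize(Phi)

# Y ~ N(0.5, 1): mu_h = E Phi(Y/h) = P(Y - h Z > 0) = Phi(0.5 / sqrt(1 + h^2)).
mu_h = Phi(0.5 / sqrt(1.0 + h * h))

def rms_dev(n, reps=200):
    # root-mean-square of s_hat - mu_h over independent repetitions
    devs = []
    for _ in range(reps):
        Y = 0.5 + rng.standard_normal(n)
        devs.append(float(np.mean(Phi_v(Y / h))) - mu_h)
    return sqrt(float(np.mean(np.square(devs))))

r1, r2 = rms_dev(1_000), rms_dev(4_000)
print(r1 / r2)  # should be close to 2 = sqrt(4000 / 1000)
```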

Let real $c>0$ vary so that $c\to\infty$ but $ch\to0$. Then \begin{equation*} \begin{aligned} |\mu_h-s|&\le E|\Phi(Y/h)-1(Y>0)| \\ &\le P(|Y|<ch)+(1-\Phi(c))P(|Y|\ge ch) \\ &\lesssim 2p_Y(0)ch+e^{-c^2/(2+o(1))} \frac{E|Y|^k}{(ch)^k}. \end{aligned} \end{equation*} Choosing now $c$ more specifically, so that $c\sim\sqrt{(2k+1)\ln\frac1h}$, we get \begin{equation*} |\mu_h-s|=O\big(h\sqrt{\ln\tfrac1h}\big). \tag{1'}\label{1'} \end{equation*} Thus, in view of \eqref{1} and \eqref{1'},
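As a concrete sanity check of \eqref{1'}: when $Y\sim N(m,1)$ (illustrative assumption, with $m=1/2$ below), both $s=\Phi(m)$ and $\mu_h=E\Phi(Y/h)=\Phi\big(m/\sqrt{1+h^2}\big)$ are available in closed form, so the bias can be compared directly with $h\sqrt{\ln\frac1h}$. For such Gaussian $Y$ the bias is in fact $O(h^2)$, comfortably inside the bound:

```python
from math import erf, sqrt, log

def Phi(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

# For Y ~ N(m, 1): s = Phi(m) and mu_h = E Phi(Y/h) = Phi(m / sqrt(1 + h^2)),
# so |mu_h - s| is exact and can be compared with h * sqrt(log(1/h)).
m = 0.5
s = Phi(m)
for h in (0.2, 0.1, 0.05, 0.01):
    bias = abs(Phi(m / sqrt(1.0 + h * h)) - s)
    bound = h * sqrt(log(1.0 / h))
    assert bias <= bound   # the O(h sqrt(log 1/h)) bound holds with room to spare
    print(h, bias, bound)
```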
\begin{equation*} |\hat s_{n,h}-s|=O_P\Big(\frac1{\sqrt n}+h\sqrt{\ln\frac1h}\Big), \tag{2} \end{equation*} as $n\to\infty$ and $h\downarrow0$.
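A quick Monte Carlo check of (2), again under the illustrative assumption $Y\sim N(1/2,1)$, coupling $h\asymp n^{-1/2}$ so that both error terms shrink together:

```python
import numpy as np
from math import erf, sqrt, log

rng = np.random.default_rng(2)

def Phi(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

Phi_v = np.vectorize(Phi)

# Y ~ N(0.5, 1), so s = Phi(0.5); take h = n^{-1/2} so both terms of the
# bound 1/sqrt(n) + h * sqrt(log(1/h)) decay at the same ~ n^{-1/2} rate.
s = Phi(0.5)
for n in (1_000, 10_000, 100_000):
    h = 1.0 / sqrt(n)
    Y = 0.5 + rng.standard_normal(n)
    delta = abs(float(np.mean(Phi_v(Y / h))) - s)
    bound = 1.0 / sqrt(n) + h * sqrt(log(1.0 / h))
    print(n, delta, bound)  # delta shrinks roughly like the bound
```

The $O_P$ in (2) hides a constant, so the raw bound above is only an order-of-magnitude yardstick, but the decay of $\Delta_{n,h}$ tracks it clearly.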
