Monte Carlo – Estimation When Function Value Can Only Be Estimated Indirectly

Tags: conditional-expectation, estimation, monte-carlo, variance

I have the following setup.

I want to Monte Carlo estimate a big sum $$F=\sum_x p(x) f(x)$$ by drawing $\{ x_1,\dots, x_M \}$ from the distribution $p(x)$ and averaging $f(x_i)$.
However, in my case $$f(x) = G( \alpha(x))$$ where $\alpha(x) \in [-1,1]$ and $G$ is a known, bounded, non-linear function with $0 \leq G(a)\leq 1$ for $a\in [-1,1]$.

The issue is that I cannot compute $\alpha(x)$ exactly for a given $x$; I can only estimate it from a finite number of samples. To be more precise, for a given $x$ and corresponding $\alpha(x)$, I draw $K$ i.i.d. $\pm 1$-valued (Bernoulli-type) random variables $A^{(x)}_1,\dots, A^{(x)}_K$ with $A^{(x)}_i \in \{ -1, +1 \}$ and $\mathbb{E} A^{(x)}_i = \alpha(x)$, and estimate $\alpha(x)$ via

$$\hat{\alpha}(x) = \frac{1}{K}\sum_{j=1}^K A^{(x)}_j$$

Hence, my final estimator is given by

\begin{align}
\hat{F} &= \frac{1}{M} \sum_{i=1}^M G \left( \hat{\alpha}(x_i) \right) \\
&= \frac{1}{M} \sum_{i=1}^M G \left(\frac{1}{K}\sum_{j=1}^K A^{(x_i)}_j \right)
\end{align}
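
In code, my estimator looks like the following minimal Python sketch (`sample_x`, `alpha_fn`, and `G` here are toy stand-ins for my actual problem; the $\pm 1$ draws use $\mathbb{P}(A^{(x)}_j=+1)=(1+\alpha(x))/2$ so that $\mathbb{E} A^{(x)}_j=\alpha(x)$):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_F(sample_x, alpha_fn, G, M, K):
    """Nested Monte Carlo estimate of F = sum_x p(x) G(alpha(x)).

    sample_x : callable drawing one x ~ p(x)       (toy stand-in)
    alpha_fn : callable giving alpha(x) in [-1,1]  (used only to simulate the A's)
    G        : known function with 0 <= G(a) <= 1 on [-1,1]
    """
    total = 0.0
    for _ in range(M):
        x = sample_x()
        a = alpha_fn(x)
        # K i.i.d. +/-1 draws with E[A] = alpha(x): P(A = +1) = (1 + a) / 2
        A = rng.choice([-1.0, 1.0], size=K, p=[(1 - a) / 2, (1 + a) / 2])
        total += G(A.mean())
    return total / M

# toy instance: p = Uniform[-1, 1], alpha(x) = x, G(a) = a**2
F_hat = estimate_F(lambda: rng.uniform(-1, 1), lambda x: x, lambda a: a**2,
                   M=10_000, K=100)
```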

My question is:

  1. Is $\hat{F}$ even an unbiased estimator of $F$? Given that $G$ is non-linear, it doesn't seem like it (see the toy example after this list).
  2. Say I wanted to show that $|\hat{F} - F| < \epsilon$ with high probability when drawing enough samples, i.e. choosing $K$ and $M$ large enough. How would you approach this? Can we control the variance of $\hat{F}$?
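
For instance, taking $G(a) = a^2$ as a toy case (not my actual $G$): since $\hat{\alpha}(x)$ averages $K$ i.i.d. $\pm 1$ variables with mean $\alpha(x)$,

\begin{equation} \mathbb{E}\left[ G(\hat{\alpha}(x)) \right] = \alpha(x)^2 + \operatorname{Var}\left(\hat{\alpha}(x)\right) = G(\alpha(x)) + \frac{1-\alpha(x)^2}{K} \end{equation}

so the estimator is biased for every finite $K$, although the bias vanishes as $K \to \infty$.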

Best Answer

About the second question, you can apply Hoeffding's inequality twice. Since each $A^{(x)}_j$ lies in $[-1,+1]$ (a range of $2$), Hoeffding's inequality with $K=\frac{2}{\epsilon^2}\log \frac{2}{\delta}$ samples gives

\begin{equation} \mathbb{P}\left( |\hat{\alpha}(x)-\alpha(x)|\geq \epsilon\right)\leq \delta \end{equation}
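As a small sketch, this per-$x$ sample-size requirement can be computed directly (the helper name is mine, using nothing beyond the bound above):

```python
import math

def hoeffding_K(eps, delta):
    """K such that P(|alpha_hat - alpha| >= eps) <= delta.

    The A's lie in [-1, +1] (range 2), so Hoeffding gives
    P(|alpha_hat - alpha| >= eps) <= 2 exp(-K eps^2 / 2).
    """
    return math.ceil(2 / eps**2 * math.log(2 / delta))

hoeffding_K(0.05, 0.01)  # -> 4239 samples per x
```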

Next, assume that $G$ is Lipschitz continuous with Lipschitz constant $L$ (if $G$ is not Lipschitz, you can try weaker forms of continuity, for instance Hölder continuity). On the event that $|\hat{\alpha}(x_i)-\alpha(x_i)|\leq \epsilon$ holds for all $i$ simultaneously, we get $|G(\hat{\alpha}(x_i))-G(\alpha(x_i))|\leq L\epsilon$ for each $i$, and hence, for any set of $M$ points $\{ x_i \}_{i=1}^M$,

\begin{equation} \left|\frac{1}{M}\sum_{i=1}^M G(\alpha(x_i))- \frac{1}{M}\sum_{i=1}^M G(\hat{\alpha}(x_i))\right|\leq L \epsilon \end{equation}

with probability at least $1-M\delta$, by the union bound over the $M$ points. Similarly, since $G(\alpha(x_i)) \in [0,1]$, Hoeffding's inequality gives

\begin{equation} \mathbb{P}\left( \left|\frac{1}{M}\sum_{i=1}^M G(\alpha(x_i))-\mathbb{E}[G(\alpha(x))]\right|\geq \epsilon_2\right)\leq \delta_2 \end{equation}

where $M=\frac{1}{2\epsilon_2^2} \log \frac{2}{\delta_2}$. Finally, combining both results via the triangle inequality,

\begin{equation} \left|\frac{1}{M}\sum_{i=1}^M G(\hat{\alpha}(x_i))-\mathbb{E}[G(\alpha(x))]\right|\leq \epsilon_2+L \epsilon \end{equation}

with probability at least $1-M\delta -\delta_2$. For example, you can take $\delta_2=\delta_0/2$, $\delta=\delta_0/(2M)$, $\epsilon_2=\epsilon_0/2$, and $\epsilon=\epsilon_0/(2L)$, yielding

\begin{equation} \left|\frac{1}{M}\sum_{i=1}^M G(\hat{\alpha}(x_i))-\mathbb{E}[G(\alpha(x))]\right|\leq \epsilon_0 \end{equation}

with probability at least $1-\delta_0$, where $M=\frac{2}{\epsilon_0^2}\log \frac{4}{\delta_0}$, and $K=\frac{8L^2}{\epsilon_0^2} \log \frac{4M}{\delta_0}$.
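
As a final sketch, the same sample sizes in code (assuming the $L$-Lipschitz $G$ and the formulas just derived; the helper name is mine):

```python
import math

def sample_sizes(eps0, delta0, L):
    """M and K so that |F_hat - F| <= eps0 with probability >= 1 - delta0,
    following the recipe above for an L-Lipschitz G."""
    M = math.ceil(2 / eps0**2 * math.log(4 / delta0))
    K = math.ceil(8 * L**2 / eps0**2 * math.log(4 * M / delta0))
    return M, K

sample_sizes(0.1, 0.05, L=1.0)  # -> (877, 8927)
```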