Bayesian Method – How to Use Bayesian Quadrature for Expectation of Unknown Function with Respect to Known PDF

bayesian, expected value, gaussian process, numerical integration

I am interested in estimating the integral $\int f(x) P(x) dx$, where $f(x)$ is an expensive function and $P(x)$ has an analytic form. I would like to evaluate this with as few evaluations of $f(x)$ as possible, so I am exploring Bayesian Quadrature.

If $P(x)$ is a uniform distribution, the paradigm of Bayesian quadrature makes perfect sense. We want to reduce the uncertainty of a Gaussian Process (GP) model, and we choose the next point to accomplish that. However, for more general $P(x)$, it seems like the best point should depend on the form of $P(x)$.

Is there a way to perform this Bayesian Quadrature that is influenced by the form of $P(x)$?

I have included my best attempt below.

In the example,

$f(x)=\sum_{i=1}^{\mathrm{dim}(x)}x_i^2$,

and $P(x)$ is a Gaussian distribution centered at $(1,1)$ with unit variance in each component,

$P(x)=\prod_{i=1}^{\mathrm{dim}(x)}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(x_i-1)^2}{2}\right)$.
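(For this toy problem the integral has a closed form, which serves as a sanity check: $\int f(x)P(x)\,dx = \sum_{i=1}^{2}\mathbb{E}[x_i^2] = \sum_{i=1}^{2}\left(\mathrm{Var}[x_i]+\mathbb{E}[x_i]^2\right) = 2(1+1) = 4$.)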

I used the emukit package to perform the Bayesian quadrature according to the link I posted above. I modified the prediction of the GP to multiply the mean prediction by $P(x)$, but I have since realized that the Bayesian Quadrature method only relies on the variance of the output, not on the predicted mean.

Emukit provides a method to quickly compute the mean and variance of the integral, but I cannot make it work with $P(x)$ represented outside of the GP. So, I run a large Monte Carlo simulation of the GP model output, drawing samples of $x$ from $P(x)$ and evaluating the GP prediction of $f(x)$, to estimate the value of $\int f(x) P(x) dx$.

My issue is that this point-selection scheme does not depend at all on $P(x)$. It really seems like there must be a way to tell the Bayesian Quadrature to home in on samples near $(1,1)$, since that is where most of the contribution from the PDF comes from. However, I see no real way to do that, beyond training the GP model directly on $P(x)f(x)$, which would lose the information about the value of $P(x)$ outside the sampled points when performing the integration.

from emukit.quadrature.methods import VanillaBayesianQuadrature
from scipy.stats import norm
import matplotlib.pyplot as plt
from emukit.quadrature.acquisitions import IntegralVarianceReduction
from emukit.core.optimization import GradientAcquisitionOptimizer
from emukit.core.parameter_space import ParameterSpace
import numpy as np
import GPy
from emukit.model_wrappers.gpy_quadrature_wrappers import BaseGaussianProcessGPy, RBFGPy
from emukit.quadrature.kernels import QuadratureRBFLebesgueMeasure

'''
goal is to find \int f(x) P(x) dx
  - f(x) is function to sample
  - P(x) is known probability function
'''

# define a custom regressor class
# - this redirects the prediction of the GPy model,
#   multiplying the predicted f(x) value by P(x)
#   (unfortunately, this does not change the behavior of the BQ method)
class myRegressor:

   def __init__(self, X, Y):

      #instantiate GPy model
      self.gpy_model = GPy.models.GPRegression(X=X, Y=Y, kernel=GPy.kern.RBF(
                        input_dim=X.shape[1], lengthscale=1, variance=1.0))

      # fit RBF kernel to data
      self.gpy_model.constrain_bounded(0.1, 1.)
      self.gpy_model.optimize()

      # provide connections for emukit to access
      self.kern = self.gpy_model.kern
      self.Gaussian_noise = self.gpy_model.Gaussian_noise
      self.X = self.gpy_model.X
      self.posterior = self.gpy_model.posterior

   # define prediction function, which multiplies means of GPy prediction by P(x)
   def predict(self, x, full_cov=False, prob=True, Y_metadata=None, kern=None,
    likelihood=None, include_likelihood=True,):
      m, cov = self.gpy_model.predict(x, full_cov, Y_metadata, kern, likelihood, include_likelihood) 
      p = np.atleast_2d(P(x).prod(1)).T
      if prob:
          m *= p
      return m, cov

   # provide set function for emukit
   def set_XY(self, X, Y):
      ret = self.gpy_model.set_XY(X, Y)
      self.X = self.gpy_model.X
      self.posterior = self.gpy_model.posterior
      return ret

DIM = 2
PLOT = False

# define P(x) and f(x)
'''
   specific goal is to find int sum x ** 2 N(x - 1) dx
   where 
     the sum runs across all components of x
     N(x) is the normal distribution
'''
def f(x): return np.atleast_2d(np.sum(x ** 2, 1)).T
def P(x): return norm.pdf(x - 1)  # per-component Gaussian density centered at 1, unit variance

# emukit requires integral bounds. These were selected to lie far outside the region where P(x) contributes significantly
lb, ub = (-10, 10)

# sample initial x and f(x) points
np.random.seed(1234)
X_init = np.atleast_2d(np.random.uniform(lb, ub, (DIM, 10))).T
Y_init = f(X_init)

for ii in range(100):

   # fit GP model
   gpy_model = myRegressor(X=X_init, Y=f(X_init))

   # instantiate the BQ method
   emukit_rbf = RBFGPy(gpy_model.kern)
   emukit_qrbf = QuadratureRBFLebesgueMeasure(emukit_rbf, integral_bounds=np.array([[lb, ub] for rr in range(DIM)]))
   emukit_model = BaseGaussianProcessGPy(kern=emukit_qrbf, gpy_model=gpy_model)
   emukit_method = VanillaBayesianQuadrature(base_gp=emukit_model, X=X_init, Y=Y_init)
   ivr_acquisition = IntegralVarianceReduction(emukit_method)
   space = ParameterSpace(emukit_method.reasonable_box_bounds.convert_to_list_of_continuous_parameters())
   optimizer = GradientAcquisitionOptimizer(space)
   x_new,_ = optimizer.optimize(ivr_acquisition)

   # unfortunately, this fast computation does not seem to be available for \int P(x) f(x) dx
   #initial_integral_mean, initial_integral_variance = emukit_method.integrate()

   # optional plotting routine
   if PLOT:
      NX = 25
      x = y = np.linspace(lb, ub, NX)
      X, Y = np.meshgrid(x, y)
      x_plot = np.array([X, Y]).reshape(2, -1).T
      Z, unc = gpy_model.predict(x_plot, prob=False, full_cov=True)
      Z = Z.reshape(NX, NX)
      S = np.diagonal(unc).reshape(NX, NX)
      A = ivr_acquisition.evaluate(x_plot).reshape(NX, NX)
      fig, ax = plt.subplots(1, 3, figsize=(6, 3))
      ax[0].contourf(X, Y, Z, 1000)
      ax[1].contourf(X, Y, S, 1000)
      ax[2].contourf(X, Y, A, 1000)

      for aa in range(2): 
        ax[aa].scatter(X_init[:, 0], X_init[:, 1], c='pink')
        ax[aa].set_xlim(lb - .1, ub + .1)
        ax[aa].set_ylim(lb - .1, ub + .1)

      plt.savefig('figs/%i' % ii)
      plt.clf()

   # add new point to x and f(x) samples
   X_init = np.append(X_init, x_new, 0)
   Y_init = np.append(Y_init, f(x_new), 0)

# estimate integral by MC simulation of GP model
estimate = np.mean(gpy_model.predict(np.random.normal(1, 1, (100000, DIM)), prob=False)[0])

print('BQ estimate ', estimate)

truth = np.mean(f(np.random.normal(1, 1, (100000, DIM))))
print('truth ', truth)


   BQ estimate  4.168972307688939
   truth  3.9995291930258525

Best Answer

This type of problem can often be addressed by a transformation of variables.

For this you need to find an injective, almost everywhere differentiable map $\Phi:\mathbb{R}^d\rightarrow\, ]0,1[^d$ with $\lvert\det D\Phi(x)\rvert = P(x)$. Using the substitution rule as stated in the Wikipedia article, you can conclude:

$$ \int_{\mathbb{R}^d}f(x)P(x)\,dx = \int_{]0,1[^d}f(\Phi^{-1}(u))\,du.$$
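For instance, in one dimension, taking $\Phi$ to be the CDF of $P$ gives $u=\Phi(x)$ and $du = P(x)\,dx$, so the rule reduces to the familiar inverse-CDF (quantile) substitution $\int_{\mathbb{R}} f(x)P(x)\,dx = \int_0^1 f(\Phi^{-1}(u))\,du$.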

Now you can apply whatever emukit methods you like to the new function $f(\Phi^{-1}(u))$. The sampling-density issue you correctly identified when using $f(x)P(x)$ as the target disappears: you now sample uniform "quantiles", which are pushed forward to the right location by $\Phi^{-1}$.

The caveat is, of course, that you need to find this function. Its mathematical existence is ensured by general theorems (see e.g. this answer), but the practical calculation may be tricky.

But for the case of independent Gaussians you just need to apply the Gaussian CDF to the translated margins. The correlated case can be handled using a square root of the correlation matrix.

Note: $\Phi$ is NOT the (joint) CDF of the density $P$. The CDF maps from $\mathbb{R}^d$ to $]0,1[$, but this function maps to $]0,1[^d$.
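To make the independent-Gaussian case concrete, here is a minimal sketch using the same $f$, mean, and unit variance as in your example. It only checks the transformed integrand against a plain Monte Carlo estimate rather than running the full emukit loop, and the helper names (phi, phi_inv, g) are purely illustrative.

import numpy as np
from scipy.stats import norm

DIM = 2

# same integrand as in the question
def f(x): return np.atleast_2d(np.sum(x ** 2, 1)).T

# component-wise map Phi: R^DIM -> (0,1)^DIM for independent N(1, 1) margins
def phi(x): return norm.cdf(x - 1)

# its inverse: standard normal quantile function, shifted back by the mean
def phi_inv(u): return norm.ppf(u) + 1

# transformed integrand on the unit cube; this is what would be handed to emukit,
# with integral bounds [0, 1] in every dimension
def g(u): return f(phi_inv(u))

# quick Monte Carlo sanity check: both estimates should be close to 4
rng = np.random.default_rng(0)
u = rng.uniform(size=(100000, DIM))
print('uniform-cube MC of g(u):', g(u).mean())
x = rng.normal(1, 1, size=(100000, DIM))
print('direct MC under P(x)   :', f(x).mean())

In the loop from your question, g(u) would then take the place of f(x), with lb, ub = (0, 1) and the initial design drawn uniformly from the unit cube.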
