Gaussian Process Regression – Problem Solving Techniques Explained


I am trying to understand the practical implementation of Gaussian process regression, and I have some questions about a research paper.
The paper, "Multi-View Stereo by Temporal Nonparametric Fusion", addresses depth estimation from input frames by temporally fusing information from previous frames: the latent-space encodings of the frames are passed through a Gaussian process.
Link to paper and code: https://aaltoml.github.io/GP-MVS/

Initially, the distance between poses is calculated with the pose distance measure $D[P_i, P_j] = \sqrt{\|t_i - t_j\|^2 + \frac{2}{3}\operatorname{tr}(I - R_i^\top R_j)}$. A pose is defined by a translation vector $t$ and a rotation matrix $R$.
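For concreteness, here is a minimal sketch of that distance measure in PyTorch. The function name and the tensor shapes ($t$ as a 3-vector, $R$ as a 3x3 matrix) are my own assumptions, not the paper's code:

import torch

def pose_distance(t_i, R_i, t_j, R_j):
    # D[P_i, P_j] = sqrt(||t_i - t_j||^2 + (2/3) tr(I - R_i^T R_j))
    trans = torch.sum((t_i - t_j) ** 2)               # squared translation distance
    rot = torch.trace(torch.eye(3) - R_i.t() @ R_j)   # rotation term
    return torch.sqrt(trans + (2.0 / 3.0) * rot)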

The distance $D$ is used to construct the covariance function, a Matérn 3/2 kernel over pose distance:
$k[P, P'] = \gamma^2\left(1+\frac{\sqrt{3}\,D[P,P']}{\ell}\right)\exp\left(-\frac{\sqrt{3}\,D[P,P']}{\ell}\right)$
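A sketch of this kernel applied to a whole matrix of pairwise pose distances (the function name is my own; gamma2 and ell correspond to $\gamma^2$ and $\ell$ above):

import math
import torch

def matern32_kernel(D, gamma2, ell):
    # k = gamma^2 (1 + sqrt(3) D / ell) exp(-sqrt(3) D / ell), elementwise over D
    scaled = math.sqrt(3.0) * D / ell
    return gamma2 * (1.0 + scaled) * torch.exp(-scaled)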

In order to share temporal information between the frames, an independent GP prior is placed on each value $z_j$ of the latent encoding:

$z_j(t) \sim \mathcal{GP}\big(0, k(P[t], P[t'])\big)$

$y_{j,i} = z_j(t_i) + \epsilon_{j,i}, \quad \epsilon_{j,i} \sim \mathcal{N}(0, \sigma^2)$
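Stacking the noisy observations for latent channel $j$ over a sequence into a vector $y_j$, the standard GP posterior mean (a step that is implicit here but needed to read the code below) is

$\bar{z}_j = K\,(K + \sigma^2 I)^{-1} y_j$

where $K$ is the matrix of pairwise pose kernels $k[P_i, P_{i'}]$.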

The model is trained in a batch approach, with a batch size of 4.

Code:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GPlayer(nn.Module):

    def __init__(self):
        super(GPlayer, self).__init__()

        # Log-scale hyperparameters, exponentiated in forward() to keep them positive.
        # Calling .float() on an nn.Parameter returns a plain, unregistered tensor,
        # so the parameters are created directly.
        self.gamma2 = nn.Parameter(torch.randn(1))
        self.ell = nn.Parameter(torch.randn(1))
        self.sigma2 = nn.Parameter(torch.randn(1))

    def forward(self, D, Y):
        '''
        :param D: Distance matrix of pairwise pose distances, shape (b, l, l)
        :param Y: Stacked outputs from encoder, shape (b, l, c, h, w)
        :return: Z: transformed latent space, shape (b, l, c*h*w)
        '''
        b, l, c, h, w = Y.size()
        Y = Y.view(b, l, -1).cpu().float()
        D = D.float()

        # Matern 3/2 kernel matrix built from the pose-distance matrix.
        K = torch.exp(self.gamma2) * (1 + math.sqrt(3) * D / torch.exp(self.ell)) * torch.exp(-math.sqrt(3) * D / torch.exp(self.ell))
        I = torch.eye(l).expand(b, l, l).float()

        # Solve (K + sigma^2 I) X = Y; torch.linalg.solve takes (A, B) for AX = B
        # (the legacy torch.solve took the arguments in the reverse order).
        X = torch.linalg.solve(K + torch.exp(self.sigma2) * I, Y)

        # GP posterior mean, batched over b: Z = K (K + sigma^2 I)^{-1} Y.
        Z = K.bmm(X)

        Z = F.relu(Z)

        return Z
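To make the shapes concrete, here is a hypothetical smoke test (all sizes are made up for illustration):

gp = GPlayer()
D = torch.rand(4, 4, 4)
D = 0.5 * (D + D.transpose(1, 2))    # symmetrize the pose-distance matrices
Y = torch.randn(4, 4, 512, 2, 2)     # batch of 4 sequences, 4 frames each
Z = gp(D, Y)
print(Z.shape)                       # torch.Size([4, 4, 2048])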

My questions are:

  1. As stated in the paper (figure 2), calculating the output $z_i$ takes in (a) the output of the encoder, (b) the previous latent-space encoding, and (c) the camera pose.
    Looking at the implementation code above, I can clearly see that the model takes in $D$ and $Y$, but how is the information from the previous encoding output $z_i$ passed on to calculate $z_{i+1}$? What I understood is that the information is passed on to the next frame's encoding through the learned values stored in gamma2, ell, and sigma2. Is my understanding right?

  2. Framing the Gaussian process as a linear regression problem. In the implementation code, the regression problem is framed as solving $AX = B$,
    i.e. here $A = K + \sigma^2 I$ (kernel plus noise variance) and $B = Y$ (the stacked output from the encoder), so $X = A^{-1}B$. I am facing difficulty in understanding how Gaussian process regression is framed as solving $AX = B$. Also, why did the author multiply $K$ with $X$, i.e. K.bmm(X)?

Best Answer

  1. In order to calculate the output Z, we need to incorporate the poses, the encoder output, and the previous Z values. The poses are incorporated through the kernel matrix, and the influence of the previous Z values is governed by the hyperparameters gamma2 and ell, which are learned and updated during backpropagation by reducing the error. Finally, the encoder output is taken from the last layer of the encoder.
  2. The mean of the Gaussian process regression posterior is given by the standard mean and variance formulas.

The posterior mean and variance are defined as in Carl Edward Rasmussen and Christopher K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006:

$\bar{f}_* = k_*^\top (K + \sigma_n^2 I)^{-1} y$

$\mathbb{V}[f_*] = k(x_*, x_*) - k_*^\top (K + \sigma_n^2 I)^{-1} k_*$

In the case considered in the research paper, the test points are the input points themselves; they are simply smoothed by the GP regression. So $k_*^\top$ is nothing but the kernel evaluated at the input points themselves, i.e. $k_*^\top = k(P, P') = C$.

Now $(C+\sigma^2 I)^{-1} = (K+\sigma_n^2 I)^{-1}$

$y = Y$

We need to calculate $(C+\sigma^2 I)^{-1}Y$, which can be framed as solving $AX = B$ with $A = (C+\sigma^2 I)$ and $B = Y$. So X = torch.linalg.solve(K + torch.exp(self.sigma2) * I, Y) will do the trick (torch.linalg.solve takes the arguments as (A, B)). Once we have X, we multiply it with C, as per the formula, and obtain the mean, which is nothing but Z.
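As a sanity check, here is a hypothetical snippet (all sizes made up) verifying that the solve-based computation matches the explicit-inverse form of the posterior mean:

import torch

l = 4
D = torch.rand(l, l)
D = 0.5 * (D + D.t())                                  # symmetric distance matrix
K = (1 + 3 ** 0.5 * D) * torch.exp(-(3 ** 0.5) * D)    # Matern 3/2, gamma2 = ell = 1
Y = torch.randn(l, 8)                                  # stacked encoder outputs
sigma2 = 0.1

A = K + sigma2 * torch.eye(l)
Z_solve = K @ torch.linalg.solve(A, Y)                 # mean via linear solve
Z_inv = K @ torch.inverse(A) @ Y                       # mean via explicit inverse
print(torch.allclose(Z_solve, Z_inv, atol=1e-5))       # True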
