Gaussian Process Regression – Problem Solving Techniques Explained

I am trying to understand the practical implementation of Gaussian process regression and have some questions about a research paper.
The paper, "Multi-View Stereo by Temporal Nonparametric Fusion", estimates the depth of the input frames by temporally fusing information from previous frames: the latent-space encoding is passed through a Gaussian process that takes the previous latent-space encoding into account.
Link to paper and code: https://aaltoml.github.io/GP-MVS/

First, the distance between poses is computed with the pose distance measure $D[P_i, P_j] = \sqrt{\|t_i - t_j\|^2 + \frac{2}{3}\operatorname{tr}(I - R_i^T R_j)}$. Each pose is defined by a translation vector $t$ and a rotation matrix $R$.
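As a rough illustration of the pose distance above, here is a minimal NumPy sketch. The function names `pose_distance` and `rot_z` are hypothetical helpers introduced for this example; they are not taken from the paper's code.

```python
import numpy as np


def pose_distance(t_i, R_i, t_j, R_j):
    """Pose distance D[P_i, P_j] for translations t (3,) and rotation matrices R (3, 3)."""
    t_i, t_j = np.asarray(t_i, dtype=float), np.asarray(t_j, dtype=float)
    R_i, R_j = np.asarray(R_i, dtype=float), np.asarray(R_j, dtype=float)
    trans_term = np.sum((t_i - t_j) ** 2)                        # ||t_i - t_j||^2
    rot_term = (2.0 / 3.0) * np.trace(np.eye(3) - R_i.T @ R_j)   # 2/3 tr(I - R_i^T R_j)
    # Clamp tiny negative values that can appear through floating-point round-off.
    return float(np.sqrt(max(trans_term + rot_term, 0.0)))


def rot_z(angle):
    """Rotation by `angle` radians about the z-axis (hypothetical helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


I3 = np.eye(3)
d_same = pose_distance([0, 0, 0], I3, [0, 0, 0], I3)            # identical poses
d_trans = pose_distance([0, 0, 0], I3, [3, 4, 0], I3)           # pure translation
d_rot = pose_distance([0, 0, 0], I3, [0, 0, 0], rot_z(np.pi))   # pure rotation
```

For identical poses the distance is zero, for a pure translation it reduces to the Euclidean distance, and for a pure rotation by an angle $\theta$ the rotation term equals $\frac{2}{3}(2 - 2\cos\theta)$.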

The distance $D$ is used to construct the covariance function:
$k[P, P'] = \gamma^2\left(1+\frac{\sqrt{3}\,D[P,P']}{l}\right)\exp\left(-\frac{\sqrt{3}\,D[P,P']}{l}\right)$
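The covariance function can be sketched in NumPy as follows. The distance matrix is made up for illustration, and `matern32_kernel` is a hypothetical name. Unlike the PyTorch code in the question, which applies `torch.exp` to its parameters, this sketch takes $\gamma^2$ and $l$ directly.

```python
import numpy as np


def matern32_kernel(D, gamma2=1.0, ell=1.0):
    """k = gamma^2 * (1 + sqrt(3) D / l) * exp(-sqrt(3) D / l), applied elementwise to D."""
    D = np.asarray(D, dtype=float)
    r = np.sqrt(3.0) * D / ell
    return gamma2 * (1.0 + r) * np.exp(-r)


# Illustrative pairwise pose distances for three frames (assumed values).
D = np.array([[0.0, 0.5, 2.0],
              [0.5, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
K = matern32_kernel(D, gamma2=2.0, ell=1.0)
```

The diagonal of `K` equals $\gamma^2$ (zero distance), and the covariance decays as the pose distance grows, so nearby poses are more strongly correlated than distant ones.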

To share temporal information between the frames, independent GP priors are assigned to all values in $z_i$:

$z_j(t) \sim \mathcal{GP}(0, k(P[t], P[t']))$

$y_{j,i} = z_j(t_i) + \epsilon_{j,i}, \quad \epsilon_{j,i} \sim \mathcal{N}(0, \sigma^2)$
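To make the generative model concrete, the sketch below draws latent values from the GP prior and adds observation noise. The poses are assumed to differ only by translation along a line, so the pose distance is just the absolute difference of positions. All numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 frames whose poses differ only by translation along a line.
positions = np.array([0.0, 0.5, 2.0])
D = np.abs(positions[:, None] - positions[None, :])  # pairwise distances, shape (3, 3)

gamma2, ell, sigma2 = 1.0, 1.0, 0.05  # illustrative hyperparameter values
r = np.sqrt(3.0) * D / ell
K = gamma2 * (1.0 + r) * np.exp(-r)   # Matern 3/2 covariance between the frames

num_values = 5  # number of latent values j, each with its own independent GP prior
# A small jitter keeps the Cholesky factorisation numerically stable.
L = np.linalg.cholesky(K + 1e-9 * np.eye(len(positions)))
Z = L @ rng.standard_normal((len(positions), num_values))  # z_j(t_i), shape (3, 5)
Y = Z + np.sqrt(sigma2) * rng.standard_normal(Z.shape)     # y_{j,i} = z_j(t_i) + noise
```

Each column of `Z` is one independent draw over the three frames, all sharing the same covariance `K`, which is what ties the frames together.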

The model is trained using the batch approach, with a batch size of 4.

Code:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GPlayer(nn.Module):

    def __init__(self):
        super(GPlayer, self).__init__()

        self.gamma2 = nn.Parameter(torch.randn(1), requires_grad=True).float()
        self.ell = nn.Parameter(torch.randn(1), requires_grad=True).float()
        self.sigma2 = nn.Parameter(torch.randn(1), requires_grad=True).float()


    def forward(self, D, Y):
        '''
        :param D: Distance matrix
        :param Y: Stacked outputs from encoder
        :return: Z: transformed latent space
        '''
        b,l,c,h,w = Y.size()
        Y = Y.view(b,l,-1).cpu().float()
        D = D.float()

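        # Matern 3/2 kernel on the pose distances; the parameters are stored in log space
        K = torch.exp(self.gamma2) * (1 + math.sqrt(3) * D / torch.exp(self.ell)) * torch.exp(-math.sqrt(3) * D / torch.exp(self.ell))
        I = torch.eye(l).expand(b, l, l).float()

        # Linear solve for the GP posterior mean (this call is the subject of question 2)
        X = torch.linalg.solve(Y, K+torch.exp(self.sigma2)*I)

        # Multiply by the kernel matrix to get the posterior mean at the input poses
        Z = K.bmm(X)

        Z = F.relu(Z)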

        return Z

My questions are:

1. As stated in the paper (Figure 2), computing the output $z_i$ takes (a) the output of the encoder, (b) the previous latent-space encoding, and (c) the camera pose. Looking at the implementation above, I can see that the model takes in $D$ and $Y$, but how is the information from the previous encoding output $z_i$ passed on to compute $z_{i+1}$? My understanding is that the information is passed on to the next frame through the learned parameters gamma2, ell, and sigma2. Is my understanding right?
2. Framing the Gaussian process as a linear regression problem. In the implementation, the regression is framed as solving $AX = B$, where A = Y (the stacked encoder outputs) and B = K+torch.exp(self.sigma2)*I (kernel plus noise variance), so $X = A^{-1}B$. I am having difficulty understanding how Gaussian process regression is framed as solving $AX = B$. Also, why did the author multiply $K$ by $X$, i.e. K.bmm(X)?

Best Answer

To calculate the output Z, we need to incorporate the poses, the encoder output, and the latent values of the other frames. The poses enter through the kernel matrix K. The information from the other frames is not stored in a separate state: it is shared through K, which couples all frames in the sequence, and whose hyperparameters gamma2 and ell (together with the noise variance sigma2) are learned and updated during backpropagation by reducing the error. Finally, the encoder output Y is taken from the last layer of the encoder.
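A small numerical sketch may help show how the kernel matrix carries information between frames. It uses the standard GP posterior mean at the input poses, $Z = K(K+\sigma^2 I)^{-1}Y$, with made-up distances and a single latent value per frame. `matern32` and `gp_smooth` are hypothetical names.

```python
import numpy as np


def matern32(D, gamma2=1.0, ell=1.0):
    """Matern 3/2 covariance evaluated elementwise on a distance matrix."""
    r = np.sqrt(3.0) * np.asarray(D, dtype=float) / ell
    return gamma2 * (1.0 + r) * np.exp(-r)


def gp_smooth(D, Y, gamma2=1.0, ell=1.0, sigma2=0.1):
    """Posterior mean at the input poses: Z = K (K + sigma2 I)^{-1} Y."""
    K = matern32(D, gamma2, ell)
    A = K + sigma2 * np.eye(K.shape[0])
    return K @ np.linalg.solve(A, Y)


# Two frames with one latent value each: frame 0 observes 1.0, frame 1 observes 0.0.
Y = np.array([[1.0], [0.0]])

D_near = np.array([[0.0, 0.1], [0.1, 0.0]])    # poses close together
D_far = np.array([[0.0, 50.0], [50.0, 0.0]])   # poses far apart

Z_near = gp_smooth(D_near, Y)  # frame 1 is pulled towards frame 0's observation
Z_far = gp_smooth(D_far, Y)    # frames are effectively independent
```

When the poses are close, the output for frame 1 moves clearly away from its own observation of 0.0 towards frame 0's value. When they are far apart, K is nearly diagonal and each frame keeps (a shrunken version of) its own observation. No per-frame information lives in the hyperparameters themselves.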
The posterior mean and variance of Gaussian process regression follow the definitions in Carl Edward Rasmussen and Christopher K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006:

$\bar{f}_* = k_*^T (K+\sigma_n^2 I)^{-1} y$

$\mathbb{V}[f_*] = k(x_*, x_*) - k_*^T (K+\sigma_n^2 I)^{-1} k_*$
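As a sketch of the standard GP regression predictive equations from Rasmussen and Williams, the mean $k_*^T (K+\sigma_n^2 I)^{-1} y$ and variance $k(x_*, x_*) - k_*^T (K+\sigma_n^2 I)^{-1} k_*$ can be computed as below. One-dimensional positions stand in for poses, and all values and function names are assumptions made for illustration.

```python
import numpy as np


def matern32(D, gamma2=1.0, ell=1.0):
    """Matern 3/2 covariance evaluated elementwise on distances D."""
    r = np.sqrt(3.0) * np.asarray(D, dtype=float) / ell
    return gamma2 * (1.0 + r) * np.exp(-r)


def gp_posterior(K, k_star, k_ss, y, sigma2):
    """Predictive mean and variance at a single test point."""
    A = K + sigma2 * np.eye(K.shape[0])
    mean = k_star @ np.linalg.solve(A, y)            # k_*^T (K + sigma2 I)^{-1} y
    var = k_ss - k_star @ np.linalg.solve(A, k_star)  # k_** - k_*^T (K + sigma2 I)^{-1} k_*
    return mean, var


x_train = np.array([0.0, 0.5, 2.0])   # stand-ins for training poses
y = np.array([1.0, 0.8, -0.3])        # illustrative observations
x_test = 0.5                          # test point that coincides with a training point

K = matern32(np.abs(x_train[:, None] - x_train[None, :]))
k_star = matern32(np.abs(x_train - x_test))
k_ss = matern32(0.0)

mean_noisy, var_noisy = gp_posterior(K, k_star, k_ss, y, sigma2=0.1)
mean_exact, var_exact = gp_posterior(K, k_star, k_ss, y, sigma2=1e-10)
```

With almost no noise, the prediction at a training point reproduces its observation and the variance collapses to roughly zero. With noise, the mean is a smoothed combination of all observations and some uncertainty remains.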

In the case described in the paper, the test points are the input points themselves, now smoothed by the Gaussian process. So $k_*^T$ is simply the kernel evaluated between the input poses, i.e. $k_*^T = k(P,P') = C$

Now $ (C+\sigma^2I)^{-1} = (K+\sigma_n^2I)^{-1}$

$y = Y$

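We need to compute $(C+\sigma^2I)^{-1}Y$, which can be framed as solving $AX = B$ with $A = (C+\sigma^2I)$ and $B = Y$, so that $X = A^{-1}B$. This is what the solve call in the code is meant to do. Note the argument order: the older torch.gesv(B, A) and torch.solve(B, A) took the right-hand side first, whereas torch.linalg.solve(A, B) takes the coefficient matrix first, so with the current API the call is X = torch.linalg.solve(K+torch.exp(self.sigma2)*I, Y). Once we have X, we multiply it by C, as per the formula, to get the posterior mean, which is Z.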
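The linear-solve framing can be checked numerically. The sketch below uses NumPy rather than PyTorch, with made-up batch sizes and random data. `np.linalg.solve(A, B)` follows the same coefficient-matrix-first convention as `torch.linalg.solve(A, B)`.

```python
import numpy as np

rng = np.random.default_rng(0)
b, l, n = 2, 4, 6  # batch size, frames per sequence, flattened c*h*w (hypothetical sizes)

# Stand-in poses: positions on a line, so the distance is the absolute difference.
x = rng.uniform(0.0, 3.0, size=(b, l))
D = np.abs(x[:, :, None] - x[:, None, :])   # (b, l, l)
Y = rng.standard_normal((b, l, n))          # stacked encoder outputs, (b, l, n)

gamma2, ell, sigma2 = 1.0, 1.0, 0.1         # illustrative hyperparameter values
r = np.sqrt(3.0) * D / ell
K = gamma2 * (1.0 + r) * np.exp(-r)         # Matern 3/2 kernel, (b, l, l)
A = K + sigma2 * np.eye(l)                  # identity broadcasts over the batch

X = np.linalg.solve(A, Y)   # solves A X = Y for each batch element, (b, l, n)
Z = K @ X                   # posterior mean at the input poses, (b, l, n)

# Same result via the explicit inverse, which is slower and less stable numerically.
Z_inv = K @ np.linalg.inv(A) @ Y
```

Solving the linear system gives the same Z as forming the inverse explicitly, but avoids computing $(K+\sigma^2 I)^{-1}$, which is the usual reason for writing the posterior mean as a solve followed by a matrix product.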
