I am trying to understand the practical implementation of Gaussian process (GP) regression, and I have some questions about a research paper.
The paper, "Multi-View Stereo by Temporal Nonparametric Fusion", estimates depth for the input frames by temporally fusing information from previous frames: the latent-space encodings of the frames are passed through a Gaussian process.
Link to paper and code: https://aaltoml.github.io/GP-MVS/
First, the distance between camera poses is computed with the pose-distance measure $D[P_i, P_j] = \sqrt{\|t_i - t_j\|^2 + \frac{2}{3}\operatorname{tr}(I - R_i^T R_j)}$. Each pose $P$ is defined by a translation vector $t$ and a rotation matrix $R$.
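To make the distance concrete, here is a minimal NumPy sketch of the formula above. The function name `pose_distance`, the helper `rot_z`, and the example poses are my own illustrative choices, not taken from the paper's code.

```python
import numpy as np

def pose_distance(t_i, R_i, t_j, R_j):
    """Pose distance D[P_i, P_j] from translations (3,) and rotation matrices (3, 3)."""
    trans_term = np.sum((t_i - t_j) ** 2)
    rot_term = (2.0 / 3.0) * np.trace(np.eye(3) - R_i.T @ R_j)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(max(trans_term + rot_term, 0.0))

def rot_z(theta):
    """Rotation by `theta` radians about the z-axis (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

t0, R0 = np.zeros(3), np.eye(3)
t1, R1 = np.array([1.0, 0.0, 0.0]), rot_z(np.pi / 2)

d_same = pose_distance(t0, R0, t0, R0)    # identical poses
d_trans = pose_distance(t0, R0, t1, R0)   # pure translation by 1 unit
d_both = pose_distance(t0, R0, t1, R1)    # translation plus 90-degree rotation
```

Identical poses give a distance of zero, a pure translation gives the Euclidean distance, and a rotation adds to the distance through the trace term.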
The distance $D$ is used to construct the covariance function:
$k[P, P'] = \gamma^2\left(1+\frac{\sqrt{3}\,D[P,P']}{l}\right)\exp\left(-\frac{\sqrt{3}\,D[P,P']}{l}\right)$
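As a rough illustration, the covariance above (a Matérn 3/2 form) can be evaluated on a whole matrix of pose distances at once. The distance values and hyperparameters below are made up for the example; note that the posted layer stores `gamma2` and `ell` in log-space and applies `torch.exp` before using them, whereas this sketch takes them directly.

```python
import numpy as np

def matern32_kernel(D, gamma2, ell):
    """Matern 3/2 covariance evaluated elementwise on a matrix of pose distances D."""
    r = np.sqrt(3.0) * D / ell
    return gamma2 * (1.0 + r) * np.exp(-r)

# Hypothetical distances between three poses (symmetric, zero diagonal).
D = np.array([[0.0, 0.5, 2.0],
              [0.5, 0.0, 1.5],
              [2.0, 1.5, 0.0]])

K = matern32_kernel(D, gamma2=1.0, ell=1.0)
```

The diagonal of `K` equals $\gamma^2$ (distance zero), and the covariance decays as the poses move further apart, so nearby views are coupled more strongly than distant ones.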
To share temporal information between the frames, independent GP priors are assigned to all values in $z_i$:
$z_j(t) \sim \mathcal{GP}(0, k(P[t], P[t']))$
$y_{j,i} = z_j(t_i)+\epsilon_{j,i}, \quad \epsilon_{j,i} \sim \mathcal{N}(0, \sigma^2)$
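One way to read this model is generatively: every latent dimension $j$ is an independent draw from the same GP over the frames, and the encoder outputs are noisy observations of it. The sketch below simulates that under assumed values (four frames on a line, five latent dimensions, hand-picked hyperparameters); none of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pose distances for 4 frames along a line: 0.5 * |i - j|.
idx = np.arange(4)
D = 0.5 * np.abs(idx[:, None] - idx[None, :])

gamma2, ell, sigma2 = 1.0, 1.0, 0.1     # illustrative hyperparameter values
r = np.sqrt(3.0) * D / ell
K = gamma2 * (1.0 + r) * np.exp(-r)     # Matern 3/2 covariance over the frames

n_latent = 5                            # number of latent dimensions j (illustrative)
L = np.linalg.cholesky(K + 1e-9 * np.eye(4))    # small jitter for numerical stability
Z = L @ rng.standard_normal((4, n_latent))      # column j is one draw z_j ~ GP(0, k)
Y = Z + np.sqrt(sigma2) * rng.standard_normal((4, n_latent))  # y_{j,i} = z_j(t_i) + eps
```

All columns share the same covariance `K` across frames, which is what lets a single linear solve handle every latent dimension at once.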
The model is trained in batch mode with a batch size of 4.
```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GPlayer(nn.Module):
    def __init__(self):
        super(GPlayer, self).__init__()
        # Hyperparameters are stored in log-space; forward() applies torch.exp.
        self.gamma2 = nn.Parameter(torch.randn(1), requires_grad=True).float()
        self.ell = nn.Parameter(torch.randn(1), requires_grad=True).float()
        self.sigma2 = nn.Parameter(torch.randn(1), requires_grad=True).float()

    def forward(self, D, Y):
        '''
        :param D: Distance matrix
        :param Y: Stacked outputs from encoder
        :return: Z: transformed latent space
        '''
        b,l,c,h,w = Y.size()
        Y = Y.view(b,l,-1).cpu().float()
        D = D.float()
        # Covariance between the l poses, built from the pose distances
        K = torch.exp(self.gamma2) * (1 + math.sqrt(3) * D / torch.exp(self.ell)) * torch.exp(-math.sqrt(3) * D / torch.exp(self.ell))
        I = torch.eye(l).expand(b, l, l).float()
        X = torch.linalg.solve(Y, K+torch.exp(self.sigma2)*I)
        Z = K.bmm(X)
        Z = F.relu(Z)
        return Z
```
My questions are:

- As stated in the paper (Figure 2), computing the output $z_i$ takes (a) the output of the encoder, (b) the previous latent-space encoding, and (c) the camera pose. From the implementation above, I can see that the model takes in $D$ and $Y$, but how is the information from the previous encoding $z_i$ passed on to compute $z_{i+1}$? My understanding is that the information is passed on to the next frame through the learned parameters `gamma2`, `ell`, and `sigma2`. Is my understanding right?
- Framing Gaussian process regression as a linear system. In the implementation, the regression problem is framed as solving $AX = B$, where $A$ = `Y` (the stacked encoder outputs) and $B$ = `K + torch.exp(self.sigma2) * I` (kernel plus noise variance), so $X = A^{-1}B$. I am having difficulty understanding how Gaussian process regression is framed as solving $AX = B$. Also, why did the author multiply $K$ by $X$, i.e. `K.bmm(X)`?
Best Answer
The posterior mean and variance follow the definitions in Carl Edward Rasmussen and Christopher K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
For a test point $x_*$, the predictive mean and variance are given by
$\bar{f}_* = k_*^T (K+\sigma_n^2 I)^{-1} y$
$\mathbb{V}[f_*] = k(x_*, x_*) - k_*^T (K+\sigma_n^2 I)^{-1} k_*$
In the setting of the paper, the test points are the input points themselves: the GP is evaluated at the same poses it is conditioned on. So $k_*^T$ is nothing but the kernel evaluated between the input poses, i.e. $k_*^T = k(P,P') = C$.
Likewise, $(K+\sigma_n^2I)^{-1} = (C+\sigma^2I)^{-1}$ and $y = Y$.
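We need to compute $(C+\sigma^2I)^{-1}Y$, which can be framed as solving $AX = B$ with $A = C+\sigma^2I$ and $B = Y$, where $C$ is `K` in the code. Note that `torch.linalg.solve(A, B)` takes the coefficient matrix first, so the call should be `X = torch.linalg.solve(K + torch.exp(self.sigma2) * I, Y)`. The `(Y, K + ...)` order in the snippet is the convention of the older `torch.solve(B, A)` and `torch.gesv(B, A)`, which take the right-hand side first. Once we have $X$, we multiply it by $C$, as per the formula, and get the posterior mean, which is $Z$ (before the final ReLU).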
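Here is a minimal NumPy sketch of that computation, under assumed toy values (four frames, six latent values per frame, hand-picked hyperparameters). `np.linalg.solve(A, B)` takes the coefficient matrix first, the same convention as `torch.linalg.solve`. The random `Y` is only a stand-in for the flattened encoder outputs, and the ReLU from the layer is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pose distances for 4 frames along a line: 0.5 * |i - j|.
idx = np.arange(4)
D = 0.5 * np.abs(idx[:, None] - idx[None, :])

gamma2, ell, sigma2 = 1.0, 1.0, 0.1    # illustrative hyperparameter values
r = np.sqrt(3.0) * D / ell
K = gamma2 * (1.0 + r) * np.exp(-r)    # kernel matrix between the input poses (C above)

Y = rng.standard_normal((4, 6))        # stand-in for flattened encoder outputs

A = K + sigma2 * np.eye(4)
X = np.linalg.solve(A, Y)              # solves A X = Y, i.e. X = (K + sigma^2 I)^{-1} Y
Z = K @ X                              # posterior mean at the input poses

# Same result via an explicit inverse, which is generally less stable numerically.
Z_inverse = K @ np.linalg.inv(A) @ Y
```

Solving the linear system instead of forming the inverse is the usual choice here: it gives the same posterior mean and avoids an explicit matrix inversion.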