“pseudo-coordinates”

coordinate-systems · geometry · machine-learning · manifolds

I'm new to geometry, and while reading a research paper about geometric deep learning I came across the term "pseudo-coordinates". I searched for its meaning, but found few references. Can someone please explain to me what it is and how it relates to manifolds? Thank you in advance.

Best Answer

Pseudo-coordinates in geometric learning architectures serve two purposes:

  1. They provide local pairwise features among neighbours, i.e. they associate a latent vector with each edge of the graph, rather than just with the nodes. They are thus like an adjacency matrix, but describe something richer than mere connectivity.

  2. They act as a local coordinate system describing a "patch" on the manifold surface or graph. This tells the network something about directionality on the graph.

Essentially, they give the network easy access to the local geometry or structure of the patches, rather than forcing it to infer that structure from, say, binary connectivity alone.

Mathematically, consider a graph-like construct $\mathcal{M}=(\mathcal{V},\mathcal{E},\mathcal{U})$, where $\mathcal{V}$ is the set of nodes (with features $f(v)\in\mathbb{R}^n\;\forall\;v\in\mathcal{V}$), $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of directed edges, and $\mathcal{U}$ is the pseudo-coordinate function. Let $\mathcal{N}(v)=\{w\in\mathcal{V}\mid (w,v)\in\mathcal{E}\}$ be the set of neighbours of a node $v$. We can think of $\mathcal{U}$ in two equivalent ways: (1) as a function $u(x,y)\in\mathbb{R}^d$, defined for each vertex $x\in\mathcal{V}$ and each of its neighbours $y\in\mathcal{N}(x)$, mapping the pair to a vector; and (2) as a set associating a vector to every directed edge, $\mathcal{U}=\{ u(e)\in\mathbb{R}^d \mid e\in\mathcal{E}\}$.
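For concreteness, here is a minimal NumPy sketch of one such $\mathcal{U}$ for a plain graph: the degree-based choice $u(x,y)=\big(1/\sqrt{\deg(x)},\,1/\sqrt{\deg(y)}\big)$ used for graphs in the mixture-model CNN (MoNet) literature. The toy graph and the out-degree convention below are illustrative, not canonical:

```python
import numpy as np

# Toy directed graph: edges (source, target). Names here are illustrative.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0)]
num_nodes = 3

# Node degrees (counting outgoing edges here; a convention, not canonical).
deg = np.zeros(num_nodes)
for s, t in edges:
    deg[s] += 1

# Degree-based pseudo-coordinates: one vector u(e) in R^2 per directed edge,
# u(x, y) = (1/sqrt(deg(x)), 1/sqrt(deg(y))).
U = {(s, t): np.array([1.0 / np.sqrt(deg[s]), 1.0 / np.sqrt(deg[t])])
     for s, t in edges}

for e, u in U.items():
    print(e, u)
```

Note that these particular pseudo-coordinates are derived purely from connectivity; richer choices (spatial offsets, spectral embeddings, ...) plug into the same slot.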

If you prefer to think of a smooth Riemannian manifold $M=(\mathcal{X},g)$, then one example is to consider a local chart $C(p)$ around some $p\in M$ with local coordinates $\alpha_p,\beta_p$ (in the 2D case). One simple choice of pseudo-coordinates is then $u(p,q)=(\alpha_p(q),\beta_p(q))$ for points $q$ in the chart. This, with geodesic polar coordinates as the chart, is the basis of the Geodesic CNN, referenced in the paper you linked. But pseudo-coordinates can be more general than this (e.g. a transform thereof); see SplineCNN, for instance, or the graph example in the paper you linked.
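For intuition on the manifold side, here is a toy sketch converting chart offsets to polar pseudo-coordinates $(\rho,\theta)$, in the spirit of the geodesic polar coordinates of the Geodesic CNN. It assumes the neighbours of $p$ have already been expressed in a 2D chart around $p$ (e.g. by tangent-plane projection); computing that chart is the hard part and is not shown:

```python
import numpy as np

def polar_pseudo_coords(p, neighbours):
    """Map 2D chart coordinates of neighbours of p to (rho, theta).

    `p` and each row of `neighbours` are assumed to be points in a
    local 2D chart around p; building the chart itself (via geodesics
    or tangent-plane projection) is outside this sketch.
    """
    offsets = neighbours - p
    rho = np.linalg.norm(offsets, axis=1)             # radial coordinate
    theta = np.arctan2(offsets[:, 1], offsets[:, 0])  # angular coordinate
    return np.stack([rho, theta], axis=1)

# Toy patch: a centre point and three neighbours in chart coordinates.
p = np.array([0.0, 0.0])
nbrs = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
print(polar_pseudo_coords(p, nbrs))
```

Note that the angular origin $\theta=0$ is arbitrary here; that arbitrariness is one face of the directional ambiguity discussed below.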

How they are used depends on the paper. For example, in most graph (convolutional) neural networks, one wants to compute a weighted average of the features of a node and those of its connected neighbours. But how should the weights be computed? If all the network knows is that two nodes are connected, the weights have very little to be computed from. With pseudo-coordinates available, however, the weights in the average can depend on them, for instance: $$ F(v)_j = \sum_{\xi\in\mathcal{N}(v)} W(u(\xi,v)\mid\Theta_j)\, f_j(\xi) $$ where we compute the $j$th output (indexing over the channels of the weighting kernel and those of the input feature map) at node $v\in\mathcal{V}$, depending on the learned parameters $\Theta_j$ of the weight function $W:\mathbb{R}^d\rightarrow \mathbb{R}$. This pseudo-coordinate-dependent weighted sum is called a patch operator, since it extracts a representation $F(v)$ of a patch about the point $v$.

The analogue in classical convolutional neural networks is simply the Euclidean image patch around a given point, which is convolved with a kernel to give the new feature map at that point. Thus, given the (pseudo-)patch $F(v)$, the natural thing to do is "convolve" it with a learned graph signal $g_\ell$ (analogous to the learned kernel weights of a classical CNN's filters): $$ (f\ast g_\ell)(v) = \sum_j g_{\ell j}\, F(v)_j $$ so that the output features are $ f_\text{out}(v)=\big((f\ast g_1)(v),\ldots,(f\ast g_K)(v)\big) $. Again, though, the details depend on the paper.
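As one concrete (and common) instantiation, the weight function can be a Gaussian in pseudo-coordinate space, $W(u\mid\Theta_j)=\exp\!\big(-\tfrac{1}{2}(u-\mu_j)^\top\Sigma_j^{-1}(u-\mu_j)\big)$ with $\Theta_j=(\mu_j,\Sigma_j)$, as in mixture-model CNNs (MoNet). The toy sketch below (NumPy, random data, diagonal $\Sigma_j$ for simplicity) computes $F(v)_j$ and the "convolution" for a single node:

```python
import numpy as np

rng = np.random.default_rng(0)

d, J, K = 2, 4, 3                 # pseudo-coord dim, kernel components, out channels
nbr_u = rng.normal(size=(5, d))   # pseudo-coordinates u(xi, v) for 5 neighbours
nbr_f = rng.normal(size=(5, J))   # input features f_j(xi) (J channels here)

# Learned parameters Theta_j = (mu_j, Sigma_j); diagonal Sigma for simplicity.
mu = rng.normal(size=(J, d))
sigma = np.ones((J, d))

# Gaussian weight W(u | Theta_j) for every neighbour and kernel component j.
diff = nbr_u[:, None, :] - mu[None, :, :]              # shape (5, J, d)
W = np.exp(-0.5 * np.sum(diff**2 / sigma[None], -1))   # shape (5, J)

# Patch operator: F(v)_j = sum_xi W(u(xi, v) | Theta_j) * f_j(xi).
F = np.sum(W * nbr_f, axis=0)                          # shape (J,)

# "Convolution" with learned graph signals g: (f * g_l)(v) = sum_j g_lj F_j.
g = rng.normal(size=(K, J))
f_out = g @ F                                          # (K,) output features
print(f_out)
```

In a real network, $\mu_j$, $\Sigma_j$ and $g$ would be trained by backpropagation; this only shows the forward pass at one node.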

Basically, relating this back to classical CNNs: for Euclidean images, we extract little windows as patches $P$, treating each element of the window equally. The learned kernel $\kappa$ convolved with such a patch associates each weight with an input value in the obvious way: $P_{ij}$ is multiplied by $\kappa_{ij}$ before the summation part of the convolution. On manifolds or graphs, this association is no longer obvious. For instance, imagine rotating an image: the CNN weights would no longer apply properly to the input, because the positions expected by the template filter would have shifted. On manifolds, we therefore construct pseudo-coordinates, which help the network learn to cope with this directional ambiguity, though they do not solve it in general.
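A tiny demonstration of that alignment problem: correlating a fixed kernel with a patch and with a rotated copy of the same patch gives very different responses, because each $P_{ij}$ no longer lands on the "right" $\kappa_{ij}$:

```python
import numpy as np

kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])   # a vertical-edge filter

patch = np.array([[0., 5., 9.],
                  [0., 5., 9.],
                  [0., 5., 9.]])     # a patch with a vertical gradient

rotated = np.rot90(patch)            # the same patch, rotated 90 degrees

# Element-wise association P_ij * kappa_ij, then sum: the "convolution" step.
print(np.sum(patch * kernel))    # strong response
print(np.sum(rotated * kernel))  # zero response: weights misaligned
```

Pseudo-coordinates hand the weight function the positional information needed to make this association on a graph or manifold, where no global grid exists.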

References

  * J. Masci, D. Boscaini, M. M. Bronstein, P. Vandergheynst, "Geodesic Convolutional Neural Networks on Riemannian Manifolds" (Geodesic CNN), 2015.
  * F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, M. M. Bronstein, "Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs" (MoNet), 2017.
  * M. Fey, J. E. Lenssen, F. Weichert, H. Müller, "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2018.
