Analogy between MPNNs and GCNs

graph-neural-network

I'm currently trying to understand graph neural networks and got stuck with comparing MPNN and GCN. In my understanding, MPNN is more of a "general framework", while GCN is a specific "instantiation" of that framework. Accordingly, a GCN can be seen as a type of MPNN.

Now, an MPNN always consists of a message function, an aggregation function and an update function.

  • The message function is a function from the feature vectors x_i and x_j, as well as the connecting edge e_ij to a new, potentially differently shaped feature vector m_i.
  • The aggregation function is a function from an arbitrary-sized set of such (latent) feature vectors to a single one (call it u_i).
  • The update function is a function from a node's own feature vector x_i and the aggregated "upstream" features u_i to the node's new state h_i.
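If I have that right, one message-passing step could be sketched like this (a toy sketch in plain NumPy; the concrete function bodies, names and shapes are my own stand-ins, not from any particular paper):

```python
import numpy as np

def message(x_i, x_j, e_ij):
    # message from neighbor j to node i; concatenation is just a placeholder
    return np.concatenate([x_i, x_j, e_ij])

def aggregate(messages):
    # permutation-invariant reduction over a variable-sized set of messages
    return np.sum(messages, axis=0)

def update(x_i, u_i):
    # combine the node's own features with the aggregated messages u_i
    return np.tanh(np.concatenate([x_i, u_i]))

def mpnn_step(X, E, neighbors):
    # X: (num_nodes, d) node features; E[(i, j)]: edge features;
    # neighbors[i]: list of neighbors of node i
    H = []
    for i in range(len(X)):
        msgs = np.stack([message(X[i], X[j], E[(i, j)]) for j in neighbors[i]])
        H.append(update(X[i], aggregate(msgs)))
    return np.stack(H)
```

Nothing here depends on the number of nodes or edges, only on the feature dimensions.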

Is that correct so far?

If yes, what would be each of these three functions specifically in the case of a GCN?

In addition, I've read that a GCN cannot handle variable graph structures, i.e. graphs with different numbers of nodes / edges. I assume this is because the GCN's propagation rule has the adjacency matrix in it, which is fixed-size? And then, why / how can an MPNN, by contrast, deal with different input graphs?

I would be super thankful if somebody could help me sort out these questions!

Best Answer

Yeah, I think the problem is that the MPNN paper introduces both the general framework and the authors' own MPNN model, which is a little confusing. Generally speaking, though, you only have a message function and a vertex update function. Lastly, you have a readout function that aggregates the information from the individual nodes into a representation of the whole graph.
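A minimal readout could look like the following sketch: pool the final node states with a permutation-invariant sum, then apply a learned transform (the weight matrix `W` here is a random stand-in, not from the paper):

```python
import numpy as np

def readout(H, W):
    # H: (num_nodes, d) final node states; W: (d, d_out) learned weights
    pooled = H.sum(axis=0)       # permutation-invariant pooling over nodes
    return np.tanh(pooled @ W)   # fixed-size graph-level representation
```

Because the sum collapses the node dimension, the output size is the same no matter how many nodes the input graph has.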

In the MPNN paper (Gilmer et al., 2017), the GCN's message and update functions are rewritten as:

    M(h_v, h_w) = c_vw · h_w,   with c_vw = (deg(v) · deg(w))^(-1/2) · A_vw
    U(h_v, m_v) = ReLU(W · m_v)

Lastly, I am not sure what you mean by variable graph structure, but a GCN can handle graphs of different sizes. You can easily see, by looking at the matrix multiplications, that the size of the adjacency matrix does not matter for the multiplication to "work": the weight matrix only touches the feature dimension. The only important part is that you use a good aggregation function in the readout phase.
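To make that concrete, here is the Kipf & Welling propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W) in NumPy, applied with the same weight matrix to two graphs of different sizes (the graphs and dimensions are made up for illustration):

```python
import numpy as np

def gcn_layer(A, H, W):
    # A: (n, n) adjacency; H: (n, d_in) features; W: (d_in, d_out) weights.
    # W's shape depends only on feature dimensions, never on n.
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)       # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)

# The same layer weights work for a 3-node and a 5-node graph:
W = np.random.randn(4, 8)
A3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A5 = np.ones((5, 5)) - np.eye(5)
H3 = gcn_layer(A3, np.random.randn(3, 4), W)   # shape (3, 8)
H5 = gcn_layer(A5, np.random.randn(5, 4), W)   # shape (5, 8)
```

The node count n only shows up in the intermediate shapes, so the fixed-size learned parameters transfer across graphs; it is the graph-level readout that has to absorb the variable number of nodes.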
