Solved – Community detection (graph clustering) of a weighted graph with node attribute (categorical or quantitative)

clusteringdata visualizationnetworks

Consider a weighted graph having node attributes. Say the nodes are birds and the attribute could be either categorical (gender) or quantitative (number of feathers), i.e. similarly to this example.

One answer of this question already explained how to do community detection/graph clustering on a weighted graph with igraph in R, which is great.

However, how would one proceed to do the same but taking into account node attributes (i.e. having nodes with similar attributes grouped together, in addition of taking into account their weighted edges), in the case when:

  • each node's attribute is categorical (e.g. bird's gender)
  • each node's attribute is quantitative (e.g. bird's number of feathers)
  • each node has several attributes (which could be categorical or quantitative)

Having an answer to any of this case with an available package (in python or R) would already be fantastic.

If it makes any difference, the kind of data i actually have is a network very close to a grid ("lattice") network. And I am trying to get non-overlapping communities in this grid. In terms of numbers, I have maybe 10000 nodes that I would like to partition into 100 to 1000 communities. Each node is connected to 4 adjacent node (grid/lattice pattern).

Best Answer

At high level I would compute a similarity coefficient for each connected node to adjust weights of the connection. Then run clustering algorithm with adjusted graph.

Probably the easiest would be to start with Jaccard similarity - for each connected node compute number of shared categorical attributes and divide by the number of total categorical attributes. For numeric attributes you'll need to define similarity yourself (e.g. if two birds has X and Y feathers, than the smaller |X - Y| is - the more similar birds are).

At the end you will have a vector of similarity attributes: \begin{bmatrix}j_1,&n_2,&n_3,&...&n_k\end{bmatrix}

Then you need to come up with weights for each similarity coefficient: \begin{bmatrix}w_1,&w_2,&w_3,&...&w_k\end{bmatrix}

And your final edge weight will be something like: $$ e_{new\;weight} = (w_1\ast j_1+w_2\ast n_2+...+w_k\ast n_k)\ast e_{old\;weight} $$

Related Question