Solved – Community detection and modularity

dendrogramigraphmodularitynetworks

I am reading the book "Network science" of Barabasi and in particular the chapter on community detection.

If I understand correctly, modularity is a goodness factor of partition calculated by a certain algorithm: the greater the value of modularity and better is the structure of the communities found.

I tried several algorithms in R on the same network:

library(igraph)
library(GGally)
library(ggplot2)
library(ggdendro)

# load dataset
net <- read.graph("./ds.gml", format = c("gml"))

# find degree
deg <- igraph::degree(net, mode = "all")

###### GIRVAN & NEWMANN (divisive) ##################################################
girvNew <- cluster_edge_betweenness(net, modularity = TRUE)

girvNew_sizesComm <- sizes(girvNew) 
girvNew_numComm <- length(girvNew_sizesComm)
girvNew_modularity <- modularity(girvNew)

print(girvNew_numComm)
print(girvNew_sizesComm)
print(girvNew_modularity)

girvNew_den <- as.dendrogram(girvNew)

###### GREEDY ################################################################
fastgreedy <- fastgreedy.community(net)

fastgreedy_sizesComm <- sizes(fastgreedy)
fastgreedy_numComm <- length(fastgreedy_sizesComm)
fastgreedy_modularity <- modularity(fastgreedy)

print(fastgreedy_numComm)
print(fastgreedy_sizesComm)
print(fastgreedy_modularity)

fastgreedy_den <- as.dendrogram(fastgreedy)

###### WALKTRAP ################################################################
walktrap <- cluster_walktrap(net)

walktrap_sizesComm <- sizes(walktrap) 
walktrap_numComm <- length(walktrap_sizesComm) 
walktrap_modularity <- modularity(walktrap)

print(walktrap_numComm)
print(walktrap_sizesComm)
print(walktrap_modularity)

walktrap_den <- as.dendrogram(walktrap)

###### LOUVAIN ################################################################
louvain <- cluster_louvain(graph = net, weights = NULL)

louvain_sizesComm <- sizes(louvain)
louvain_numComm <- length(louvain_sizesComm)
louvain_modularity <- modularity(louvain)

print(louvain_numComm)
print(louvain_sizesComm)
print(louvain_modularity)

The result I get are:

enter image description here

Looking at the table I would say that the algorithm of detection communties more efficient is Louvain, then Girvan-Newmann, Wolktrap and Greedy.
Is this statement right?

The book says that the first algorithm (to Girvan-Newmann) calculates all partitions and doesn't choose the best one.
To do this you need to select the best cut of the dendrogram based on modularity.

Having said that, what means the value of modularity found (0.5380681)? But above all, how do I find the modularity for each step of algorithm?

What I would like to get is something similar to this chart:

enter image description here

That is, a chart that allows me to see the various modular values obtained and can then choose the best one.
This shows Modularity for each cut of the dendrogram.

Best Answer

Here are some answers to the multiple questions in your post (which I would avoid in the future- better answers will come with clear, individual questions).

Which algorithm is more efficient?

Perhaps I am misunderstanding what you mean by "efficiency", but I assume you mean computational efficiency. You can't know that from looking the table you provided- you need to check the CPU time, e.g., using system.time.

library(igraph)

# Random graph
net <- sample_gnm(100, 300, directed = F, loops = F)

# CPU time for modularity algorithms
system.time(cluster_edge_betweenness(net, modularity = TRUE))
#   user  system elapsed 
#  0.160   0.001   0.159  
system.time(fastgreedy.community(net))
#   user  system elapsed 
#  0.002   0.000   0.004
system.time(cluster_walktrap(net))
#   user  system elapsed 
#  0.002   0.000   0.003 
system.time(cluster_louvain(graph = net, weights = NULL))
#   user  system elapsed 
#  0.001   0.000   0.001 

You can see that there are differences in the CPU time used by the four algorithms, and the Louvain method is the fastest.

What does the modularity value mean?

In general, you can think of modularity as the fraction of edges lying within modules minus the expected fraction of edges lying within modules if there is no modularity. But the single modularity value that is output from these algorithms is the value of Q for the optimal partition found by the algorithm. The higher this maximum modularity is, the better the partition is. Thus, based on the table you included, the Louvain method found the best partition.

To be clear, the igraph function you are using to find communities with the Girvan-Newman algorithm DOES return the optimal communities found using the algorithm- you don't need to select it yourself. Look at the ?help page for the cluster_edge_betweenness function you use, and see the description of the modularity argument:

Logical constant, whether to calculate the maximum modularity score, considering all possibly community structures along the edge-betweenness based edge removals.

So the modularity you get IS the maximum modularity, and the partition you can extract using the communities function will give you the maximum partition. You don't need to find it yourself.

How can you extract the modularity score for each step of the Girvan-Newman algorithm?

As I mentioned above, I don't think you actually need to do this for any reason. But if you really do want to do that (for reasons that aren't clear in your post), it is probably a better question for Stack Overflow because it is about coding.

Related Question