What are the branches of statistics?

classification, self-study

In mathematics, there are branches such as algebra, analysis, topology, etc. In machine learning there is supervised, unsupervised, and reinforcement learning. Within each of these branches, there are finer branches that further divide the methods.

I am having trouble drawing a parallel with statistics. What would be the main branches of statistics (and sub-branches)? A perfect partition is likely not possible, but anything is better than a big blank map.

Visual examples: [two images not shown]

Best Answer

You could look into the keywords/tags of the Cross Validated website.


Branches as a network

One way to do this is to plot it as a network based on the relationships between the keywords (how often they coincide in the same post).

You can use this SQL query to get the site's data (from data.stackexchange.com/stats/query/edit/1122036):

select Tags from Posts where PostTypeId = 1 and Score >2

This gives a list of keywords for all questions with a score higher than 2.
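As a rough sketch of how such tag strings turn into a relation matrix, the base R snippet below parses a few made-up tag strings and counts how often two tags appear in the same question. The `tags` vector is hypothetical toy data, not output of the query. It uses `crossprod` on a question-by-tag indicator matrix, which is a different route than the loop in the R code further down but should produce the same kind of counts.

```r
# Toy input in the same "<tag1><tag2>" format as the Tags column (made-up data)
tags <- c("<regression><aic>",
          "<regression><nonparametric>",
          "<regression><aic><nonparametric>",
          "<aic>")

# Extract the tag names between "<" and ">" for each question
tag_list <- regmatches(tags, gregexpr("(?<=<)[^<>]+(?=>)", tags, perl = TRUE))

# Question-by-tag indicator matrix: 1 if the question carries the tag
all_tags <- sort(unique(unlist(tag_list)))
X <- t(sapply(tag_list, function(tg) as.integer(all_tags %in% tg)))
colnames(X) <- all_tags

# Co-occurrence counts: entry (i, j) = number of questions with both tags
C <- crossprod(X)
diag(C) <- 0   # drop self-counts so only relations between tags remain
print(C)
```

Each off-diagonal entry of `C` can then serve as an edge weight between two tags in the network.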

You could explore that list by plotting something like the following:

relations between tags

Update: the same with color (based on eigenvectors of the relation matrix) and without the self-study tag
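The answer does not say exactly how the colors were derived from the eigenvectors, so the following is only one plausible way to do it, shown on a made-up relation matrix. The names `M_toy`, `rescale01` and `node_color`, and the choice of the second eigenvector as the hue, are assumptions for illustration.

```r
# Toy symmetric relation matrix with two loosely connected groups (made-up counts)
tag_names <- c("regression", "anova", "glm", "clustering", "pca", "k-means")
M_toy <- matrix(c(0, 9, 8, 1, 0, 0,
                  9, 0, 7, 0, 1, 0,
                  8, 7, 0, 0, 0, 1,
                  1, 0, 0, 0, 9, 8,
                  0, 1, 0, 9, 0, 7,
                  0, 0, 1, 8, 7, 0), nrow = 6, byrow = TRUE,
                dimnames = list(tag_names, tag_names))

# Eigen-decomposition; for a symmetric matrix the eigenvalues are real
# and returned in decreasing order
e <- eigen(M_toy, symmetric = TRUE)

# Rescale a vector to the range [0, 1] so it can serve as a color channel
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Assumption: use the second eigenvector as the hue of each node
hue <- rescale01(e$vectors[, 2])
node_color <- hsv(h = 0.7 * hue, s = 0.8, v = 0.9)
names(node_color) <- tag_names
print(node_color)
```

In this toy matrix the second eigenvector has opposite signs on the two groups, so the groups end up with clearly different hues. With visNetwork, such a vector could be assigned to the `color` column of the nodes data frame.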

relations between tags

You could clean this graph up a bit further (e.g. take out tags that do not relate to statistical concepts, such as software tags; in the graph above this is already done for the 'r' tag) and improve the visual representation, but I think the image above already makes a good starting point.
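A minimal sketch of such a clean-up step, assuming the relation matrix carries tag names as row and column names. The matrix `M_toy` and the list `software_tags` are hypothetical and only serve as an illustration.

```r
# Toy relation matrix (made-up counts) that includes two software tags
tag_names <- c("regression", "r", "anova", "python")
M_toy <- matrix(c(0, 5, 3, 2,
                  5, 0, 4, 1,
                  3, 4, 0, 1,
                  2, 1, 1, 0), nrow = 4, byrow = TRUE,
                dimnames = list(tag_names, tag_names))

# Hypothetical list of tags to exclude; extend as needed
software_tags <- c("r", "python", "matlab", "spss")

# Keep only the rows and columns of tags that are not in the exclusion list
keep <- !(rownames(M_toy) %in% software_tags)
M_clean <- M_toy[keep, keep, drop = FALSE]
print(M_clean)
```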

R-code:

# the result of the SQL query, saved as a CSV file
network <- read.csv("~/../Desktop/network.csv", stringsAsFactors = FALSE)
# it looks like this:
# > network[1][1:5,]
#  [1] "<r><biostatistics><bioinformatics>"
#  [2] "<hypothesis-testing><nonlinear-regression><regression-coefficients>"
#  [3] "<aic>"
#  [4] "<regression><nonparametric><kernel-smoothing>"
#  [5] "<r><regression><experiment-design><simulation><random-generation>"

l <- length(network[,1])
nk <- 1
keywords <- c("<r>")                          # start the keyword table with the 'r' tag
M <- matrix(0,1)                              # relation (co-occurrence) matrix

for (j in seq_len(l)) {                       # loop over all lines of the file
  s <- stringr::str_match_all(network[j,],"<.*?>")           # extract keywords
  m <- c(0)                                   # placeholder; this line's keyword indices follow
  for (is in s[[1]]) {
    if (sum(keywords == is) == 0) {           # check if there is a new keyword
      keywords <- c(keywords,is)              # add to the keywords table
      nk <- nk+1
      M <- cbind(M,rep(0,nk-1))               # expand the relation matrix with zeros
      M <- rbind(M,rep(0,nk))
    }
    m <- c(m, which(keywords == is))
    lm <- length(m)
    if (lm>2) {                               # from the second keyword on a line onwards:
      for (mi in m[-c(1,lm)]) {               # add +1 to its relation with each earlier keyword
        M[mi,m[lm]] <- M[mi,m[lm]]+1
        M[m[lm],mi] <- M[m[lm],mi]+1
      }
    }
  }
}


# getting rid of < >
skeywords <- gsub("[<>]", "", keywords)


# plotting connections 

library(igraph)
library("visNetwork")

# reduce nodes and edges
Ms<-M[-1,-1]             # -1,-1 eliminates the 'r' tag which offsets the graph
Ms[which(Ms<50)] <- 0    # drop weak relations (fewer than 50 co-occurrences)
ww <- colSums(Ms)
el <- which(ww==0)       # tags that have no strong relations left

# convert to data object for visNetwork function
# (graph_from_adjacency_matrix replaces the deprecated graph.adjacency)
g <- graph_from_adjacency_matrix(Ms[-el,-el], weighted=TRUE, mode = "undirected")
data <- toVisNetworkData(g)

# adjust some plotting parameters
data$nodes['label'] <- skeywords[-1][-el]
data$nodes['title'] <- skeywords[-1][-el]
data$nodes['value'] <- colSums(Ms)[-el]
data$edges['width'] <- sqrt(data$edges['weight'])*1
data$nodes['font.size'] <- 20+log(ww[-el])*6
data$edges['color'] <- "#eeeeff"
data$edges['color'] <- "#eeeeff"

#plot
visNetwork(nodes = data$nodes, edges = data$edges) %>%
visPhysics(solver = "forceAtlas2Based", stabilization = TRUE,
           forceAtlas2Based = list(nodeDistance=70, springConstant = 0.04,
                                   springLength = 50,
                                   avoidOverlap =1)
           )

Hierarchical branches

I believe that this type of network graph relates to some of the criticisms of a purely branched, hierarchical structure. If you like, you could perform hierarchical clustering to force the tags into a hierarchical structure.

Below is an example of such a hierarchical model. One would still need to find proper group names for the various clusters (but I do not think that this hierarchical clustering is the right direction, so I leave it open).

hierarchical clustering

The distance measure for the clustering was found by trial and error (making adjustments until the clusters looked reasonable).

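#####
#####  cluster

library(cluster)

# find the tags without strong relations (same threshold as in the network plot)
Ms<-M[-1,-1]
Ms[which(Ms<50)] <- 0
ww <- colSums(Ms)
el <- which(ww==0)

# keycount is not defined in the code above; it is assumed here to be
# the number of questions in which each keyword occurs
keycount <- sapply(keywords, function(k) sum(stringr::str_count(network[,1], stringr::fixed(k))))

# normalise the relation counts by the keyword frequencies and take the log
Ms<-M[-1,-1]
R <- (keycount[-1]^-1) %*% t(keycount[-1]^-1)
Ms <- log(Ms*R+0.00000001)

Mc <- Ms[-el,-el]
colnames(Mc) <- skeywords[-1][-el]

# agglomerative clustering, using the negative log relation as dissimilarity
cmod <- agnes(-Mc, diss = TRUE)

plot(as.hclust(cmod), cex = 0.65, hang=-1, xlab = "", ylab ="")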
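
To see the normalisation and clustering steps in isolation, here is a minimal sketch on made-up data that uses base R's `hclust` instead of `cluster::agnes`. The toy counts in `C_toy`, the `tag_count` vector and the average linkage are assumptions for illustration; the dissimilarity (negative log of the co-occurrence divided by both tag frequencies) mirrors the code above.

```r
# Toy co-occurrence matrix with two groups of tags (made-up counts)
tag_names <- c("regression", "anova", "glm", "clustering", "pca", "k-means")
C_toy <- matrix(c(0, 9, 8, 1, 0, 0,
                  9, 0, 7, 0, 1, 0,
                  8, 7, 0, 0, 0, 1,
                  1, 0, 0, 0, 9, 8,
                  0, 1, 0, 9, 0, 7,
                  0, 0, 1, 8, 7, 0), nrow = 6, byrow = TRUE,
                dimnames = list(tag_names, tag_names))

# Hypothetical number of questions per tag
tag_count <- c(40, 30, 30, 35, 30, 25)

# Normalise the co-occurrence by both tag frequencies (a similarity),
# then take the negative log to obtain a dissimilarity
S <- C_toy / outer(tag_count, tag_count)
D <- -log(S + 1e-8)   # the small constant avoids log(0)
diag(D) <- 0

# Agglomerative clustering with average linkage, cut into two groups
hc <- hclust(as.dist(D), method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

Calling `plot(hc, hang = -1)` draws the corresponding dendrogram. Changing the normalisation or the linkage method changes the tree, which is the trial-and-error part mentioned above.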