[TeX/LaTeX] Illustrating the random forest algorithm in TikZ

Tags: forest, tikz-arrows, tikz-pgf, tikz-trees

I'm trying to illustrate how a random forest works in TikZ by combining these two figures. The picture should show the different tree structures from the second image and, as in the first, the path of a single sample through each tree (red dots).

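  1. [Image 1 (source): a single decision tree with the path of one sample marked by red dots and arrows]
  2. [Image 2 (source): several trees with different structures, each in a box labelled tree 1, tree 2, tree n]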

This is what I have so far.

[Image: output of the code below]

\documentclass{standalone}

\usepackage{forest}

\begin{document}
\begin{forest} for tree={l sep=3em, s sep=3em, anchor=center, inner sep=0.7em, fill=blue!50, circle, font=\Large\sffamily}
  [Training Data, draw, rectangle, rounded corners, orange, text=white
    [,red!70[[][]][,red!70[[][]][,red!70[,red!70][]]]]
    [,red!70[,red!70[[][]][,red!70]][[][[][]]]]
    [,red!70[[][]][,red!70[,red!70[][,red!70]][]]]
  ]
\end{forest}
\end{document}

I'm still struggling with:

  1. Drawing a box around each tree and numbering the trees (tree 1, tree 2, tree n) as in the second image.
  2. Adding the three dots between tree 2 and tree n.
  3. Drawing arrows along the path that a sample takes through each tree, as in the first image.
  4. Combining the results of all trees at the bottom with the text "majority voting for classification/mean for regression".
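For reference, the combination step in item 4 is usually written as follows for a forest of $n$ trees $T_1, \dots, T_n$ applied to a sample $x$. The notation is assumed here, not taken from the source images; the right-hand formula is the majority vote, with ties left unresolved:

```latex
% needs \usepackage{amsmath} for \operatorname* and \text
\[
  \hat{y}_{\text{reg}}(x) = \frac{1}{n} \sum_{b=1}^{n} T_b(x),
  \qquad
  \hat{y}_{\text{cls}}(x) = \operatorname*{arg\,max}_{k} \sum_{b=1}^{n} \mathbf{1}\bigl[T_b(x) = k\bigr]
\]
```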

Any help with that would be much appreciated!

Update

Thanks to user121799's awesome help, this is the finished TikZ image.

[Image: the finished diagram produced by the code below]

\documentclass[tikz]{standalone}

\usepackage{forest}
\usetikzlibrary{fit,positioning,shapes.arrows} % shapes.arrows provides the single arrow shape used by red arrow

\tikzset{
  font=\Large\sffamily\bfseries,
  red arrow/.style={
    midway,red,sloped,fill, minimum height=3cm, single arrow, single arrow head extend=.5cm, single arrow head indent=.25cm,xscale=0.3,yscale=0.15,
    allow upside down
  },
  black arrow/.style 2 args={-stealth, shorten >=#1, shorten <=#2},
  black arrow/.default={1mm}{1mm},
  tree box/.style={draw, rounded corners, inner sep=1em},
  node box/.style={white, draw=black, text=black, rectangle, rounded corners},
}

\begin{document}
\begin{forest}
  for tree={l sep=3em, s sep=3em, anchor=center, inner sep=0.7em, fill=blue!50, circle, where level=2{no edge}{}}
  [
  Training Data, node box
  [sample and feature bagging, node box, alias=bagging, above=4em
  [,red!70,alias=a1[[,alias=a2][]][,red!70,edge label={node[above=1ex,red arrow]{}}[[][]][,red!70,edge label={node[above=1ex,red arrow]{}}[,red!70,edge label={node[below=1ex,red arrow]{}}][,alias=a3]]]]
  [,red!70,alias=b1[,red!70,edge label={node[below=1ex,red arrow]{}}[[,alias=b2][]][,red!70,edge label={node[above=1ex,red arrow]{}}]][[][[][,alias=b3]]]]
  [~~$\dots$~,scale=2,no edge,fill=none,yshift=-4em]
  [,red!70,alias=c1[[,alias=c2][]][,red!70,edge label={node[above=1ex,red arrow]{}}[,red!70,edge label={node[above=1ex,red arrow]{}}[,alias=c3][,red!70,edge label={node[above=1ex,red arrow]{}}]][,alias=c4]]]]
  ]
  \node[tree box, fit=(a1)(a2)(a3)](t1){};
  \node[tree box, fit=(b1)(b2)(b3)](t2){};
  \node[tree box, fit=(c1)(c2)(c3)(c4)](tn){};
  \node[below right=0.5em, inner sep=0pt] at (t1.north west) {Tree 1};
  \node[below right=0.5em, inner sep=0pt] at (t2.north west) {Tree 2};
  \node[below right=0.5em, inner sep=0pt] at (tn.north west) {Tree $n$};
  \path (t1.south west)--(tn.south east) node[midway,below=4em, node box] (mean) {mean in regression or majority vote in classification};
  \node[below=3em of mean, node box] (pred) {prediction};
  \draw[black arrow={5mm}{4mm}] (bagging) -- (t1.north);
  \draw[black arrow] (bagging) -- (t2.north);
  \draw[black arrow={5mm}{4mm}] (bagging) -- (tn.north);
  \draw[black arrow={5mm}{5mm}] (t1.south) -- (mean);
  \draw[black arrow] (t2.south) -- (mean);
  \draw[black arrow={5mm}{5mm}] (tn.south) -- (mean);
  \draw[black arrow] (mean) -- (pred);
\end{forest}
\end{document}
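The six \draw commands at the end differ only in the box name and the shorten lengths, so they could also be generated by a \foreach loop, as the answer below does for its arrows. A minimal sketch, assuming plain rectangles as hypothetical stand-ins for the bagging node, the three tree boxes and the mean node (their text and positions are made up for illustration):

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{positioning}
\tikzset{
  black arrow/.style 2 args={-stealth, shorten >=#1, shorten <=#2},
  black arrow/.default={1mm}{1mm},
}
\begin{document}
\begin{tikzpicture}
  % Hypothetical stand-ins for the real nodes of the figure
  \node[draw] (bagging) {bagging};
  \node[draw, below left=2cm and 2cm of bagging] (t1) {tree 1};
  \node[draw, below=2cm of bagging] (t2) {tree 2};
  \node[draw, below right=2cm and 2cm of bagging] (tn) {tree $n$};
  \node[draw, below=4cm of bagging] (mean) {mean};
  % Each entry: box name / in-arrow end / in-arrow start / out-arrow end / out-arrow start,
  % with the same shorten values as the finished code above
  \foreach \TreeBox/\InEnd/\InStart/\OutEnd/\OutStart in
      {t1/5mm/4mm/5mm/5mm, t2/1mm/1mm/1mm/1mm, tn/5mm/4mm/5mm/5mm}{
    \draw[black arrow={\InEnd}{\InStart}] (bagging) -- (\TreeBox.north);
    \draw[black arrow={\OutEnd}{\OutStart}] (\TreeBox.south) -- (mean);
  }
\end{tikzpicture}
\end{document}
```

Keeping the lengths in the list makes it easy to tune one tree's arrows without touching the others.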

Best Answer

The following should do:

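  1. Use the fit library to draw a box around each tree.
  2. Add an appropriate node for the dots: a \cdots node without an edge or fill.
  3. Put red arrow-shaped nodes on the edges of the path with edge label (see here).
  4. Add the mean and prediction nodes and the arrows by hand.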
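The link in item 3 is not reproduced here, but the answer's full code uses this trick: an empty node with the single arrow shape is attached to an edge through forest's edge label and made to follow the edge with sloped. A minimal sketch on a small made-up tree, reusing the answer's marrow style unchanged:

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{shapes.arrows}
\usepackage{forest}
% The answer's marrow style: a filled single arrow, scaled down,
% placed midway along the edge and rotated to follow it (sloped)
\tikzset{marrow/.style={midway,red,sloped,fill, minimum height=3cm, single arrow,
    single arrow head extend=.5cm, single arrow head indent=.25cm,
    xscale=0.3,yscale=0.15, allow upside down}}
\begin{document}
\begin{forest}
  for tree={l sep=3em, s sep=3em, inner sep=0.7em, fill=blue!50, circle}
  % Made-up tree: the sample's path is root -> right child -> its left child
  [,red!70
    []
    [,red!70, edge label={node[above=1ex,marrow]{}}
      [,red!70, edge label={node[below=1ex,marrow]{}}]
      []
    ]
  ]
\end{forest}
\end{document}
```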

\documentclass[tikz]{standalone}
\usetikzlibrary{fit,shapes.arrows,positioning}
\usepackage{forest}
\tikzset{marrow/.style={midway,red,sloped,fill, minimum height=3cm, single arrow, single arrow
    head extend=.5cm, single arrow head indent=.25cm,xscale=0.3,yscale=0.15,
    allow upside down}}
\begin{document}
\begin{forest} 
for tree={l sep=3em, s sep=3em, anchor=center, inner sep=0.7em, fill=blue!50,
circle, font=\Large\sffamily,where level=1{no edge}{}}
  [Training Data, draw, rectangle, rounded corners, orange, text=white,alias=TD
    [,red!70,alias=a1[[,alias=a2][]][,red!70,edge label={node[above=1ex,marrow]{}}[[][]][,red!70,edge label={node[above=1ex,marrow]{}}[,red!70,edge label={node[below=1ex,marrow]{}}][,alias=a3]]]]
    [,red!70,alias=b1[,red!70,edge label={node[below=1ex,marrow]{}}[[,alias=b2][]][,red!70,edge label={node[above=1ex,marrow]{}}]][[][[][,alias=b3]]]]
    [~$\cdots$~,scale=4,no edge,fill=none,yshift=-1em]
    [,red!70,alias=c1[[,alias=c2][]][,red!70,edge label={node[above=1ex,marrow]{}}[,red!70,edge label={node[above=1ex,marrow]{}}[,alias=c3][,red!70,edge label={node[above=1ex,marrow]{}}]][,alias=c4]]]
  ]
% 1. a box around each tree (fit library)
\node[draw,fit=(a1)(a2)(a3)](f1){};
\node[draw,fit=(b1)(b2)(b3)](f2){};
\node[draw,fit=(c1)(c2)(c3)(c4)](f3){};
% 4. result nodes and connecting arrows, added by hand
\path (f1.south west)--(f3.south east) node[midway,below=4em] (David) {mean};
\node[below=2em of David] (pred){prediction};
\foreach \X in {1,2,3}{%
  \draw[-stealth] (TD) -- (f\X.north);
  \draw[-stealth] (f\X.south) -- (David);}
\draw[-stealth] (David) -- (pred);
\end{forest}
\end{document}

[Image: output of the code above]
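The answer's code draws the boxes but not the captions Tree 1, Tree 2, Tree n from item 1; the update in the question adds them as separate nodes at each box's north west corner. An alternative, sketched here on a made-up three-node tree, is TikZ's label option on the fit node itself, so the caption moves with the box:

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{fit}
\begin{document}
\begin{tikzpicture}[dot/.style={circle, fill=blue!50, inner sep=0.7em}]
  % Hypothetical stand-in for one tree (three nodes)
  \node[dot] (x1) at (0,0) {};
  \node[dot] (x2) at (-1.5,-1.5) {};
  \node[dot] (x3) at (1.5,-1.5) {};
  \draw (x1) -- (x2) (x1) -- (x3);
  % Box fitted around the tree; anchor=north west puts the label's
  % top-left corner on the box's top-left corner, i.e. inside the box
  \node[draw, rounded corners, inner sep=1em, fit=(x1)(x2)(x3),
        label={[anchor=north west, inner sep=0.5em]north west:Tree 1}] (f1) {};
\end{tikzpicture}
\end{document}
```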
