[TeX/LaTeX] How to produce John Kruschke’s Bayesian model diagrams using TikZ or similar tools

diagrams tikz-pgf

In a May 2012 blog post, John Kruschke explains why he prefers his own diagram conventions for representing Bayesian models to the traditional approach.

You can see an example of his approach in the diagram below:

[Image: John Kruschke model diagram]

Several people commented that they would like to adopt the convention if there were some way of automating the creation of such plots.
I currently have minimal experience with the various TeX-related drawing tools like TikZ.

1. Would TikZ be a suitable tool for producing such diagrams? Or would another tool be better suited?

Also, such models are often adapted over time. Thus, automating or semi-automating the drawing process would be desirable.

2. To what extent could TikZ or a similar tool facilitate the automation or reuse of drawing such plots?

Finally, I'd be interested in any existing examples of something similar.

3. Are there any examples of TikZ code doing something similar? Or if anyone were willing to show template code for something similar, that would be most appreciated.

Best Answer

I think TikZ would be great for this, but you'll probably need to write a package for it. I experimented a little, and here is some basic functionality. (I used some code from the question "Bell Curve/Gaussian Function/Normal Distribution in TikZ/PGF".)

The code in the preamble defines a new command, \randomvar, which can be used inside a tikzpicture environment to define a random variable. In the main document code, you can see how this is used. One can specify the distribution, a variable name, etc. The code defines four random variables, which show up as TikZ nodes, and so drawing arrows from and to them is easy.

\documentclass{article}

\usepackage{tikz}
\usepackage{pgfplots}

% --- this here would go into a package

\tikzset{bayes/pdf/.style={blue!50!white}}

\pgfmathdeclarefunction{gauss}{2}{%
  \pgfmathparse{1/(#2*sqrt(2*pi))*exp(-((x-#1)^2)/(2*#2^2))}%
}

\pgfmathdeclarefunction{exponential}{1}{%
  \pgfmathparse{(#1) * exp(-(#1) * x)}%
}

\pgfkeys{/tikz/bayes/label/.initial={}}
\pgfkeys{/tikz/bayes/name/.initial={}}
\pgfkeys{/tikz/bayes/distribution/.initial={0}}
\pgfkeys{/tikz/bayes/distribution name/.initial={}}

% empty default style; \tikzset replaces the deprecated \tikzstyle
\tikzset{bayes/node/.style={}}

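% #1 (optional): key=value list under /tikz/bayes, #2: node name.
% The optional argument defaults to empty; a default of "1" is not a valid key.
\newcommand\randomvar[2][]{%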
  \begingroup
  \pgfkeys{/tikz/bayes/.cd, #1}%
  \pgfkeysgetvalue{/tikz/bayes/distribution}{\distribution}%
  \pgfkeysgetvalue{/tikz/bayes/distribution name}{\distname}%
  \pgfkeysgetvalue{/tikz/bayes/name}{\parname}%
  \node[bayes/node] (#2) {
       \tikz{
           \begin{axis}[width=4cm, height=3cm,   
             axis x line=none, 
             axis y line=none, clip=false]
             \addplot[blue!50!white, semithick, mark=none, 
                    domain=-2:2, samples=50, smooth] {\distribution};
             \addplot[black, yshift=-4pt]  coordinates { (-2, 0) (2, 0) };
             \node at (rel axis cs: 0.5, 0.5) {\parname};
             \node[anchor=south] at (rel axis cs: 0.5, 0) {\sffamily\tiny\distname};
          \end{axis}
       }
  };
  \endgroup
}



% --- this here would be code written by the user

\begin{document}

\begin{tikzpicture}[node distance=3cm and 2cm, >=stealth]

\randomvar[distribution={gauss(0,0.5)}, 
                name=$M_0$, 
                distribution name=normal]{M0}
\randomvar[distribution={gauss(0,0.5)}, 
                distribution name=normal, 
                name=$M_1$,
                node/.style={right of=M0}]{M1}

\node[below of=M1] (eqn) { $\beta_0 + \beta_1 \mathbf{x}_i$ };

\randomvar[distribution={exponential(3)}, 
                distribution name=exponential,
                name=$M_2$,
                node/.style={right of=eqn}]{M2}

\randomvar[distribution={gauss(0,0.5)}, 
                distribution name=normal, 
                node/.style={below of=eqn}]{M3}


\draw[->] (eqn) -- node [anchor=east] {$=$} (M3.center);          
\draw[->] (M0.south) -- node [anchor=east] {$\sim$} (eqn.north west);
\draw[->] (M1.south) -- node [anchor=east] {$\sim$} (eqn);
\draw[->] (M2.south) -- node [anchor=east] {$\sim$} (M3);

\end{tikzpicture}   

\end{document}

The output is:

[Image: the diagram produced by the code above]

This code could be a start for a package, but clearly a lot of functionality is missing*. For example, it should be possible to add parameters to the distributions (e.g. the tau of your normal distribution), and define anchors for these parameters to allow for drawing arrows to them (notice that the exact positioning of the anchors would have to depend on the distribution to look good). I think it is possible to add more anchors like .south west; so one could refer to the first parameter of node M3 as M3.parameter 1 or something. Then it would be possible to, say, draw an arrow from M1 to the parameter of M3 by writing \draw[->] (M1.south) -- (M3.parameter 1);
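As a rough sketch of that idea, and only an approximation of true anchors: instead of declaring a new node shape, named coordinates can be placed along the top edge of a node with the calc library. The \paramanchor helper, the -parameter- naming scheme, and the plain rectangles standing in for \randomvar nodes are all assumptions made for illustration; none of them is part of the code above.

```latex
\documentclass[tikz, border=5pt]{standalone}
\usetikzlibrary{calc}

% Hypothetical helper, not part of the code above:
% \paramanchor{<node>}{<index>}{<fraction>} places a named coordinate
% <node>-parameter-<index> on the top edge of <node>, at the given
% fraction of the way from its north west to its north east corner.
\newcommand\paramanchor[3]{%
  \coordinate (#1-parameter-#2) at ($(#1.north west)!#3!(#1.north east)$);%
}

\begin{document}
\begin{tikzpicture}[>=stealth]
  % Plain rectangles stand in for the \randomvar nodes.
  \node[draw, minimum width=3cm, minimum height=1.5cm] (M1) at (0, 3) {$M_1$};
  \node[draw, minimum width=3cm, minimum height=1.5cm] (M3) at (1, 0) {$M_3$};

  \paramanchor{M3}{1}{0.25}  % first parameter of M3
  \paramanchor{M3}{2}{0.75}  % second parameter of M3

  % Arrow from M1 to the first parameter of M3.
  \draw[->] (M1.south) -- (M3-parameter-1);
\end{tikzpicture}
\end{document}
```

The fractions would have to be chosen per distribution, in line with the remark that good anchor positions depend on the distribution being drawn. A real package would more likely declare a custom shape so that M3.parameter 1 works as a genuine anchor.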

Another issue is drawing arrows to the parameters inside equations (for example, the equation containing the betas). I don't immediately see how to do that, but I'm no TikZ expert.
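One possible workaround, offered as a sketch rather than a tested part of the package idea: typeset each term of the equation as its own node with the positioning library, so that every term can serve as an arrow target. The node names (beta0, plus, beta1, x) and the term style are made up for this example, and plain boxes stand in for the distribution nodes.

```latex
\documentclass[tikz, border=5pt]{standalone}
\usetikzlibrary{positioning}

% Each piece of the equation is a separate, tightly packed node.
\tikzset{term/.style={inner sep=1pt, outer sep=0pt}}

\begin{document}
\begin{tikzpicture}[>=stealth]
  % "base right" aligns the text baselines of neighbouring terms.
  \node[term] (beta0) {$\beta_0$};
  \node[term, base right=0pt of beta0] (plus)  {${}+{}$};
  \node[term, base right=0pt of plus]  (beta1) {$\beta_1$};
  \node[term, base right=0pt of beta1] (x)     {$\mathbf{x}_i$};

  % Stand-ins for the prior distributions of the two coefficients.
  \node[draw, above left=1.5cm and 0.5cm of beta0]  (M0) {$M_0$};
  \node[draw, above right=1.5cm and 0.5cm of beta1] (M1) {$M_1$};

  % Arrows now end on individual coefficients, not on the whole equation.
  \draw[->] (M0.south) -- node[anchor=east] {$\sim$} (beta0.north);
  \draw[->] (M1.south) -- node[anchor=west] {$\sim$} (beta1.north);
\end{tikzpicture}
\end{document}
```

The spacing between terms is set by hand here (inner sep plus the empty groups around the plus sign), so it will not match TeX's own math spacing exactly.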

In conclusion, although it may take some work and expertise to develop this (as expected), I do think that a TikZ package would be able to automate a good deal of the work of drawing these diagrams.

*) I also don't know if I use the right coding conventions regarding e.g. pgfkeys -- comments welcomed.
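For comparison, here is a minimal sketch of one common way to organise such keys: declare them together after a single .cd, and read them back with \pgfkeysvalueof instead of copying each value into a macro. This is one pattern among several, not a statement of the official convention. The \describevar command is hypothetical and exists only to show the key handling.

```latex
\documentclass{article}
\usepackage{tikz}

% Declare all keys of the family in one place.
\pgfkeys{
  /tikz/bayes/.cd,
  name/.initial={},
  distribution name/.initial={normal},
}

% Hypothetical command: sets the given keys locally and prints the
% variable name followed by the distribution name in parentheses.
\newcommand\describevar[1][]{%
  \begingroup
    \pgfkeys{/tikz/bayes/.cd, #1}%
    \pgfkeysvalueof{/tikz/bayes/name}
    (\pgfkeysvalueof{/tikz/bayes/distribution name})%
  \endgroup
}

\begin{document}
% Uses the initial value of "distribution name".
\describevar[name=$M_0$]

% Overrides it for this call only; the group restores the initial value.
\describevar[name=$M_2$, distribution name=exponential]
\end{document}
```

Because the keys are set inside a group, values from one call do not leak into the next, which is also what the \begingroup and \endgroup pair in \randomvar achieves.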