[TeX/LaTeX] Literate R Programming

literate-programming r

I'd like to embed some R into a LaTeX document. After a bit of googling I found that Sweave and knitr allow you to do this. I compiled a simple example with Sweave and it works.

I'd be much obliged if somebody could answer the following questions:

  • Are there other approaches?
  • What are the merits of the different approaches?

Please note that I'd like to run LaTeX from the command line. I am not interested in IDE solutions.

TIA for your help.

Best Answer

0. tl;dr

knitr is preferable to Sweave. ezknitr is a wrapper around knitr that makes it easier to ensure that directories and paths are all correct, so it is probably worth using, especially if you only build documents from the command line (but it limits you to R Markdown; see below). I don't think any IDE has integrated ezknitr support, at least not at the time of writing.

knitr/ezknitr (henceforth just knitr) may or may not be preferable to Thruston's suggested approach, depending on your use case.

What follows is some justification for these points, coupled with examples.

1. knitr vs. Sweave

knitr is preferable to Sweave for a variety of reasons. The two main ones are that (i) knitr integrates better with tikzDevice, and (ii) its chunk options are more versatile.

1.1. knitr and tikzDevice

A caveat: I've never really used Sweave, but my understanding from reading blog posts is that tikzDevice is much more straightforward to use with knitr than with Sweave.

Two reasons you might want to use tikzDevice for your graphs are that (i) you get better typesetting of labels and titles (especially of math), and (ii) the text in your graphs uses the same font as the text in the rest of your document. Here's an MWE showing both of these things (see note 1).

\documentclass{article}

\usepackage{tikz}
\usetikzlibrary{decorations.pathreplacing}

\tikzset{
  underbrace style/.style={decorate,decoration={brace,raise=5mm,amplitude=3pt,mirror},color=gray},
  underbrace text style/.style={font=\scriptsize, below, pos=.5, yshift=-8mm}
}

\newcommand*{\MyContrivedTitle}{%
  \begin{tikzpicture}
    \node (MyTitle) {Average miles per gallon by gear (and some math for fun: $\int_{a}^{b} x^2 dx$)};
    \draw [underbrace style] (MyTitle.north west) -- (MyTitle.north east) node [underbrace text style] {My contrived title with \texttt{tikz}};
  \end{tikzpicture}}

\begin{document}

\section{Introduction}

<<setup, include=FALSE, cache=FALSE>>=
### Set the global chunk options
### See http://yihui.name/knitr/options/#chunk_options
library(knitr)
opts_chunk$set(cache=FALSE,
               echo=FALSE,
               message=FALSE,
               warning=FALSE,
               highlight=FALSE,
               sanitize=FALSE,
               tidy=TRUE,
               dev='tikz',
               fig.env='figure',
               fig.show='hold',
               fig.lp='fig:',
               fig.align='center',
               fig.pos='htbp',
               out.width='.75\\textwidth'
               )
@

As can be seen in Figure \ref{fig:car-plot}, \ldots

<<car-plot,fig.cap='A graphic produced by \\texttt{knitr} and \\texttt{tikzDevice}'>>=
library(dplyr) # a good package for data manipulation
library(ggplot2) # a good package for graphing
data <- mtcars %>%
    group_by(gear) %>%
    summarise(SD = sd(mpg),
              SE = SD / sqrt(n()),  # standard error of the mean
              MEAN = mean(mpg)
              )
carplot <- ggplot(data,
                   aes(x = factor(gear),
                       y = MEAN
                       )
                   ) +
    geom_bar(stat = "identity") + 
    geom_errorbar(aes(ymin = MEAN - SE,
                      ymax = MEAN + SE
                      ),
                  width = 0.25,
                  size = 0.5
                  ) +
    ggtitle("\\MyContrivedTitle") +
    xlab("Gear") +
    ylab("Mean MPG") +
    theme(plot.margin=unit(c(1,0,0,0),"cm"))
carplot
@

\end{document} 

This produces the following output:

[Figure: example output of the above MWE, demonstrating the better typesetting with knitr and tikzDevice]

1.2. More versatile chunk options in knitr (compared to Sweave)

This example is taken directly from Yihui Xie, the author of knitr. In knitr (but not in Sweave), you can delay the evaluation of certain chunk options, so that you can, for example, include the p-value of a t-test in a figure caption.

\documentclass{article}

\begin{document}

\section{Introduction}

<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_knit$set(eval.after = 'fig.cap') # evaluate fig.cap after the chunk
opts_chunk$set(cache=FALSE,
               echo=FALSE,
               message=FALSE,
               warning=FALSE,
               highlight=FALSE,
               sanitize=FALSE,
               tidy=TRUE,
               dev='tikz',
               fig.env='figure',
               fig.show='hold',
               fig.lp='fig:',
               fig.align='center',
               fig.pos='htbp',
               out.width='.75\\textwidth'
               )
@


<<t-test, fig.cap=paste("The P-value is", t.test(x)$p.value)>>=
x = rnorm(100)
boxplot(x)
@

\end{document}

The output of this is:

[Figure: output of the example MWE, demonstrating delayed evaluation of chunk options]

2. knitr vs. Thruston's suggested approach

Even if you prefer to keep your R code and your LaTeX code separate, Thruston's suggested approach is not necessarily preferable, because knitr can also pull code from an external R script into a LaTeX document. That being said, each approach has some advantages and disadvantages worth mentioning.
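As a minimal sketch of that external-code route (the file name `analysis.R` and the chunk labels are hypothetical), knitr's `read_chunk()` reads a plain R script whose sections are marked with `## ---- label ----` comment lines. An empty chunk in the .Rnw file with the same label then runs that section, while the script itself still runs on its own with Rscript.

```r
## analysis.R: a plain R script, also runnable on its own with Rscript.
## Each "## ---- label ----" line starts a section that knitr's read_chunk()
## registers under that label. Hypothetical usage in the .Rnw file:
##   <<external, include=FALSE, cache=FALSE>>=
##   knitr::read_chunk("analysis.R")
##   @
##   <<mpg-by-gear>>=
##   @
## (an empty chunk whose label matches a section runs that section's code)

## ---- mpg-by-gear ----
# Mean miles per gallon for each number of forward gears (base R only).
mpg_by_gear <- aggregate(mpg ~ gear, data = mtcars, FUN = mean)

## ---- print-table ----
print(mpg_by_gear)
```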

Some advantages of knitr over Thruston's suggested approach are:

  • You have a literately programmed document, and thus a reproducible research workflow.
  • There's very little room for human error (except in writing your R code, of course).
  • It's easier to get consistent fonts across the document and its figures (though this is not impossible with Thruston's suggested approach, if your R code outputs a PDF with the desired font embedded in it).
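For the font caveat in the last bullet, here is a minimal sketch of the non-knitr route: a separate R script that writes a PDF figure and then embeds its font. The file names are hypothetical, `"Palatino"` is only an example family (it must be one that R's `pdf()` device knows; see `pdfFonts()`), and the embedding step needs Ghostscript.

```r
# make-figure.R: a stand-alone script, run once with `Rscript make-figure.R`.
# The resulting PDF is then included with \includegraphics, so R is not
# re-run every time the LaTeX document is compiled.
means <- tapply(mtcars$mpg, mtcars$gear, mean)  # mean mpg per number of gears

# "Palatino" is one of the font families built into R's pdf() device;
# choose one that matches the font of your LaTeX document.
pdf("mpg-by-gear.pdf", width = 5, height = 4, family = "Palatino")
barplot(means, xlab = "Gear", ylab = "Mean MPG")
invisible(dev.off())

# pdf() only references the font by name; embedFonts() embeds it using
# Ghostscript, so skip that step if no Ghostscript binary can be found.
gs <- tools::find_gs_cmd()
if (nzchar(gs)) {
  embedFonts("mpg-by-gear.pdf")
} else {
  message("Ghostscript not found; the font was not embedded.")
}
```

The trade-off is the one listed above: the figure is only as up to date as the last time you remembered to rerun the script.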

Some advantages of Thruston's suggested approach over knitr are:

  • Your R code is not evaluated each time you compile your document, so compilation will be faster, potentially much faster if you're drawing a lot of graphics or doing heavy calculations in R (though caching can mitigate this to some extent).
  • Your source code could be more human-readable (but this introduces more room for human error). For example, the first code block below is arguably less human-readable than the second:

Using knitr to make a document more reproducible (at some cost in readability):

\begin{tabular}{lcc}
                   & Adults                                                                       & Children \\
Active sentences   & \Sexpr{data[data$GROUP == "Adults" & data$CONDITION == "Active",]$ACCURACY}  & \Sexpr{data[data$GROUP == "Children" & data$CONDITION == "Active",]$ACCURACY} \\
Passive sentences  & \Sexpr{data[data$GROUP == "Adults" & data$CONDITION == "Passive",]$ACCURACY} & \Sexpr{data[data$GROUP == "Children" & data$CONDITION == "Passive",]$ACCURACY} \\
\end{tabular}

Copying and pasting the values from the output of an R script instead of using knitr (arguably more human-readable, but with more room for human error):

\begin{tabular}{lcc}
                   & Adults  & Children \\
Active sentences   & 98      & 93 \\
Passive sentences  & 94      & 67 \\
\end{tabular}
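One way to soften this trade-off, sketched here rather than prescribed, is to do the lookups once in a hidden chunk and keep the table cells short. The helper `acc()` is hypothetical, and the toy data frame only re-creates the four values from the table above so that the snippet runs on its own; in a real document `data` would come from your analysis, and if producing it is slow, that chunk is a natural candidate for `cache=TRUE`.

```r
# Body of a hidden chunk, e.g. <<accuracy, include=FALSE>>= ... @
# Toy stand-in for the real data, using the values from the table above.
data <- data.frame(
  GROUP     = c("Adults", "Children", "Adults", "Children"),
  CONDITION = c("Active", "Active", "Passive", "Passive"),
  ACCURACY  = c(98, 93, 94, 67)
)

# Hypothetical helper: the accuracy for one group/condition cell.
acc <- function(group, condition) {
  value <- data$ACCURACY[data$GROUP == group & data$CONDITION == condition]
  stopifnot(length(value) == 1)  # fail loudly if the cell is missing or duplicated
  value
}

# Each row of the tabular then shrinks to something like:
#   Active sentences & \Sexpr{acc("Adults", "Active")} & \Sexpr{acc("Children", "Active")} \\
```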

3. ezknitr vs. knitr


UPDATE: It seems that ezknitr does not currently process .Rnw files. Hopefully this is a feature that will be added in the future (see here; also see here).


I have yet to try ezknitr myself, so I'll update this answer once I have, but the blog post that introduces ezknitr suggests that it addresses the sometimes frustrating problems with paths and working directories. To quote from the blog post:

One common source of frustration with knitr is that it assumes the directory where the source file lives should be the working directory, which is often not true. ezknitr addresses this problem by giving you complete control over where all the inputs and outputs are, and adds several other convenient features. The two main functions are ezknit() and ezspin(), which are wrappers around knitr's knit() and spin(), used to make rendering markdown/HTML documents easier.

This is presumably useful, especially if you are building documents from the command line for a project with files in many different directories.
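Going by the ezknitr documentation (the project layout and paths here are hypothetical), a call that knits an R Markdown file stored in one folder, uses the project root as the working directory, and writes the results to another folder would look roughly like this. Per the update above, `file` has to be an .Rmd file (or an .R script for `ezspin()`), not an .Rnw file.

```r
# Hypothetical layout: analysis/report.Rmd reads data relative to the
# project root, and its output should end up in reports/.
library(ezknitr)
ezknit(file = "analysis/report.Rmd",  # the source document
       wd = ".",                      # working directory used while knitting
       out_dir = "reports",           # where the rendered output is written
       fig_dir = "figures")           # name of the folder for generated figures
```

From the command line, the same call can be wrapped in `Rscript -e "..."`, just like `knit()` in section 4.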

4. Compiling (from the command line)

For posterity: RStudio is, for the most part, a good IDE for working with knitr and LaTeX (though things get dicey as soon as a bibliography is involved).

You said you were more interested in compiling documents from the command line. When you use knitr, you edit a .Rnw file and then you process it with knitr's knit() function, which outputs a .tex file. You never want to edit that .tex file directly. All changes should be made to the .Rnw file, and then you should regenerate the .tex file using knit().

Thus, you could build your document from the command line by doing something like this:

Rscript -e "library(knitr); knit('my_file.Rnw')" # this command produces my_file.tex
pdflatex my_file.tex                             # this command produces my_file.pdf

You could also easily write a batch, make, or bash script to do this (see note 2).
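As one possible script of that kind, here is a small R sketch (the name `build.R` and its arguments are hypothetical). It relies on knitr's `knit2pdf()`, which runs `knit()` and then compiles the resulting .tex file; its `compiler` argument selects the LaTeX engine.

```r
#!/usr/bin/env Rscript
# build.R: hypothetical build helper.
# Usage: Rscript build.R FILE.Rnw [pdflatex|xelatex|lualatex]
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 1) {
  stop("usage: Rscript build.R FILE.Rnw [ENGINE]")
}
input  <- args[1]
engine <- if (length(args) >= 2) args[2] else "pdflatex"

# knit() writes FILE.tex, which the LaTeX engine then turns into FILE.pdf.
# Never edit the intermediate .tex by hand: edit the .Rnw and rerun this.
knitr::knit2pdf(input, compiler = engine)
```

For example, `Rscript build.R my_file.Rnw` would regenerate my_file.tex and then my_file.pdf in one step.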


Notes

  1. It seems that there is a problem when setting the dev chunk option to tikz in knitr and loading fontspec, so it's not possible to use an arbitrary font with XeLaTeX or LuaLaTeX, unfortunately. Hopefully this is an issue that will be fixed soon.
  2. There is currently a problem in using arara to build .Rnw documents from the command line, but for the upcoming version of arara, Paulo Cereda (arara's author) has promised an out-of-the-box, batteries-included rule that works with knitr, so it should be possible to use arara to build .Rnw documents in the (near) future.