[Tex/LaTex] Dynamically format labels/columns of a LaTeX table generated in R/knitr/xtable

arrays knitr macros tables xtable

I have some tables coming from the statistical package R that I want to export as LaTeX markup.

I would like to make every column as wide as the longest word of its label (assuming, for simplicity, that the tabular values are shorter).

If the labels are:

Short label
Very long description
Three letter label

then the desired output (including some example values) is:

Short   Very   long     Three
label   description     letter
                        label
-----------------------------
1       5               9
2       6               10
3       7               11
4       8               12
-----------------------------

All sorts of tricks can be applied to table labels and values in R, but R works outside LaTeX: it can only count the characters of a label and cannot know how much space the label takes once it is typeset by the LaTeX engine.

Main idea

The main idea is to generate, from an R script, LaTeX code that measures the words, and then to use the lengths calculated in LaTeX to set up the actual tabular environment. A sketch of the LaTeX measuring document is:

...
\usepackage{varwidth, calc}     
...

\newlength{\temp}
%% Column 1     
\setlength{\temp}{\widthof{\fbox{\begin{varwidth}{\textwidth}
Short\\ label
\end{varwidth}}}}
\the\temp

%% Column 2
\setlength{\temp}{\widthof{\fbox{\begin{varwidth}{\textwidth}
Very \\ long \\ description
\end{varwidth}}}}
\the\temp

%% Column 3
\setlength{\temp}{\widthof{\fbox{\begin{varwidth}{\textwidth}
Three \\ letter \\ label
\end{varwidth}}}}
\the\temp

Compiling it with LaTeX, I get this output:

30.71672pt
54.6612pt
32.38335pt

Now I could parse this output in R and produce the following LaTeX code:

\usepackage{array}
...
\def\tabcol{p{30.71672pt}p{54.6612pt}p{32.38335pt}}
\expandafter\tabular\expandafter{\tabcol}
\hline
Short label & Very long description & Three letter label \\\hline
1 & 2 &  3
\endtabular

Desired solution

Instead of parsing the output of the first code snippet in R, I wonder whether I can skip this step and do everything in LaTeX. That is: can I capture the lengths stored in \temp and accumulate them in a sort of "string variable", so that I can use it in the \tabcol macro?

EDIT

A number of comments ask why I am not using knitr/Sweave or the like. I left these details out to avoid being flagged as off topic, but I suppose readers need more background.

First, I use Rnw+knitr+xtable. R-NoWeb (Rnw) documents are LaTeX documents interspersed with special markup blocks called "code chunks":

<<...>>
 ... 
@

where the dots represent R code or chunk options. knitr executes the R code inside the code chunks and replaces each chunk with its generated output, formatted in LaTeX (if the code produces a figure, knitr automatically inserts an \includegraphics... pointing to the figure file, unless you opt for manual control of the LaTeX output). As the file produced by knitr is an "ordinary" LaTeX file, you compile it with LaTeX and your report is done.

As for the tables, the xtable package can convert R data objects into LaTeX, with support for several LaTeX packages (booktabs, tabularx etc.). To the best of my knowledge, none of them can automatically resize a table the way I am trying to. tabularx does resize columns, but with a different algorithm.
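For contrast, here is a minimal sketch of what tabularx does, using the three sample labels from above. Its X columns share equally whatever width is needed to reach the requested total (here \textwidth, an arbitrary choice for illustration), so the column widths follow from the target table width rather than from the longest word of each label.

```latex
\documentclass{article}
\usepackage{tabularx}
\begin{document}
% The three X columns split \textwidth equally, regardless of
% how long the words in each header are.
\begin{tabularx}{\textwidth}{XXX}
\hline
Short label & Very long description & Three letter label \\
\hline
1 & 5 & 9 \\
\hline
\end{tabularx}
\end{document}
```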

In any case, in R it is easy to arrange the table data by column and to generate, for each column, LaTeX code similar to:

%% Column 1
\setlength{\temp}{\widthof{\fbox{\begin{varwidth}{\textwidth}
Short\\ label
\end{varwidth}}}}
\the\temp

As observed above, compiling these lines (repeated for each column) gives the desired widths. To capture the \temp length I could write it to a file:

\newwrite\tempfile
\immediate\openout\tempfile=lengths.tex
\immediate\write\tempfile{\the\temp}
\immediate\closeout\tempfile

So the actual R-generated file, say table.tex, should be similar to:

\usepackage{varwidth, calc}
\usepackage{array}
 ...
\newlength{\temp}
\newwrite\tempfile
\immediate\openout\tempfile=lengths.tex
...

%% Column 1
\setlength{\temp}{\widthof{\fbox{\begin{varwidth}{\textwidth}
Short\\ label
\end{varwidth}}}}
%\the\temp
\immediate\write\tempfile{\the\temp}

%% Column 2
...

%% Column n
...

\immediate\closeout\tempfile
...

After compiling table.tex with LaTeX, lengths.tex is produced. R can then parse lengths.tex to read the column widths, and a new table.tex can be produced:

\usepackage{array}
 ...

\def\tabcol{p{...pt}p{...pt}...}
\expandafter\tabular\expandafter{\tabcol}

\endtabular

The p{...pt} values are those obtained from lengths.tex.

The new table.tex can be compiled again to get the final result but, as said above, I would like to process the first table.tex (or the first-pass output lengths.tex) directly in LaTeX, automatically and without going back and forth between R and LaTeX.
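To make the goal concrete, here is a rough sketch of reading the lengths back in LaTeX itself, assuming lengths.tex holds one length per line as written by the \immediate\write approach above. The names \lenfile and \oneLength are hypothetical, and the two sample widths (30pt and 55pt) are stand-ins for measured values.

```latex
\documentclass{article}
\begin{document}
\newlength{\temp}
\newwrite\tempfile
\newread\lenfile   % hypothetical name for the input stream

% First write two sample lengths (stand-ins for the measured widths).
\immediate\openout\tempfile=lengths.tex
\setlength{\temp}{30pt}\immediate\write\tempfile{\the\temp}
\setlength{\temp}{55pt}\immediate\write\tempfile{\the\temp}
\immediate\closeout\tempfile

% Then read them back, one line at a time, into \tabcol.
\def\tabcol{}
\openin\lenfile=lengths.tex
\loop\unless\ifeof\lenfile
  % Read one line without the end-of-line space; \global lets
  % \oneLength survive the surrounding group.
  {\endlinechar=-1 \global\read\lenfile to \oneLength}%
  % The read that hits the end of the file yields an empty line: skip it.
  \ifx\oneLength\empty\else
    \edef\tabcol{\tabcol p{\oneLength}}%
  \fi
\repeat
\closein\lenfile

\expandafter\tabular\expandafter{\tabcol}
\hline
Short label & Very long description \\ \hline
1 & 5 \\ \hline
\endtabular
\end{document}
```

This still goes through a file on disk, so it removes the trip back to R but not the intermediate lengths.tex.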

Best Answer

I think I found an acceptable solution.

Assume that the statistical table is schematically as follows:

  Short label Very long description Three letter label
1           1                     5                  9
2           2                     6                 10
3           3                     7                 11
4           4                     8                 12

The LaTeX code to be generated, which makes every column as wide as the longest word of its label, is:

\documentclass{article}
\usepackage{varwidth, calc}
\begin{document}

%% LaTeX vars
\def\colDef{} 
\newlength{\temp}

%% Column 1 
\setlength{\temp}{\widthof{\mbox{\begin{varwidth}{\textwidth}
Short\\label
\end{varwidth}}}}
\edef\colDef{\colDef p{\the\temp}}

%% Column 2 
\setlength{\temp}{\widthof{\mbox{\begin{varwidth}{\textwidth}
Very\\long\\description
\end{varwidth}}}}
\edef\colDef{\colDef p{\the\temp}}

%% Column 3 
\setlength{\temp}{\widthof{\mbox{\begin{varwidth}{\textwidth}
Three\\letter\\label
\end{varwidth}}}}
\edef\colDef{\colDef p{\the\temp}}

\edef\colDef{{\colDef}}

%% Table Macro
\newcommand{\maketab}[2]{%
\begin{tabular}{#1}
                #2
\end{tabular}
}

%% Table Body
\newcommand{\tableBody}{
Short label & Very long description & Three letter label \\ 
  \hline
  1 &   5 &   9 \\ 
    2 &   6 &  10 \\ 
    3 &   7 &  11 \\ 
    4 &   8 &  12 \\ 
   \hline
}

%% and... 
\expandafter\maketab\colDef{\tableBody}

\end{document}

Compiling it gives the intended layout for the labels:

Short   Very   long     Three
label   description     letter
                        label
-----------------------------
1       5               9
2       6               10
3       7               11
4       8               12
-----------------------------
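The listing above relies on two expansion details that are easy to miss. \edef expands \the\temp at definition time, so each width is stored as plain text such as 12.0pt rather than as a reference to \temp, which is overwritten for the next column. The final \edef\colDef{{\colDef}} wraps the list in braces, so that the single \expandafter before \maketab hands it over as one argument. A minimal sketch with made-up widths (12pt and 34pt are arbitrary) shows what ends up in the macro:

```latex
\documentclass{article}
\begin{document}
\newlength{\temp}
\def\colDef{}

\setlength{\temp}{12pt}
\edef\colDef{\colDef p{\the\temp}}  % stores the text p{12.0pt}
\setlength{\temp}{34pt}             % overwriting \temp is now harmless
\edef\colDef{\colDef p{\the\temp}}  % appends p{34.0pt}

% Writes "macro:->p{12.0pt}p{34.0pt}" to the terminal and the log file.
\typeout{\meaning\colDef}

Nothing to typeset here; check the log.
\end{document}
```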

The R/knitr side

As there are both the knitr and xtable tags on tex.stackexchange, I assume the R code to generate the above LaTeX might be of interest to someone.

The Rnw document to "knit" is:

\documentclass{article}
\usepackage{varwidth, calc}     
\begin{document}

<<label=ididit, results='asis', echo=FALSE>>=

library(xtable) 

## Generate sample 4x3 table, with long labels
## -------------------------------------------

tab=data.frame(matrix(1:12, ncol=3))
names(tab) = c("Short label", "Very long description", "Three letter label")

## Generate LaTeX code: every \ needs to be doubled (escaped) 
## -----------------------------------------------------------

## Print LaTeX vars
cat("
\\def\\colDef{} 
\\newlength{\\temp}
")

## Dynamic text
text1="\\setlength{\\temp}{\\widthof{\\mbox{\\begin{varwidth}{\\textwidth}"
      # Labels go here, separated by \\
text2="\\end{varwidth}}}}
\\edef\\colDef{\\colDef p{\\the\\temp}}
"

## Separate labels with \\\\
lab= strsplit(names(tab), " ")
lab= sapply(lab, function(x) paste(x, collapse='\\\\'))

## Replace dynamic text with labels and print it
text=paste(text1, lab, text2, sep="\n", collapse="\n")
cat(text)
cat("\n\\edef\\colDef{{\\colDef}}")

## Setup main table macro 
cat("
\\newcommand{\\maketab}[2]{%
\\begin{tabular}{#1}
                #2
\\end{tabular}
}
")

## Table body obtained via the xtable package
cat("\\newcommand{\\tableBody}{\n")
print(xtable(tab), only.contents=TRUE, include.rownames=FALSE)
cat("}\n")

###  and finally...
cat("\\expandafter\\maketab\\colDef{\\tableBody}")

@ 

\end{document}

If this document is named table.rnw, knitting it, that is, executing in R:

library(knitr)
knit("table.rnw")

will generate a table.tex like the LaTeX listing at the top of this answer (plus some knitr macros in the preamble, e.g. for graphics), which is then compiled with LaTeX to get the desired table output.

Comments on solution

Note that the proposed solution requires only one step in R plus one step in LaTeX, since the generated LaTeX contains both the code that measures the lengths and the code that uses them in the tabular environment. xtable is used only to produce the formatted rows of the table body, so one can easily replace the tabular environment with something fancier and/or set further formatting parameters for the table.
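As one possible customisation, the columns could be right-aligned, which often suits numeric data. The sketch below assumes the array package and a column type named R, which is a hypothetical name introduced here and not part of the code above. Using a one-letter column type keeps the \edef accumulation simple, since a plain letter is not expanded, whereas writing >{\raggedleft\arraybackslash} directly inside the \edef would need extra care with \noexpand.

```latex
\documentclass{article}
\usepackage{array, varwidth, calc}
\begin{document}

% Hypothetical column type: a right-aligned p column of width #1.
\newcolumntype{R}[1]{>{\raggedleft\arraybackslash}p{#1}}

\def\colDef{}
\newlength{\temp}

%% Column 1, measured as in the listing above
\setlength{\temp}{\widthof{\mbox{\begin{varwidth}{\textwidth}
Short\\label
\end{varwidth}}}}
\edef\colDef{\colDef R{\the\temp}}

%% Column 2
\setlength{\temp}{\widthof{\mbox{\begin{varwidth}{\textwidth}
Very\\long\\description
\end{varwidth}}}}
\edef\colDef{\colDef R{\the\temp}}

\newcommand{\maketab}[2]{\begin{tabular}{#1}#2\end{tabular}}

% Expand \colDef before \maketab reads its first argument.
\expandafter\maketab\expandafter{\colDef}{%
  \hline
  Short label & Very long description \\ \hline
  1 & 5 \\
  2 & 6 \\ \hline
}
\end{document}
```

Here the second \expandafter expands \colDef inside the braces, so the extra \edef\colDef{{\colDef}} step is not needed in this variant.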

Avoiding the extra round trip between R and LaTeX saves time. Suggestions for further improvement are welcome.