[Tex/LaTex] Automatically formatting a table from a file with a list of words


I have a list of about 1000 words in a text file, "my_words.txt". The words are separated by commas, with no line breaks:

word1, word2, word3, etc.

I would like to display these words in a figure or table, aligned in columns (and centered within each column), e.g.

word1 word2 word3 word4
word5 word6 word7 word8
word9

where I either specify the number of rows or columns, or LaTeX works them out from the width I specify for the table.

I thought the datatool package could help with this, but I can't get it to work. I tried:

\usepackage{datatool}
% ...

\begin{document}
% ...

\DTLsetseparator{,}
\DTLloaddb[noheader,keys={source,package,target,brokenpkg,impactset,brokensource}]{mytable}{full_concept_list.csv}


\begin{table}[htbp]
  \centering
  \DTLdisplaydb{mytable}
 \caption{CSV Table test}
\end{table}

but I get the error:

Missing $ inserted 
\DTLdisplaydb{mytable}

Why is datatool not working? Are there other tools better suited to this task?

Best Answer

Here's a way with expl3 and Heiko Oberdiek's catchfile:

\documentclass{article}
\usepackage[T1]{fontenc} % for the underscore
\usepackage{catchfile}
\usepackage{xparse}

\ExplSyntaxOn
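% \CatchFileDef takes a single control sequence first, hence :Nnn;
% a c variant can only be generated from an N argument
\cs_set_eq:NN \egreg_catchfiledef:Nnn \CatchFileDef
\cs_generate_variant:Nn \egreg_catchfiledef:Nnn { c }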
\cs_generate_variant:Nn \seq_set_split:Nnn { Nnv }

\NewDocumentCommand{\storefile}{ m m } % #1 = symbolic name, #2 = file name
 {
  \egreg_catchfiledef:cnn { l_egreg_ #1 _list_tl } { #2 } { \char_set_catcode_other:N \_ }
 }
\NewDocumentCommand{\printtable}{ m m } % #1 = number of columns, #2 = symbolic name
 {
  \egreg_printtable:nn { #1 } { #2 }
 }

\tl_new:N \l_egreg_temp_table_tl
\seq_new:N \l_egreg_temp_list_seq
\int_new:N \l_egreg_col_count_int

\cs_new_protected:Npn \egreg_printtable:nn #1 #2
 {
  \seq_set_split:Nnv \l_egreg_temp_list_seq { , } { l_egreg_ #2 _list_tl }
  \tl_clear:N \l_egreg_temp_table_tl
  \int_zero:N \l_egreg_col_count_int
  \seq_map_inline:Nn \l_egreg_temp_list_seq
   {
    \int_incr:N \l_egreg_col_count_int
    \tl_put_right:Nn \l_egreg_temp_table_tl { ##1 }
    \int_compare:nTF { \l_egreg_col_count_int = #1 }
      { \tl_put_right:Nn \l_egreg_temp_table_tl { \\ } \int_zero:N \l_egreg_col_count_int }
      { \tl_put_right:Nn \l_egreg_temp_table_tl { & } }
   }
  \begin{tabular}{*{#1}{c}}
  \l_egreg_temp_table_tl
  \end{tabular}
 }
\ExplSyntaxOff

\begin{document}

\storefile{words}{my_words.txt}

\noindent Three words per line

\noindent\printtable{3}{words}

\bigskip

\noindent Five words per line

\noindent\printtable{5}{words}

\bigskip

\noindent Eight words per line

\noindent\printtable{8}{words}

\end{document}

The \storefile command stores the contents of the file under a symbolic name (here, words); \printtable then typesets the table from that data.


We first define an expl3 equivalent of \CatchFileDef so that we can generate variants of it. Then we define the \storefile command: it takes a symbolic name for the word list and the file name as arguments, and stores the entire file in a token list variable, with underscores read as printable characters (catcode 12) rather than as math subscript markers.
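To see what that catcode change buys, here is a minimal sketch (the words are hypothetical) that prints underscores as ordinary characters. With the default catcode 8, an underscore outside math mode makes TeX stop with "Missing $ inserted"; that is most likely also the cause of the error in the question if the entries of the CSV file contain underscores, although the file is not shown, so this is an assumption. The T1 encoding is loaded because in the default OT1 encoding slot 95 holds a dot accent rather than an underscore.

\documentclass{article}
\usepackage[T1]{fontenc} % T1 has an underscore glyph in slot 95
\begin{document}
% With its default catcode 8 (subscript), the underscore below
% would stop the run with "Missing $ inserted"; uncomment to try:
% word_one
\begingroup
\catcode`\_=12 % make _ an ordinary ("other") character
word_one, word_two
\endgroup
\end{document}

The assignment takes effect before the next line is tokenized, which is why the change works here; inside a macro argument that has already been read it would come too late, and that is why \storefile applies it while the file is being read.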

\printtable takes the desired number of columns and the symbolic name of the list as arguments; as usual, it is simply translated into a call of an internal function.

That function splits the token list associated with the symbolic name at commas and maps over the resulting sequence (\seq_set_split:Nnn also strips the spaces around each item, so the spaces after the commas do no harm). At each step it increments a counter and appends the current word to a temporary token list; if the counter has not yet reached the number of columns, it also appends &, otherwise it appends \\ and resets the counter to zero.
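For instance, assuming a hypothetical seven-word list word1, ..., word7 and \printtable{3}{words}, the temporary token list ends up holding the equivalent of the table body below, which can be compiled on its own to check the layout:

\documentclass{article}
\begin{document}
% Body built by the mapping for 3 columns and 7 words:
% & after every word except every third one, which gets \\ instead.
\begin{tabular}{*{3}{c}}
word1 & word2 & word3 \\ word4 & word5 & word6 \\ word7 &
\end{tabular}
\end{document}

The trailing & in the last row just leaves an empty cell, so an incomplete final row needs no special treatment.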

Finally, the whole token list is typeset inside a tabular. Since a tabular cannot break across pages, for a long list such as the ~1000 words in the question you may want to load longtable and use a longtable environment instead of tabular.
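A sketch of that longtable variant, assuming the longtable package. The command name \printlongtable and the file sample_words.txt with its contents are hypothetical: the file is written by the kernel's filecontents* environment only so that the example compiles on its own (an existing file of that name is not overwritten). Replace it with your real file in \storefile.

\begin{filecontents*}{sample_words.txt}
alpha, beta_one, gamma, delta, epsilon_two, zeta, eta
\end{filecontents*}
\documentclass{article}
\usepackage[T1]{fontenc} % for the underscore
\usepackage{catchfile}
\usepackage{longtable}
\usepackage{xparse} % already in the kernel since LaTeX 2020-10-01

\ExplSyntaxOn
% \CatchFileDef takes a single control sequence first, hence :Nnn
\cs_set_eq:NN \egreg_catchfiledef:Nnn \CatchFileDef
\cs_generate_variant:Nn \egreg_catchfiledef:Nnn { c }
\cs_generate_variant:Nn \seq_set_split:Nnn { Nnv }

\NewDocumentCommand{\storefile}{ m m } % #1 = symbolic name, #2 = file name
 {
  \egreg_catchfiledef:cnn { l_egreg_ #1 _list_tl } { #2 } { \char_set_catcode_other:N \_ }
 }
\NewDocumentCommand{\printlongtable}{ m m } % #1 = number of columns, #2 = symbolic name
 {
  \egreg_printlongtable:nn { #1 } { #2 }
 }

\tl_new:N \l_egreg_temp_table_tl
\seq_new:N \l_egreg_temp_list_seq
\int_new:N \l_egreg_col_count_int

\cs_new_protected:Npn \egreg_printlongtable:nn #1 #2
 {
  \seq_set_split:Nnv \l_egreg_temp_list_seq { , } { l_egreg_ #2 _list_tl }
  \tl_clear:N \l_egreg_temp_table_tl
  \int_zero:N \l_egreg_col_count_int
  \seq_map_inline:Nn \l_egreg_temp_list_seq
   {
    \int_incr:N \l_egreg_col_count_int
    \tl_put_right:Nn \l_egreg_temp_table_tl { ##1 }
    \int_compare:nTF { \l_egreg_col_count_int = #1 }
      { \tl_put_right:Nn \l_egreg_temp_table_tl { \\ } \int_zero:N \l_egreg_col_count_int }
      { \tl_put_right:Nn \l_egreg_temp_table_tl { & } }
   }
  % same body as before, but longtable can break across pages
  \begin{longtable}{*{#1}{c}}
  \l_egreg_temp_table_tl
  \end{longtable}
 }
\ExplSyntaxOff

\begin{document}
\storefile{words}{sample_words.txt}
\printlongtable{3}{words}
\end{document}

longtable centers the table by default and must not be placed inside a float such as table, so it is used here directly in the text.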

To choose between row order and column order, you can modify the code like this; the starred form \printtable* fills the columns top to bottom using multicol:

\documentclass{article}
\usepackage[T1]{fontenc} % for the underscore
\usepackage{catchfile,multicol}
\usepackage{xparse}

\ExplSyntaxOn
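% \CatchFileDef takes a single control sequence first, hence :Nnn;
% a c variant can only be generated from an N argument
\cs_set_eq:NN \egreg_catchfiledef:Nnn \CatchFileDef
\cs_generate_variant:Nn \egreg_catchfiledef:Nnn { c }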
\cs_generate_variant:Nn \seq_set_split:Nnn { Nnv }

\NewDocumentCommand{\storefile}{ m m } % #1 = symbolic name, #2 = file name
 {
  \egreg_catchfiledef:cnn { l_egreg_ #1 _list_tl } { #2 } { \char_set_catcode_other:N \_ }
 }
\NewDocumentCommand{\printtable}{ s m m } % #1 = star (column order), #2 = number of columns, #3 = symbolic name
 {
  \IfBooleanTF{#1}
   { \egreg_printtablev:nn { #2 } { #3 } }
   { \egreg_printtable:nn { #2 } { #3 } }
 }

\tl_new:N \l_egreg_temp_table_tl
\seq_new:N \l_egreg_temp_list_seq
\int_new:N \l_egreg_col_count_int

\cs_new_protected:Npn \egreg_printtable:nn #1 #2
 {
  \seq_set_split:Nnv \l_egreg_temp_list_seq { , } { l_egreg_ #2 _list_tl }
  \tl_clear:N \l_egreg_temp_table_tl
  \int_zero:N \l_egreg_col_count_int
  \seq_map_inline:Nn \l_egreg_temp_list_seq
   {
    \int_incr:N \l_egreg_col_count_int
    \tl_put_right:Nn \l_egreg_temp_table_tl { ##1 }
    \int_compare:nTF { \l_egreg_col_count_int = #1 }
      { \tl_put_right:Nn \l_egreg_temp_table_tl { \\ } \int_zero:N \l_egreg_col_count_int }
      { \tl_put_right:Nn \l_egreg_temp_table_tl { & } }
   }
  \begin{tabular}{*{#1}{c}}
  \l_egreg_temp_table_tl
  \end{tabular}
 }

\cs_new_protected:Npn \egreg_printtablev:nn #1 #2
 {
  \seq_set_split:Nnv \l_egreg_temp_list_seq { , } { l_egreg_ #2 _list_tl }
  \begin{multicols}{#1}\centering % #1 = number of columns
  \seq_map_inline:Nn \l_egreg_temp_list_seq { ##1 \par }
  \end{multicols}
 }
\ExplSyntaxOff

\begin{document}

\storefile{words}{my_words.txt}

\noindent Three words per line

\noindent\printtable{3}{words}

\bigskip

\noindent Five words per line

\noindent\printtable{5}{words}

\bigskip

\noindent Three equispaced columns (column order)

\noindent\printtable*{3}{words}

\end{document}

Related Solutions

[Tex/LaTex] Formatting complex table from CSV using datatool

Taking Alan's example and modifying it, you can use the various calculation functions in datatool to do something like

\documentclass{article}
\usepackage{booktabs,datatool}
\usepackage[margin=1in]{geometry}
\DTLloaddb{stores}{stores.csv}
\newcommand*\calcpercent[1]{%
  \DTLdiv{\tmp}{#1}{\subtotal}%
  \DTLmul{\tmp}{\tmp}{100}%
  \DTLround{\tmp}{\tmp}{1}%
  \tmp\,\%
}
\def\total{0}
\DTLforeach{stores}{\subtotal=Sub Total}{\DTLadd{\total}{\total}{\subtotal}}
\begin{document}
\begin{tabular}{llllllll}
  Header row
  \DTLforeach{stores}{%
    \store=Store,%
    \one=Product 1,%
    \two=Product 2,%
    \three=Product 3,%
    \four=Product 4,%
    \five=Product 5,%
    \subtotal=Sub Total%
  }{%
    \\
    \store & \one & \two & \three & \four & \five & \subtotal 
    &
       \DTLdiv{\tmp}{\subtotal}{\total}%
       \DTLmul{\tmp}{\tmp}{100}%
       \DTLround{\tmp}{\tmp}{1}%
       \tmp\,\%
       \\
     & \calcpercent{\one}
     & \calcpercent{\two}
     & \calcpercent{\three}
     & \calcpercent{\four}
     & \calcpercent{\five}
  }\\
\end{tabular}
\end{document}

I've not done any formatting here, but the general idea should be clear. (I'd also note that LaTeX is a typesetting system: if you need to do a lot of processing, consider a scripting language such as Perl, Python or Lua to pre-process the input .csv into a modified file that already contains the results.)
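As a sketch of that pre-processing route without leaving the document, the LuaLaTeX example below uses the luacode package to read stores.csv, compute each store's share of the grand total of the Sub Total column (the same figure the datatool code above computes per row), and write a copy with an extra column that datatool then loads. It assumes a plain CSV with one header row, no quoted fields or embedded commas, a numeric Sub Total in every row and a nonzero total; the output file stores-pct.csv and the Share column are hypothetical names. Compile with lualatex.

\documentclass{article}
\usepackage{luacode} % provides the luacode* environment (LuaLaTeX only)
\usepackage{datatool}

\begin{luacode*}
-- Read stores.csv into a table of rows (each row a table of fields).
local rows = {}
for line in io.lines("stores.csv") do
  line = line:gsub("\r$", "")  -- tolerate Windows line endings
  if line ~= "" then
    local fields = {}
    for field in (line .. ","):gmatch("([^,]*),") do
      fields[#fields + 1] = field
    end
    rows[#rows + 1] = fields
  end
end

-- Locate the "Sub Total" column by its header.
local header, sub = rows[1], nil
for i, name in ipairs(header) do
  if name == "Sub Total" then sub = i end
end
assert(sub, "no 'Sub Total' column in stores.csv")

-- Grand total over all data rows.
local total = 0
for r = 2, #rows do
  total = total + tonumber(rows[r][sub])
end

-- Write a copy with an extra Share column (percent, one decimal).
local out = assert(io.open("stores-pct.csv", "w"))
out:write(table.concat(header, ","), ",Share\n")
for r = 2, #rows do
  local share = 100 * tonumber(rows[r][sub]) / total
  out:write(table.concat(rows[r], ","), string.format(",%.1f\n", share))
end
out:close()
\end{luacode*}

\DTLloaddb{stores}{stores-pct.csv}

\begin{document}
\DTLdisplaydb{stores}
\end{document}

Inside luacode* everything up to \end{luacode*} is passed to Lua literally, so %, # and \n need no escaping. The same logic could equally live in a standalone Perl or Python script run before compilation.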

[Tex/LaTex] Custom table from csv file (multiple header rows)

Thank you @TonioElGringo, that does exactly what I was looking for. Here is an example for a 3-column .csv file.

\documentclass{article}
\usepackage{datatool}
\usepackage[margin=1in]{geometry}
\begin{document}
\DTLloaddb{stores}{scientists.csv}
\renewcommand{\dtldisplaystarttab}{\multicolumn{3}{c}{I am what I was looking for!}\\}
\DTLdisplaydb{stores}
\end{document}
