Sweave – Dynamically Count and Return Number of Words in a Section


I am writing a proposal that must state, in the proposal itself, the number of words in its main section. I found this answer on how to get a word count for a LaTeX document:

Is there any way to do a correct word count of a LaTeX document?

But that only covers entire documents. Is it possible to get the word count of a single section or part of the text in a LaTeX document, and to print that count in the text as well?

I am open to suggestions that use Sweave. One idea is to write the entire section as an R character string inside a Sweave chunk, compute the word count from that string, and then return the string. However, this would count all the LaTeX commands in the string and not only the intended words. Another idea is to extend that R code so it wraps the string in an otherwise empty document, compiles it, runs an existing word-count tool on the result, and returns the count. That seems workable, but I am hoping there is a simpler solution.

Best Answer

You can use texcount to count the words. It automatically produces subcounts for the sections.

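Here's a new macro that calls texcount, extracts the subcount for the current section, and then inserts the word count into the document. It requires \write18 (shell escape) to be enabled, for example by compiling with pdflatex -shell-escape, and texcount must be in your path (or you have to include the full path to the executable in the macro). The pipeline also relies on grep and sed, so it needs a Unix-like shell.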

\documentclass{article}
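% Print the word count of the current section as "(N words)".
% Needs shell escape (\write18), plus texcount, grep and sed in the path.
\newcommand\wordcount{%
    \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}%
(\input{count.txt}words)}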

\begin{document}
\section{Introduction}
In publishing and graphic design, lorem ipsum is placeholder text (filler text) commonly used to demonstrate the graphics elements of a document or visual presentation, such as font, typography, and layout. The lorem ipsum text is typically a section of a Latin text by Cicero with words altered, added and removed that make it nonsensical in meaning and not proper Latin.

\wordcount
\section{Main Stuff}
Even though "lorem ipsum" may arouse curiosity because of its resemblance to classical Latin, it is not intended to have meaning. Where text is comprehensible in a document, people tend to focus on the textual content rather than upon overall presentation, so publishers use lorem ipsum when displaying a typeface or design elements and page layout in order to direct the focus to the publication style and not the meaning of the text. In spite of its basis in Latin, use of lorem ipsum is often referred to as greeking, from the phrase "it's all Greek to me," which indicates that this is not meant to be readable text.

\wordcount
\section{Conclusion}
Today's popular version of lorem ipsum was first created for Aldus Corporation's first desktop publishing program Aldus PageMaker in the mid-1980s for the Apple Macintosh. Art director Laura Perry adapted older forms of the lorem text from typography samples — it was, for example, widely used in Letraset catalogs in the 1960s and 1970s (anecdotes suggest that the original use of the "Lorem ipsum" text was by Letraset, which was used for print layouts by advertising agencies as early as the 1970s.) The text was frequently used in PageMaker templates.

\wordcount
\end{document}
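
To see what the shell pipeline inside the macro does, here is a minimal sketch that runs the same grep/sed chain on canned text instead of calling texcount. The canned text only imitates the shape of texcount's subcount lines as I understand it, the numbers are invented, and sample_output and section_count are hypothetical helper names that are not part of the macro.

```shell
#!/bin/sh
# Stand-in for the output of `texcount -sub=section file.tex`.
# Assumption: the layout mimics texcount's subcount lines; the numbers are made up.
sample_output() {
cat <<'EOF'
Words in text: 262
Words in headers: 4
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  0+0+0 (0/0/0/0) _top_
  61+1+0 (1/0/0/0) Section: Introduction
  104+2+0 (1/0/0/0) Section: Main Stuff
  97+1+0 (1/0/0/0) Section: Conclusion
EOF
}

# Same steps as in the macro:
#   grep "Section"   keeps only the per-section subcount lines
#   sed 's/+.*//'    cuts each line at the first "+", leaving the text word count
#   sed -n "${1}p"   prints only line number $1 (the macro passes \thesection here)
section_count() {
    sample_output | grep "Section" | sed -e 's/+.*//' | sed -n "${1}p"
}

# Word count of the second section; the leading spaces from texcount's
# indentation are still present in the result.
section_count 2
```

Because the last sed selects a line by its position, the macro assumes that the Nth "Section" line in the texcount output belongs to section number N. That should hold for a document with plain numbered sections like the one above, but unnumbered sections or an appendix would likely throw the numbering off.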
