Dynamic word count for abstract environment in LaTeX using Knitr/Sweave/RStudio

abstract, knitr, sweave, texcount, word count

I am attempting to use TeXcount to count the number of words in my abstract environment and print it out, so that when I update the abstract it prints the new word count.

I did my best to search the various forums, and I found a solution here that works for me when I'm using numbered sections:

Dynamically count and return number of words in a section

which is basically to use this macro:

\newcommand\wordcount{%
    \immediate\write18{texcount -sub=section \jobname.tex | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}%
    (\input{count.txt}words)}

I attempted to change it to -sub=abstract and grep "Abstract", but the output file is empty and it prints as just ( words). The linked strategy works fine and correctly prints the number of words for my main sections, but I can't get it to work with my abstract.

I am using Knitr in RStudio on OS X, if that helps. I'm completely open to different kinds of solutions, including ones that don't involve TeXcount (though I'd prefer to do everything within my LaTeX script, similar to the above-linked solution). I am a long-time stack lurker and this is my first post, so my apologies for any newbie behavior.

Best Answer

Solution using original approach

Your idea of using -sub=abstract is good, but it doesn't work because TeXcount doesn't recognise the abstract as a separate subsection. That functionality will hopefully be added at some point; until then, a quick fix is to force a new subcount for the abstract by adding break points with %TC:break {name}:

%TC:break Abstract
\begin{abstract}
Abstract text comes here...
\end{abstract}
%TC:break main

The names Abstract and main are arbitrary. TeXcount will now produce a subcount for the abstract (even without the -sub option).
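As a rough sketch, the macro from the question could be adapted to pick out that subcount. The macro name \abstractwordcount and the file name abstractcount.txt are made up for illustration. The sketch assumes that the subcount line for the break point contains the name Abstract and starts with the text word count followed by +, as the section lines handled by the original macro do; check the actual TeXcount output before relying on it.

```latex
% Hypothetical variant of \wordcount for the abstract; requires shell escape.
% Assumes the %TC:break Abstract / %TC:break main lines surround the abstract.
% Assumption: exactly one line of TeXcount's output contains "Abstract"
% (grep is case sensitive, but a file name containing "Abstract" would also match).
\newcommand\abstractwordcount{%
  \immediate\write18{texcount \jobname.tex | grep "Abstract" | sed -e 's/+.*//' > abstractcount.txt}%
  % \unskip removes the space left by the newline at the end of the file,
  % so that exactly one space separates the number from "words".
  (\input{abstractcount.txt}\unskip\ words)}
```

Calling \abstractwordcount after the abstract should then print something like (123 words), provided the document is compiled with shell escape enabled.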

It is possible to use grep, sed, etc. to extract and reformat the output, but it may be easier to give TeXcount an output template. For example, if you run TeXcount with the option -template="{sub?{title}: {word}\n?sub}", it will print only the per-segment counts, in the form title: words. You can use {hword}, {oword}, {sum}, etc. to insert the word counts for headers, for other places, and for the total (as defined by the -sum option).

You can even use the template to produce TeX macros that help typeset the word count in the document. More about templates in the next solution, which depends on them entirely.

Better solution!

However, there's a nicer solution which avoids having to grep out the abstract count and can allow you to shape the output in a more flexible way.

You can define a new counter, and a rule telling the abstract environment to use it, by adding the following TeXcount instructions anywhere before the abstract, e.g. in the preamble:

%TC:newcounter abst Words in abstract
%TC:envir abstract [] abst

This will count words in the abstract separately from other words. I first thought of using -sum=... to specify a sum count consisting only of the abstract, but that doesn't work since -sum doesn't really handle new counters very well (to be fixed, I hope!).

To get the count for the abstract only, you can use an output template. This can be done in two ways. You can specify the template in the TeXcount command:

texcount -template="{abst}" file.tex

Alternatively, you can specify the template somewhere in the TeX file:

%TC:newtemplate
%TC:template {abst}

In either case, {abst} will be replaced by the value of the abst counter we defined.
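Putting these pieces together, a minimal document might look like the sketch below. The macro name \abstcount and the file name abstcount.txt are invented for the example, and it assumes that TeXcount prints nothing but the filled-in template when -template is given. It has to be compiled with shell escape enabled.

```latex
% Minimal sketch; compile with shell escape enabled, e.g.
%   pdflatex -shell-escape main.tex
\documentclass{article}

% TeXcount instructions: count words in the abstract in a separate counter.
%TC:newcounter abst Words in abstract
%TC:envir abstract [] abst

% Hypothetical helper: runs TeXcount on the current file, writes the abstract
% count to abstcount.txt, and reads it back in.
\newcommand\abstcount{%
  \immediate\write18{texcount -template="{abst}" \jobname.tex > abstcount.txt}%
  % \unskip drops the space from a trailing newline in the file, if there is one.
  (\input{abstcount.txt}\unskip\ words)}

\begin{document}

\begin{abstract}
Abstract text comes here...
\end{abstract}

\noindent Abstract length: \abstcount

\end{document}
```

The count reflects the .tex file as saved on disk at the moment of compilation, which with Knitr is the file generated from the .Rnw source.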

You can even use the template to write TeX code that you can include in your document, e.g. using \WordsInAbstract{{abst} } as a template, but then you may need to run TeXcount with the -tex option to escape special TeX characters in the output. NB: Using {{abst}} in the template may trigger a bug in which {abst} is first replaced by, say, 4, and {4} is then replaced by the value of the 4th counter (number of headers). Adding the extra space after {abst} avoids this.

You can also have TeXcount write the output directly to a file using the -out=outfile option. Shell redirection usually works, but there are some cases where > outfile can't be used.
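For instance, a macro that calls TeXcount from within the document could use -out instead of redirection, roughly as follows. The macro and file names are again made up for illustration, and the sketch assumes the abst counter and the abstract rule from above are defined in the file.

```latex
% Hypothetical variant using -out instead of shell redirection; requires shell escape.
% Assumes %TC:newcounter abst ... and %TC:envir abstract [] abst appear earlier in the file.
\newcommand\abstcountout{%
  \immediate\write18{texcount -template="{abst}" -out=abstcount.txt \jobname.tex}%
  (\input{abstcount.txt}\unskip\ words)}
```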
