Dynamic word count for abstract environment in LaTeX using Knitr/Sweave/Rstudio

Tags: abstract, knitr, sweave, texcount, word count

I am attempting to use TeXcount to count the number of words in my abstract environment and print it out, so that when I update the abstract it prints the new word count.

I did my best to search the various forums, and I found a solution here that works for me when I'm using numbered sections:

Dynamically count and return number of words in a section

which is basically to use this macro:

\newcommand\wordcount{
    \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" |     sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
(\input{count.txt}words)}

I attempted to change it to -sub=abstract and grep "Abstract", but the output file is empty and it prints as just ( words). The linked strategy works fine and correctly prints the number of words for my main sections, but I can't get it to work with my abstract.

I am using Knitr in Rstudio on OSX if that helps. I'm completely open to different kinds of solutions, including ones that don't involve texcount (though I'd prefer if I could do everything within my LaTeX script, similar to the above-linked solution). I am a long-time stack lurker and this is my first post, so my apologies for any newbie behavior.

Best Answer

Solution using original approach

Your idea of using -sub=abstract is good, but doesn't work since TeXcount doesn't actually recognise the abstract as a separate subsection. While hopefully that functionality will be added at some point, there's a quick-fix to force a new subcount of the abstract using %TC:break {name} to add breakpoints:

%TC:break Abstract
\begin{abstract}
Abstract text comes here...
\end{abstract}
%TC:break main

The names Abstract and main are just arbitrary names. Now TeXcount will produce a subcount for the abstract (even without the -sub options).

It is possible to use grep, sed etc. to extract and reformat the output, but it might also be helpful to give TeXcount an output template. E.g., if you run TeXcount with the option -template="{sub?{title}: {word}\n?sub}", it will print only the per-segment counts in the form title: words. You can use {hword}, {oword}, {sum} etc. to insert the word counts for headers, other places, and the total (as defined by the -sum option).

You can even use the template to produce TeX macros to help typeset the word count in the document. More about templates in the next solution which depends on it entirely.

Better solution!

However, there's a nicer solution which avoids having to grep out the abstract count and can allow you to shape the output in a more flexible way.

You can specify a new counter, and then a rule for the abstract environment to use this, by adding the following TeXcount instructions anywhere before the abstract, eg in the preamble:

%TC:newcounter abst Words in abstract
%TC:envir abstract [] abst

This will count words in the abstract separately from other words. I first thought of using -sum=... to specify a sum count consisting only of the abstract, but that doesn't work since -sum doesn't really handle new counters very well (to be fixed I hope!).

To get the count for the abstract only, you can use an output template. This can be done in two ways. You can specify the template in the TeXcount command:

texcount -template="{abst}" file.tex

Alternatively, you can specify the template somewhere in the TeX file:

%TC:newtemplate
%TC:template {abst}

In either case, {abst} will be replaced by the value of the abst counter we defined.

You can even use the template to write TeX code which you can include in your document, e.g. using \WordsInAbstract{{abst} } as a template, but then you may need to run TeXcount with the -tex option to escape special TeX characters in the output. NB: Using {{abst}} in the template may trigger a bug where {abst} is first replaced by, e.g., 4, and then {4} gets replaced by the value of the 4th counter (number of headers). Adding an extra space, as in {{abst} }, avoids this.

You can also have TeXcount write the output directly to a file using the -out=outfile option. Redirecting with > outfile usually works, but there are some cases where it can't be used.
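Putting the pieces of this second solution together, here is a minimal sketch of a complete document. The macro name \abstractwordcount and the file name abstract-count.txt are made up for illustration. It assumes shell escape is enabled (e.g. pdflatex -shell-escape), that texcount is on the path, and that the template output consists of just the number. I have not tested it under Knitr, where \jobname.tex should refer to the knitted .tex file.

```latex
\documentclass{article}

% TeXcount instructions from the answer above; TeX itself sees only comments.
%TC:newcounter abst Words in abstract
%TC:envir abstract [] abst

% \abstractwordcount and abstract-count.txt are hypothetical names.
\newcommand\abstractwordcount{%
  % Run TeXcount on this file: the template prints only the abst counter,
  % and -out writes the result to a file instead of the terminal.
  \immediate\write18{texcount -template="{abst}" -out=abstract-count.txt \jobname.tex}%
  % Read the number back in, or print a placeholder if the file is missing
  % (e.g. because shell escape is disabled).
  \IfFileExists{abstract-count.txt}%
    {(\input{abstract-count.txt}\unskip\ words)}%
    {(word count unavailable)}%
}

\begin{document}

\begin{abstract}
Abstract text comes here...
\end{abstract}
\abstractwordcount

\section{Introduction}
Main text comes here...

\end{document}
```

The \unskip removes the space that a trailing newline in the count file would otherwise leave before "words". The macro is called after \end{abstract} so that its own output does not end up inside the environment being counted.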

Related Solutions

Dynamically count and return number of words in a section

You can use texcount to count the words. It automatically produces subcounts for the sections.

Here's a new macro that calls texcount, extracts the subcount for the current section, and then inserts the word count into the document. It requires write18 to be enabled, and texcount must be in your path (or you have to include the full path to the executable in the macro).

\documentclass{article}
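% Get per-section subcounts from TeXcount, keep the "Section" lines, cut each
% line at the first "+", and pick the line matching the current section number.
\newcommand\wordcount{%
    \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}%
(\input{count.txt}words)}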

\begin{document}
\section{Introduction}
In publishing and graphic design, lorem ipsum is placeholder text (filler text) commonly used to demonstrate the graphics elements of a document or visual presentation, such as font, typography, and layout. The lorem ipsum text is typically a section of a Latin text by Cicero with words altered, added and removed that make it nonsensical in meaning and not proper Latin.

\wordcount
\section{Main Stuff}
Even though "lorem ipsum" may arouse curiosity because of its resemblance to classical Latin, it is not intended to have meaning. Where text is comprehensible in a document, people tend to focus on the textual content rather than upon overall presentation, so publishers use lorem ipsum when displaying a typeface or design elements and page layout in order to direct the focus to the publication style and not the meaning of the text. In spite of its basis in Latin, use of lorem ipsum is often referred to as greeking, from the phrase "it's all Greek to me," which indicates that this is not meant to be readable text.

 \wordcount
\section{Conclusion}
Today's popular version of lorem ipsum was first created for Aldus Corporation's first desktop publishing program Aldus PageMaker in the mid-1980s for the Apple Macintosh. Art director Laura Perry adapted older forms of the lorem text from typography samples — it was, for example, widely used in Letraset catalogs in the 1960s and 1970s (anecdotes suggest that the original use of the "Lorem ipsum" text was by Letraset, which was used for print layouts by advertising agencies as early as the 1970s.) The text was frequently used in PageMaker templates.

\wordcount
\end{document}

Does the wordcount package do a proper wordcount

Since this is too long to fit a comment, I'll make it into an answer.

Also, beware that as the maker of TeXcount I'm not entirely unbiased, although I'll try to give a fair answer.

First, all word counters will have technical limitations as well as choices as to what to count (or not count) as words. This is particularly true of documents that, like many LaTeX documents, are more complex than pure prose.

Technical limitations for LaTeX-based word counts will often be related to the ability to interpret macros and environments.

In addition there will be choices, some of which may be guided by technical issues, as to what to count as a word: formulas, footnotes, titles, captions, citations, etc. Technical limitations aside, there is no unique answer to which is the correct word count: different people and journals will have different opinions, and may even differ from case to case based on the type of manuscript.

The only way to check if any word counter counts what you want it to count is to check it.

My main claim as regards TeXcount is not primarily that it is more accurate than others, since I haven't made a proper/fair comparison and it might anyway depend on the type/writing style of the document and the user's preference, but that it enables you to check what it counts as words by providing you with an annotated (colour coded) version of the TeX files.

You can check out TeXcount without any installation using the web service to see how it works on your document. (Note that the connection is by HTTP which means your document will be sent in cleartext over the Internet, so don't use this for confidential material.)

There are ways in TeXcount to tweak the counting by adding additional macro/environment handling rules or mark parts of the text that should be ignored, but the core strength in terms of assessing the accuracy is the annotated output which enables you to check in detail how your document was processed.
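As a small illustration of marking text to be ignored, here is a sketch using the %TC:ignore and %TC:endignore instructions. The document content is made up for the example.

```latex
\documentclass{article}
\begin{document}

This sentence is counted as usual.

%TC:ignore
This paragraph is skipped by TeXcount but still typeset by LaTeX.
%TC:endignore

Counting resumes here.

\end{document}
```

Since the instructions are ordinary TeX comments, they have no effect on the typeset output. The annotated output is a good way to confirm that the region was actually skipped.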

Let me add a bit to my original rant to better answer the more technical side of your question.

When I first made TeXcount, it was because I couldn't find a LaTeX word counter that worked on my own LaTeX documents. This was probably due to how my documents were structured. In turn, early versions of TeXcount failed similarly on some other people's documents since their LaTeX style differed from what I had prepared TeXcount to handle. Over time TeXcount has hopefully gotten more robust, in part due to feedback from users over several years.

TeXcount, as you note, uses a Perl script to parse the file(s), and does not actually run any of the TeX macros. As such, text produced by macros will normally not be counted. You can specify that a macro, eg \LaTeX, represents a word, but that is basically it. Any word counter that does not actually run TeX will face similar limitations.
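For instance, a macro that expands to text can be declared to represent a fixed number of words with %TC:macroword. In this sketch the macro \projectname and its expansion are hypothetical. The number is simply what you tell TeXcount to assume, since it never expands the macro.

```latex
\documentclass{article}

% Hypothetical macro that expands to three words of text.
\newcommand\projectname{Project Apollo Archive}

% Tell TeXcount to count each use of \projectname as 3 words.
%TC:macroword \projectname 3

\begin{document}
The \projectname{} was digitised over several years.
\end{document}
```

If the definition of the macro changes, the number in the %TC:macroword line has to be updated by hand.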

Word counters that actually run TeX may be able to count text generated by macros, but run into the limitation of being implemented in TeX. It may become fragile, as I experienced on my own documents when they failed to run properly, or become less flexible in controlling which parts/contexts of the document should be counted.

One of the main risks for any TeX file word counter is that large parts of the document get ignored or misinterpreted due to some technical issue which is not immediately obvious to the user. Next will often be systematic errors (or biases) in how it counts special cases (dashed terms, numbers, formulas, special characters, etc).

The final answer is that the best way to find out is to check what actually gets counted.
