[TeX/LaTeX] Dynamically count words in chapter and insert word count at start of chapter

atom, chapters, report, texcount, word count

TL;DR Einar's first alternative solution of making a new input command that automatically generates counts at the end of each input section is what I used to generate counts at the end of each chapter:

\newcommand\countinput[1]{
\input{#1}
\immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
\footnote{FILE \#1 CONTAINS \input{count.txt} WORDS}
}

However, this creates a new command, which makes counting the entire document harder because the chapters are now pulled in with \countinput instead of \input. For now that's fine for my purposes, and my problem of counting each chapter is solved in the short term.
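If the count should appear at the start of the chapter rather than in a footnote at the end, the same idea can probably be reordered, since texcount reads the chapter file from disk and does not care where LaTeX currently is in the document. A minimal, untested sketch, assuming shell escape is enabled; \countinputfirst is a hypothetical name, not something from Einar's answer:

```latex
% Hypothetical variant of \countinput: run texcount first, print the count,
% then include the chapter file. Needs --shell-escape to have any effect.
\newcommand\countinputfirst[1]{%
  \immediate\write18{texcount -1 -sum "#1.tex" > count.txt}%
  % \unskip drops the space left by the newline at the end of count.txt
  \noindent(\input{count.txt}\unskip\ words)\par
  \input{#1}%
}

% Usage in main.tex (file name given without the .tex extension):
% \chapter{Outline}
% \countinputfirst{sections/outline}
```

As with \countinput, the chapter heading itself lives in main.tex, so it is not part of the number shown.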

Also, on top of enabling write18, you need to make sure you are actually running with --shell-escape (not --shell-escapee, a typo I kept copying from Einar's answer, urggg!) or have checked the following box if you're using Atom:

[Image: shell escape checkbox in the Atom latex package settings]

Question:

I am trying to adapt this suggestion of how to dynamically count words in sections, but I want to use it for a large report format doc (a thesis) where it displays a count of the words in each chapter (and at the start of the chapter, if possible).

Their solution proposes this using texcount:

\documentclass{article}
\newcommand\wordcount{
    \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
(\input{count.txt}words)}

\begin{document}
\section{Introduction}
In publishing and graphic design, lorem ipsum is placeholder text (filler text) commonly used to demonstrate the graphics elements of a document or visual presentation, such as font, typography, and layout. The lorem ipsum text is typically a section of a Latin text by Cicero with words altered, added and removed that make it nonsensical in meaning and not proper Latin.

\wordcount
\section{Main Stuff}
Even though "lorem ipsum" may arouse curiosity because of its resemblance to classical Latin, it is not intended to have meaning. Where text is comprehensible in a document, people tend to focus on the textual content rather than upon overall presentation, so publishers use lorem ipsum when displaying a typeface or design elements and page layout in order to direct the focus to the publication style and not the meaning of the text. In spite of its basis in Latin, use of lorem ipsum is often referred to as greeking, from the phrase "it's all Greek to me," which indicates that this is not meant to be readable text.

 \wordcount
\section{Conclusion}
Today's popular version of lorem ipsum was first created for Aldus Corporation's first desktop publishing program Aldus PageMaker in the mid-1980s for the Apple Macintosh. Art director Laura Perry adapted older forms of the lorem text from typography samples — it was, for example, widely used in Letraset catalogs in the 1960s and 1970s (anecdotes suggest that the original use of the "Lorem ipsum" text was by Letraset, which was used for print layouts by advertising agencies as early as the 1970s.) The text was frequently used in PageMaker templates.

\wordcount
\end{document}

However, my document has chapters where the main tex file works as follows:

\documentclass{report}
\newcommand\wordcount{
    \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
(\input{count.txt}words)}

\begin{document}

\chapter{Introduction}
\input{sections/introduction}
\wordcount

\chapter{Many Chapters}
\input{sections/chapters}
\wordcount

\chapter{Conclusion}
\input{sections/conclusion}
\wordcount

Now I know that the \wordcount command should probably sit inside the chapter files themselves, and at the start of the file if that's where I want it, but either way it fails.

It fails with the error "File 'count.tex' not found."

If it is relevant, I use Atom with the latex package.

Is there any way I can adapt @Jake's solution to suit my use case?

EDIT:

Following Einar's solution of using this in a magic comment

--enable-write18

I can now get the module to count the words in the main tex file (not many, just abstract), but I still can't seem to get it to work for specific chapters or sections as per original need. If I do the original method above it just says

( words)

in the output file, like this:

[Image: compiled output showing "( words)" with the count missing]

EDIT 2: Just to clarify, my current code still generates the blank count output above (I am calling \wordcount at the end of each chapter file, e.g. chapter1.tex, which is called from main.tex via \input{sections/chapter1.tex}), and my code is:

In the main.tex file:

% !TEX --enable-write18

\documentclass{report}

\newcommand\wordcount{
    \immediate\write18{texcount -merge -sub=section  \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
(\input{count.txt}words)}

\begin{document}

\chapter{Introduction}
\input{sections/introduction.tex}

\chapter{Many Chapters}
\input{sections/chapter1.tex}

\chapter{Conclusion}
\input{sections/conclusion.tex}

At the end of each chapter, e.g. chapter1.tex:

A little bit of intro text.

\section{Section 1}

Lorem ipsum la la la and all that.

\section{Section 2}

Lorem ipsum la la la and all that. Except this section might be longer, more lorem ipsum dolor lalala sit amet, and so on.

\wordcount

However, this does not produce a count (blank again as per above), nor does it work with:

\newcommand\wordcount{
    \immediate\write18{texcount -inc -brief -sub=section  \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
(\input{count.txt}words)}

So it seems texcount itself works and the problem lies entirely in my \newcommand definition. In essence, I am back to my original question: how do I set up this \newcommand to call texcount to count the text in the chapter (e.g. chapter1.tex) and display it in the chapter?

EDIT 3:

A big problem is that the -merge option isn't including or counting the content of the \input chapter files, only their chapter titles. For comparison, the output of standard texcount on main.tex is:

File: main.tex
Encoding: ascii
Words in text: 333
Words in headers: 30
Words outside text (captions, etc.): 2
Number of headers: 12
Number of floats/tables/figures: 0
Number of math inlines: 0
Number of math displayed: 0
Subcounts:
text+headers+captions (#headers/#floats/#inlines/#displayed)
0+9+2 (1/0/0/0) _top_
333+1+0 (1/0/0/0) Chapter: Abstract
0+1+0 (1/0/0/0) Chapter: Outline
0+1+0 (1/0/0/0) Chapter: Introduction
0+2+0 (1/0/0/0) Chapter: Some Stuff
0+1+0 (1/0/0/0) Chapter: Conclusion

EDIT 4:

After trying Einar's alternative solutions I am still stumped, mainly because texcount is still NOT counting the input files from the sections directory. For example, the first solution looked promising:

\newcommand\countinput[1]{
\input{#1}
\immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
\footnote{FILE #1 CONTAINS \input{count.txt} WORDS}
}

(note I have changed \oldinput{count.txt} to \input{count.txt} because \oldinput was an undefined control sequence)

and

\chapter{Outline}
\countinput{sections/outline}

I still get no word count (and count.txt is produced but remains empty):

[Image: compiled output with the footnote showing no word count]

If I try the second alternative solution I get bigger problems: "no file b.tex" ?!?!

I am sure the first alternative suggestion would work, if only texcount would actually count the words in files, e.g. "sections/file.tex" etc.

EDIT 5:

The problem appears to be twofold, compounded by an issue with Atom somehow not being able to create and write to count.txt. It seems to be a permissions issue that --shell-escapee and enabling write18 don't solve. I am looking into this, but will mark Einar's solution as sound: it essentially works, just not for me 🙁

Best Answer

If the counts file (count.txt) is never created, the command in \write18 is probably not being run at all, so first make sure that TeXcount is actually being executed.

First, for \write18 to execute, LaTeX must be run with a command-line option: --enable-write18 (MiKTeX) or --shell-escape (TeX Live), as explained in the TeXcount FAQ.
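One way to check from inside the document whether the option actually took effect is to print the shell escape status. This is a small sketch that assumes you compile with pdfLaTeX, where the \pdfshellescape primitive is available; other engines expose the same information differently:

```latex
\documentclass{article}
\begin{document}
% \pdfshellescape is a pdfTeX primitive:
% 0 = disabled, 1 = fully enabled, 2 = restricted.
\ifcase\pdfshellescape
  Shell escape is disabled.%
\or
  Shell escape is fully enabled.%
\or
  Shell escape is restricted.%
\fi
\end{document}
```

The \wordcount and \countinput commands need the fully enabled state, since restricted mode only allows a short list of programs and, as far as I know, does not allow pipes or output redirection.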

Next, you can try running TeXcount without the pipes, using the option -out=filename to write the output from TeXcount directly to a file. A minimal example would be texcount -out=\jobname.out \jobname.tex. If no output file appears, TeXcount is probably not being run at all.

Maybe you need to provide the full path to texcount, although I think that would be unlikely to be a problem on Linux.

Is there any information in the LaTeX log? I think it should log that \write18 is being run or not, and perhaps provide some error message if something has gone seriously wrong.
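The check described above can be tried in a throwaway document before touching the thesis. A minimal sketch, assuming texcount is on the PATH and the file is compiled with shell escape enabled:

```latex
\documentclass{article}
\begin{document}
% No pipes, no redirection: -out makes TeXcount write its report itself.
% Note: hyperref also uses \jobname.out, so pick another name if it is loaded.
\immediate\write18{texcount -out=\jobname.out \jobname.tex}
Some words to count.
\end{document}
```

After compiling, look for the .out file next to the source. On TeX Live the log usually contains a line starting with runsystem(texcount ...) that ends in either "executed" or "disabled", which tells you whether the command was allowed to run; the exact wording on other distributions may differ.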


Once you have TeXcount running using \write18, you need to run TeXcount with options that make it process the included files and produce statistics either per file or per section, depending on what you need.

By default, TeXcount only parses the main file, not files included through \input or \include. To make TeXcount process these, you need to use one of two options: either -inc which makes TeXcount parse each file separately, or -merge which makes it merge the files together and process it as if it was one big file.

What would be closest to the original example would be

texcount -merge -sub=section \jobname.tex

which would merge the files and produce per section summary counts. I think the grep and sed commands might work as in the example with this command: otherwise, it should work with some adjustments.

I recommend running TeXcount from the command line first to see the full output, and then adding the greps and seds to check if they work as desired.

Since your files lie in subfolders, you might want to verify what the value of \thesection is at each relevant point.
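One thing worth checking here: in the report class \thesection expands to something like 2.1 rather than a plain integer, and that is not a valid line number for sed -n, which may well be why the output ends up as "( words)". A chapter-level variant could sidestep this. The following is an untested sketch under several assumptions: \chapterwordcount is a hypothetical name, every chapter is numbered and in order, and TeXcount's subcount lines look like the "Chapter: Title" lines shown in the question:

```latex
% Hypothetical chapter-level variant of \wordcount for the report class.
% Assumes the N-th "Chapter:" line in TeXcount's output is chapter N;
% unnumbered chapters (e.g. \chapter*) would shift the line numbers.
\newcommand\chapterwordcount{%
  \immediate\write18{texcount -merge -sub=chapter \jobname.tex
    | grep "Chapter:" | sed -e 's/+.*//' | sed -n \arabic{chapter}p > count.txt}%
  % \unskip drops the space left by the newline at the end of count.txt
  (\input{count.txt}\unskip\ words)%
}
```

The sed expression keeps only the first number on the line, so the result is the text word count for the chapter without headers or captions.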

A slightly different approach could be to use per-file statistics from TeXcount by running something like

texcount -inc -brief \jobname.tex

which should return one line per file plus one for the total. Potential problems with this approach would be that you'd need the file name (or path) to extract the correct line from the TeXcount output, and that the section headers in your example would be counted as part of the main file rather than the corresponding file.


As a side note, there are other ways to provide per-section counts. One is to run TeXcount on each included file rather than on the whole document and grep out the relevant section. Another is to use a template to customise the output from TeXcount in such a way that it produces LaTeX code: I'll see if I can find or come up with an example of how to do that.

Alternative solution giving counts per file

You can define an alternative file input command which does the counting per file:

\newcommand\countinput[1]{
\input{#1}
\immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
\footnote{FILE #1 CONTAINS \input{count.txt} WORDS}
}

You may experiment with TeXcount options -1, -sum, -brief to find combinations that give you what you want. There is also the -template option for additional customisation, but that might get a bit more tricky.

You can even redefine the existing \input along these lines:

\let\oldinput=\input
\newcommand\countinput[1]{
\oldinput{#1}
\immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
\footnote{FILE #1 CONTAINS \oldinput{count.txt} WORDS}
}
\let\input=\countinput

Do note that you now have to use \oldinput instead of \input to include the counts file.
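A caveat with redefining \input: LaTeX's own \input accepts both \input{file} and the primitive form \input file without braces, while the one-argument replacement only handles the braced form. If some package or class code uses the unbraced form, only the first letter of the file name is taken as the argument, which is one possible explanation for an error like "no file b.tex" (an assumption, not verified against any particular setup). A cautious sketch that delays the swap until the preamble has been processed:

```latex
\let\oldinput=\input
\newcommand\countinput[1]{%
  \oldinput{#1}%
  \immediate\write18{texcount "#1.tex" -1 -sum > count.txt}%
  \footnote{FILE #1 CONTAINS \oldinput{count.txt} WORDS}%
}
% Swap in the counting version only at \begin{document}, so that \input
% calls made while loading the class and packages are left untouched.
\AtBeginDocument{\let\input=\countinput}
```

Any unbraced \input inside the document body would still break, so keeping \countinput as a separate command remains the safer choice.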

For experimenting, it might be easier to use \verbatiminput from the verbatim package to include the counts file since the counts file tends to contain characters that TeX treats as special characters: eg "#". That way, you can use the full default output from TeXcount with per section counts should you wish.
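For instance, a small helper along these lines would print the raw TeXcount report for one file. It is only a sketch for experimenting: \showcounts is a hypothetical name, and it assumes shell escape is enabled and the file name is given without the .tex extension:

```latex
\usepackage{verbatim} % in the preamble; provides \verbatiminput

% Hypothetical helper: dump TeXcount's full per-section report for one file.
\newcommand\showcounts[1]{%
  \immediate\write18{texcount -sub=section "#1.tex" > count.txt}%
  % Verbatim input avoids errors from characters such as # in the report.
  \verbatiminput{count.txt}%
}

% Usage in the document body:
% \showcounts{sections/chapter1}
```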

Do note that the per file counts will not include the chapter headers as those are part of the main file rather than included in the subfiles.
