Make a list of all words in a LaTeX document

lists

Is there a way to make a list of all the words that are used in a LaTeX document? If someone knows another way to do it, e.g. using Python, a website, or something else, that would also be helpful.

Here is an example of what I would like:

\documentclass{article}
\begin{document}
I have a dog and a cat.
The dog and the cat are named Bob and John.
\end{document} % Should maybe be after the list


list:
I 
have
a 
dog 
and 
cat
the 
are 
named 
bob 
john

The order of the words in the list does not matter.
Thank you in advance for any help.

Best Answer

For some definition of "word" and "being used", you can extract the text from the PDF and process it into a list.

pdflatex file1
pdftotext file1.pdf

will produce file1.txt, containing:

I have a dog and a cat. The dog and the cat are named Bob and John.

1

(The trailing 1 is the page number.) You can then process this file with standard Linux utilities, which are also available on Windows if needed (I am actually using the Cygwin versions on Windows):

tr -s '[:space:][,.]' '[\n*]' < file1.txt | tr '[:upper:]' '[:lower:]' | sort | uniq

Produces the list:

1
a
and
are
bob
cat
dog
have
i
john
named
the

Step by step, the pipeline does the following:

replace white space and punctuation by newline
lowercase the resulting words
sort alphabetically
remove duplicates.
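
Since the question also mentions Python, the same four steps can be sketched there as well. This is only a rough equivalent, assuming the text has already been extracted (for example by reading file1.txt into a string). The function name word_list is made up for illustration, and the split pattern handles only whitespace, commas and periods, so other punctuation would need to be added to it.

```python
import re


def word_list(text):
    """Return the sorted, de-duplicated, lowercased words in text.

    Hypothetical helper: "word" here means any run of characters
    that contains no whitespace, comma or period.
    """
    # Step 1: split on runs of whitespace, commas and periods.
    tokens = re.split(r"[\s,.]+", text)
    # Step 2: lowercase each token, dropping the empty strings that
    # re.split leaves at the start or end of the text.
    words = {token.lower() for token in tokens if token}
    # Steps 3 and 4: the set has removed duplicates; sort alphabetically.
    return sorted(words)


if __name__ == "__main__":
    sample = (
        "I have a dog and a cat. "
        "The dog and the cat are named Bob and John.\n\n1\n"
    )
    print("\n".join(word_list(sample)))
```

To run it on the extracted file instead of the sample string, read file1.txt and pass its contents to word_list.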

Related Solutions

[TeX/LaTeX] How to make itemize/enumerate/description environment robust to missing \item elements

You should create your own myitemize environment that allows you to do this:


\documentclass{article}
\makeatletter
\newenvironment{myitemize}
  {\itemize\@newlistfalse}% \begin{myitemize}
  {\enditemize}% \end{myitemize}
\makeatother
\begin{document}
\noindent Here is some text.
\begin{myitemize}% A list with items
  \item An item
  \item Another item
\end{myitemize}
Here is some more text.
\begin{myitemize}% An empty list
\end{myitemize}
Here is a final piece of text.
\end{document}

The myitemize environment is exactly the same as itemize, except that it resets the \if@newlist conditional to false with \@newlistfalse. This conditional is set to true (\@newlisttrue) at the start of a regular list (itemize, enumerate, ...), and that is what triggers the error when no \item is used.

This may cause problems with nested lists, although I don't think this fits your use case.

[TeX/LaTeX] Remove Bullet Symbols from List

You can use enumerate* instead of itemize* for numbered lists. Also the numbers can be added manually:

\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage{mdwlist}
\title{Brief Article}
\author{The Author}
\begin{document}
\maketitle

\begin{enumerate*}
  \item[2.] Regular vacations and holidays according to the Law.
  \item[3.] Absence for performing examinations in accordance with what is
        stated in this Law.
  \item[5.] Leave without pay, which is not more than casual 20 days during
        the work year.
\end{enumerate*}
\end{document}
