[Tex/LaTex] How to add/check missing punctuation marks (itemize/enumerate) automatically

itemizeluatext manipulation

(An experiment) I'd like to present you one of the possible tasks (to be automated) I solve in my daily work: adding missing punctuation marks in the itemize and enumerate environments.

I polish mathematical papers, it requires a lot of time and mental energy, therefore anything which could be automated is worth gold, moreoever, it minimizes a number of additional human errors (mistyping, overlooking a critical spot etc.).

I try to automate as many things as possible, but I haven't touched this topic, yet. I wish to automatically check the end of \items where we expect a punctuation mark. There are certain exceptions (no punctuation at all) or there could be a conjunction and (likely some other terms), but it's not as boring task as it looks when it gets to a try to program it. In this question, I introduce the problem and one possible starting point. Presented snippet outside the TeX world (Lua) is not used in production as I'm still thinking what's the best approach here.

The first idea is to remove all the punctuation marks in the itemize and enumerate environments and add punctuations according to our rules, let's say:

Add full stop if \item starts with a uppercase letter or a mathematical expression.
Add comma or semicolon if it doesn't.
Exception: Add fullstop at the end of the last \item as it ends the item list.

It might work and it isn't that difficult to program it, but that's not a good strategy. We would be changing the original paper almost without thinking and considering special cases and following text outside the environment.

It seems that better strategy is not to change a single letter in the paper, but notify a user (us) if there might be a problem. Let me demonstrate a couple of (fictive) examples (mal-itemize.tex).

% *latex mal-itemize.tex
% A testing file...
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt

% For testing purposes...
\usepackage{xcolor}
\def\itsOK{{\color{green}IT'S OK!}}
\def\notOK{{\color{red}NOT OK!}}

\begin{document}
Text before.
\begin{itemize}
\item First 
    item.
\item Second 

   item $a+b+c.$
   \begin{itemize}
   \item 2a % .
   \item 2b, and % ;
   \item 2c % ,
   \end{itemize}
\item Third item

still third item\end{itemize}
text in the middle.
\begin{enumerate}
\item Fourth item;
\item Fifth item $\left(\frac{a}{b}\right .$
\item Sixth item $d+e  .  $
\item Seventh item,
\end{enumerate}

A common situation:
\begin{itemize}
\item 8th item
\item 9th item,
\item 10th item
\item 11th item,
\item 12th item.
\end{itemize}
Text after.
\end{document}

An input TeX file

First and second item looks correct. Full stop ends item directly (first item) or from within a mathematical expression (second item).
2a and 2c might be correct, but then 2b should probably not contain comma, or items 2a and 2c are not correct (more likely) and we should add comma (2a) and full stop (2c).
Third item looks correct if text in the middle is a part of that item. Otherwise, there should be a full stop (as in the first and second items) and text should be changed to Text.
Punctuation mark at the end of the fourth item looks correct as long we don't check the other items. Fifth item is a tricky part as it contains \right. where full stop is a part of the TeX expression.
Seventh item is tricky too. We rather expect a full stop (as following text starts with an uppercase letter), or A common should be changed to a common.
Now, let me demonstrate a common and frequent situation I face most of the time. We just miss punctuation. It would be easy to add comma in the 10th item as we could store punctuation mark from the previous item (9th item). A difficult part is a missing punctuation mark in the 8th item where we need to look forward in the text which type of punctuation mark author used (comma, semicolon, or full stop, skipping and, or, …).

Let me demonstrate my first try in Lua. I'm not fluent in LPeg, so I changed \begins and \ends of the preselected environments to a single character. After that, I can use %b (balanced) operator in the string.gsub command.

Next thing I did is that I doubled the \item command, then we can find the content of \item until the next \item command. It means that there is still another \item which hasn't been swallowed by that regular expression. I also added one \item before the end of the environment, to be able to find the content of the last \item. It's a small trick using regular expressions. Next thing I had to do was to delete TeX comments for purpose of testing an item content. TeX comments cannot affect the results.

The core of the snippet is that I check punctuation marks and and conjunction (there might be also or, neither…) by deleting a portion in that temporary string. If there is a change (it contains that part), that item is correct. I check the punctuation mark itself plus that string followed by an ending of common mathematical expressions.

The snippet informs about progress in the terminal (the content of likely incorrect items) and for purpose of this question it saves a modified version of the TeX file (mal-output.tex). I enclose the source codes and a preview of this testing file. We run (any LaTeX engine can be used):

texlua mal-item.lua
lualatex mal-itemize.tex
lualatex mal-output.tex

We get a list of items with likely missing punctuation marks in the terminal:

 2a %.

 2c %,

 Third item

still third item
 8th item

 10th item

 Fifth item $\left(\frac{a}{b}\right.$

As we can see, this snippet would be working in common situation (8th and 10th items), but it's difficult to say if 2a, 2c and third items are really not correct. On the other hand, fourth, sixth and seventh items are probably incorrect as they contain different punctuation marks within an environment.

Still, I'm wondering if there is a better approach than my humble try.

The mal-item.lua snippet:

-- Lua snippet checks the end of the \item commands for missing punctuations.

file=io.open("mal-itemize.tex", "r")
content=file:read("*all")
file:close()
--print(content) -- print an original content

thecore="([%.,;])" -- punctuation in a group
beginchar="\002" -- starting character, temporary character, %b{}
endchar="\003" -- ending character, temporary character, %b{}
testcases={"%.", ",", ";", "and"} -- which characters and words I would like to have at the end of \item


-- The main loop for different environments...
environ={"itemize", "enumerate"}
for _,environment in pairs(environ) do
begintext="\\begin{"..environment.."}"
endtext="\\end{"..environment.."}"

-- shrink more letters to a single one, for purpose of string.gsub, %b operator
content=string.gsub(content, begintext, beginchar)
content=string.gsub(content, endtext, endchar)

content=string.gsub(content, "(%b"..beginchar..endchar..")", 
   function(malstring)
   malstring=string.gsub(malstring, "%s+"..thecore, "%1") -- delete spaces before punctuation marks
   malstring=string.gsub(malstring, thecore.."  +", "%1 ") -- shrink more spaces after punctuation to one

   malstring=string.gsub(malstring, "(\\item)(%A)", "%1%1%2") -- doubling \item
   --malstring=string.gsub(malstring, "\\item\\item", "\\item", 1) -- no need to store the first occurence   
   malstring=string.gsub(malstring, endchar, "\\item"..endchar) -- but add closing \item

   malstring=string.gsub(malstring, "\\item([^\\].-)\\item", function(s) -- find the content of a single \item
      --print(s)
      mals=s
      mals=string.gsub(mals, "[^\\]%%.-\n", "") -- delete TeX comments for purpose of this testing
      mals=string.gsub(mals, "\\right%.", "") -- delete all \right., we don't need them
      mals=string.gsub(mals, "%s+", " ") -- delete almost all white spaces, we don't need them here
      mals=string.gsub(mals, beginchar, "") -- delete end character of inner itemize/enumerate
      mals=string.gsub(mals, "%s+$", "") -- delete extra spaces at the end of a string      
      maltest=mals
      --print(maltest)
      --print()
        for _, test in pairs(testcases) do
            maltest=string.gsub(maltest, test.."$", "") -- character as it is, there cannot be an extra space
            maltest=string.gsub(maltest, test.."%s?%$$", "") -- character just before $
            maltest=string.gsub(maltest, test.."%s?%$%$$", "") -- character just before $$
            maltest=string.gsub(maltest, test.."%s?\\%]$", "") -- character just before \]            
            maltest=string.gsub(maltest, test.."%s?\\end{equation%*?}$", "") -- character just before \end{equation}
            maltest=string.gsub(maltest, test.."%s?\\end{e?q?n?array%*?}$", "") -- character just before \end{eqnarray}
            maltest=string.gsub(maltest, test.."%s?\\end{gather%*?}$", "") -- character just before \end{gather}
            maltest=string.gsub(maltest, test.."%s?\\end{array%*?}$", "") -- character before \end{array}
        end -- for, testcases
      if mals~=maltest then OK="its" else print(s); OK="not" end
      OK=" \\"..OK.."OK{}"
      --print()
      return "\\item"..s..OK.."\\item" -- return string untouched
      end)
   malstring=string.gsub(malstring, "\\item\\item", "\\item") -- return double \item to normal
   malstring=string.gsub(malstring, "\\item"..endchar, endchar)
   return malstring -- don't return anything
end)

-- return environments to normal, due to %b operator
content=string.gsub(content, beginchar, begintext)
content=string.gsub(content, endchar, endtext)
end -- for, environment

-- Minor modification to get file compilable... (inner itemize/enumerate).
-- for purpose of easy spotting (TeX.SX)
for _,environment in pairs(environ) do
content=string.gsub(content, "(%s+\\begin{"..environment.."}%s+)(\\[in][to][st]OK{})", "%2%1")
end -- for, the main loop, environment


-- Print the result of our efforts termin or output file...
--print(content)
file=io.open("mal-output.tex", "w")
file:write(content)
file:close()

The mal-output.tex file:

% *latex mal-itemize.tex
% A testing file...
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt

% For testing purposes...
\usepackage{xcolor}
\def\itsOK{{\color{green}IT'S OK!}}
\def\notOK{{\color{red}NOT OK!}}

\begin{document}
Text before.
\begin{itemize}
\item First 
    item.
 \itsOK{}\item Second 

   item $a+b+c.$\itsOK{}
   \begin{itemize}
    \item 2a %.
    \notOK{}\item 2b, and %;
    \itsOK{}\item 2c %,
    \notOK{}\end{itemize}
\item Third item

still third item \notOK{}\end{itemize}
text in the middle.
\begin{enumerate}
\item Fourth item;
 \itsOK{}\item Fifth item $\left(\frac{a}{b}\right.$
 \notOK{}\item Sixth item $d+e. $
 \itsOK{}\item Seventh item,
 \itsOK{}\end{enumerate}

A common situation:
\begin{itemize}
\item 8th item
 \notOK{}\item 9th item,
 \itsOK{}\item 10th item
 \notOK{}\item 11th item,
 \itsOK{}\item 12th item.
 \itsOK{}\end{itemize}
Text after.
\end{document}

Output of our efforts

Best Answer

Here is my feeble attempt at what you ask, in the form of the environment Qitemize. It has some drawbacks, which I enumerate here:

Nested environments must be set in their own groups (Ack!)
The item is set before the comments are made. Thus, if an item contains a nested environment, the comments that rightly pertain to the punctuation prior to the nested environment do not appear until after the nested environment.
I became too weary to investigate the testing for leading capitalization or math item, which the OP indicated could affect the choice of desired ending punctuation.

That said, there are a number of things I do check for, which I tried to include in the MWE. One can define the desired penultimate item-ending punctuation in \itemender (here set to ;).

Now for the MWE.

\documentclass{article}
\usepackage{listofitems,environ,xcolor}
\def\itemender{;}
\NewEnviron{Qitemize}{%
  \setsepchar{\item/.||,||;||!||?}%
  \readlist\myarg{\BODY}%
  \begin{itemize}
    \foreachitem\z\in\myarg[]{%
     \if\relax\z\relax\else
      \item\z
      \foreachitem\zz\in\myarg[\zcnt]{%
        \ifnum\zzcnt=\listlen\myarg[\zcnt]\relax
          \ifnum\zzcnt=1\relax
            \textcolor{red}{No punctuation in this item}
          \else
            % end-of-line punctuation
            \ifnum\zcnt=\listlen\myarg[]\relax
              % end of last item
              \if.\myargsep[\zcnt,\zzcnt-1]\else
                \textcolor{red}{\ Should end with .}\fi
            \else
              % end of non-last items
              \ifx\space\zz
                % Truly end of the item, only a space follows last punctuation
                \if\itemender\myargsep[\zcnt,\zzcnt-1]\else
                  \textcolor{red}{Should end with \itemender}\fi
              \else
                % Not really the item end...more than space follows final punctuation
                % Test for nested Qitemize
                \expandafter\testnest\zz\endtestnest
              \fi
            \fi
          \fi
        \else
          % mid-line punctuation
        \fi
      }
     \fi
    }%
  \end{itemize}
}
\def\Qitemizename{Qitemize}
\newcommand\testnest[1]{\testnestaux#1\relax\relax}
\def\testnestaux#1#2#3\endtestnest{%
  \ifx\begin#1 
    \def\tmp{#2}%
    \ifx\tmp\Qitemizename
      % NESTED Qitemize.  Test prior punctuation
      \textcolor{red}{Qitemize environment follows last punctuation}
      \if\itemender\myargsep[\zcnt,\zzcnt-1]\else
        \textcolor{red}{\\Should end with \itemender{} prior to this environment}\fi
    \else
      \textcolor{red}{An environment follows last punctuation}
      \if\itemender\myargsep[\zcnt,\zzcnt-1]\else
        \textcolor{red}{\\Should end with \itemender{} prior to this environment}\fi
    \fi%
  \else
    \textcolor{red}{No ending punctuation}
  \fi
}
\begin{document}
\begin{Qitemize}
\item 8th item, ending not here; but here
\item 9th item;
\item 10th item
\item 11th item,
\item 12th item
  {\begin{Qitemize}
  \item a,
  \item b
  \item c.
  \end{Qitemize}}
\item 13th item,
  {\begin{Qitemize}
  \item a,
  \item b;
  \end{Qitemize}}
\item 14th item;
  {\begin{Qitemize}
  \item a;
  \item b.
  \end{Qitemize}}
\item 15th item,
  {\begin{enumerate}
  \item a,
  \item b.
  \end{enumerate}}
\item 16th item!
\end{Qitemize}
\end{document}

Best Answer

Related Solutions

[Tex/LaTex] Enumerate and itemize undefined + captions not working

[Tex/LaTex] Enumerate caption on itemize

Related Question