(An experiment) I'd like to present you one of the possible tasks (to be automated) I solve in my daily work: adding missing punctuation marks in the itemize
and enumerate
environments.
I polish mathematical papers, it requires a lot of time and mental energy, therefore anything which could be automated is worth gold, moreoever, it minimizes a number of additional human errors (mistyping, overlooking a critical spot etc.).
I try to automate as many things as possible, but I haven't touched this topic, yet. I wish to automatically check the end of \item
s where we expect a punctuation mark. There are certain exceptions (no punctuation at all) or there could be a conjunction and
(likely some other terms), but it's not as boring task as it looks when it gets to a try to program it. In this question, I introduce the problem and one possible starting point. Presented snippet outside the TeX world (Lua) is not used in production as I'm still thinking what's the best approach here.
The first idea is to remove all the punctuation marks in the itemize
and enumerate
environments and add punctuations according to our rules, let's say:
- Add full stop if
\item
starts with a uppercase letter or a mathematical expression. - Add comma or semicolon if it doesn't.
- Exception: Add fullstop at the end of the last
\item
as it ends the item list.
It might work and it isn't that difficult to program it, but that's not a good strategy. We would be changing the original paper almost without thinking and considering special cases and following text outside the environment.
It seems that better strategy is not to change a single letter in the paper, but notify a user (us) if there might be a problem. Let me demonstrate a couple of (fictive) examples (mal-itemize.tex
).
% *latex mal-itemize.tex
% A testing file...
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt
% For testing purposes...
\usepackage{xcolor}
\def\itsOK{{\color{green}IT'S OK!}}
\def\notOK{{\color{red}NOT OK!}}
\begin{document}
Text before.
\begin{itemize}
\item First
item.
\item Second
item $a+b+c.$
\begin{itemize}
\item 2a % .
\item 2b, and % ;
\item 2c % ,
\end{itemize}
\item Third item
still third item\end{itemize}
text in the middle.
\begin{enumerate}
\item Fourth item;
\item Fifth item $\left(\frac{a}{b}\right .$
\item Sixth item $d+e . $
\item Seventh item,
\end{enumerate}
A common situation:
\begin{itemize}
\item 8th item
\item 9th item,
\item 10th item
\item 11th item,
\item 12th item.
\end{itemize}
Text after.
\end{document}
- First and second item looks correct. Full stop ends item directly (first item) or from within a mathematical expression (second item).
- 2a and 2c might be correct, but then 2b should probably not contain comma, or items 2a and 2c are not correct (more likely) and we should add comma (2a) and full stop (2c).
- Third item looks correct if
text in the middle
is a part of that item. Otherwise, there should be a full stop (as in the first and second items) andtext
should be changed toText
. - Punctuation mark at the end of the fourth item looks correct as long we don't check the other items. Fifth item is a tricky part as it contains
\right.
where full stop is a part of the TeX expression. - Seventh item is tricky too. We rather expect a full stop (as following text starts with an uppercase letter), or
A common
should be changed toa common
. - Now, let me demonstrate a common and frequent situation I face most of the time. We just miss punctuation. It would be easy to add comma in the 10th item as we could store punctuation mark from the previous item (9th item). A difficult part is a missing punctuation mark in the 8th item where we need to look forward in the text which type of punctuation mark author used (comma, semicolon, or full stop, skipping
and
,or
, …).
Let me demonstrate my first try in Lua. I'm not fluent in LPeg, so I changed \begin
s and \end
s of the preselected environments to a single character. After that, I can use %b
(balanced) operator in the string.gsub
command.
Next thing I did is that I doubled the \item
command, then we can find the content of \item
until the next \item
command. It means that there is still another \item
which hasn't been swallowed by that regular expression. I also added one \item
before the end of the environment, to be able to find the content of the last \item
. It's a small trick using regular expressions. Next thing I had to do was to delete TeX comments for purpose of testing an item content. TeX comments cannot affect the results.
The core of the snippet is that I check punctuation marks and and
conjunction (there might be also or
, neither
…) by deleting a portion in that temporary string. If there is a change (it contains that part), that item is correct. I check the punctuation mark itself plus that string followed by an ending of common mathematical expressions.
The snippet informs about progress in the terminal (the content of likely incorrect items) and for purpose of this question it saves a modified version of the TeX file (mal-output.tex
). I enclose the source codes and a preview of this testing file. We run (any LaTeX engine can be used):
texlua mal-item.lua
lualatex mal-itemize.tex
lualatex mal-output.tex
We get a list of items with likely missing punctuation marks in the terminal:
2a %.
2c %,
Third item
still third item
8th item
10th item
Fifth item $\left(\frac{a}{b}\right.$
As we can see, this snippet would be working in common situation (8th and 10th items), but it's difficult to say if 2a, 2c and third items are really not correct. On the other hand, fourth, sixth and seventh items are probably incorrect as they contain different punctuation marks within an environment.
Still, I'm wondering if there is a better approach than my humble try.
The mal-item.lua
snippet:
-- Lua snippet checks the end of the \item commands for missing punctuations.
file=io.open("mal-itemize.tex", "r")
content=file:read("*all")
file:close()
--print(content) -- print an original content
thecore="([%.,;])" -- punctuation in a group
beginchar="\002" -- starting character, temporary character, %b{}
endchar="\003" -- ending character, temporary character, %b{}
testcases={"%.", ",", ";", "and"} -- which characters and words I would like to have at the end of \item
-- The main loop for different environments...
environ={"itemize", "enumerate"}
for _,environment in pairs(environ) do
begintext="\\begin{"..environment.."}"
endtext="\\end{"..environment.."}"
-- shrink more letters to a single one, for purpose of string.gsub, %b operator
content=string.gsub(content, begintext, beginchar)
content=string.gsub(content, endtext, endchar)
content=string.gsub(content, "(%b"..beginchar..endchar..")",
function(malstring)
malstring=string.gsub(malstring, "%s+"..thecore, "%1") -- delete spaces before punctuation marks
malstring=string.gsub(malstring, thecore.." +", "%1 ") -- shrink more spaces after punctuation to one
malstring=string.gsub(malstring, "(\\item)(%A)", "%1%1%2") -- doubling \item
--malstring=string.gsub(malstring, "\\item\\item", "\\item", 1) -- no need to store the first occurence
malstring=string.gsub(malstring, endchar, "\\item"..endchar) -- but add closing \item
malstring=string.gsub(malstring, "\\item([^\\].-)\\item", function(s) -- find the content of a single \item
--print(s)
mals=s
mals=string.gsub(mals, "[^\\]%%.-\n", "") -- delete TeX comments for purpose of this testing
mals=string.gsub(mals, "\\right%.", "") -- delete all \right., we don't need them
mals=string.gsub(mals, "%s+", " ") -- delete almost all white spaces, we don't need them here
mals=string.gsub(mals, beginchar, "") -- delete end character of inner itemize/enumerate
mals=string.gsub(mals, "%s+$", "") -- delete extra spaces at the end of a string
maltest=mals
--print(maltest)
--print()
for _, test in pairs(testcases) do
maltest=string.gsub(maltest, test.."$", "") -- character as it is, there cannot be an extra space
maltest=string.gsub(maltest, test.."%s?%$$", "") -- character just before $
maltest=string.gsub(maltest, test.."%s?%$%$$", "") -- character just before $$
maltest=string.gsub(maltest, test.."%s?\\%]$", "") -- character just before \]
maltest=string.gsub(maltest, test.."%s?\\end{equation%*?}$", "") -- character just before \end{equation}
maltest=string.gsub(maltest, test.."%s?\\end{e?q?n?array%*?}$", "") -- character just before \end{eqnarray}
maltest=string.gsub(maltest, test.."%s?\\end{gather%*?}$", "") -- character just before \end{gather}
maltest=string.gsub(maltest, test.."%s?\\end{array%*?}$", "") -- character before \end{array}
end -- for, testcases
if mals~=maltest then OK="its" else print(s); OK="not" end
OK=" \\"..OK.."OK{}"
--print()
return "\\item"..s..OK.."\\item" -- return string untouched
end)
malstring=string.gsub(malstring, "\\item\\item", "\\item") -- return double \item to normal
malstring=string.gsub(malstring, "\\item"..endchar, endchar)
return malstring -- don't return anything
end)
-- return environments to normal, due to %b operator
content=string.gsub(content, beginchar, begintext)
content=string.gsub(content, endchar, endtext)
end -- for, environment
-- Minor modification to get file compilable... (inner itemize/enumerate).
-- for purpose of easy spotting (TeX.SX)
for _,environment in pairs(environ) do
content=string.gsub(content, "(%s+\\begin{"..environment.."}%s+)(\\[in][to][st]OK{})", "%2%1")
end -- for, the main loop, environment
-- Print the result of our efforts termin or output file...
--print(content)
file=io.open("mal-output.tex", "w")
file:write(content)
file:close()
The mal-output.tex
file:
% *latex mal-itemize.tex
% A testing file...
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt
% For testing purposes...
\usepackage{xcolor}
\def\itsOK{{\color{green}IT'S OK!}}
\def\notOK{{\color{red}NOT OK!}}
\begin{document}
Text before.
\begin{itemize}
\item First
item.
\itsOK{}\item Second
item $a+b+c.$\itsOK{}
\begin{itemize}
\item 2a %.
\notOK{}\item 2b, and %;
\itsOK{}\item 2c %,
\notOK{}\end{itemize}
\item Third item
still third item \notOK{}\end{itemize}
text in the middle.
\begin{enumerate}
\item Fourth item;
\itsOK{}\item Fifth item $\left(\frac{a}{b}\right.$
\notOK{}\item Sixth item $d+e. $
\itsOK{}\item Seventh item,
\itsOK{}\end{enumerate}
A common situation:
\begin{itemize}
\item 8th item
\notOK{}\item 9th item,
\itsOK{}\item 10th item
\notOK{}\item 11th item,
\itsOK{}\item 12th item.
\itsOK{}\end{itemize}
Text after.
\end{document}
Best Answer
Here is my feeble attempt at what you ask, in the form of the environment
Qitemize
. It has some drawbacks, which I enumerate here:Nested environments must be set in their own groups (Ack!)
The item is set before the comments are made. Thus, if an item contains a nested environment, the comments that rightly pertain to the punctuation prior to the nested environment do not appear until after the nested environment.
I became too weary to investigate the testing for leading capitalization or math item, which the OP indicated could affect the choice of desired ending punctuation.
That said, there are a number of things I do check for, which I tried to include in the MWE. One can define the desired penultimate item-ending punctuation in
\itemender
(here set to;
).Now for the MWE.