[Tex/LaTex] Static analysis of LaTeX documents

compiling errors

For programming languages like C, C++, C#, Java and so on there are usually static analysers: tools that check the source code for unreachable code, unused variables, memory leaks and other things that are not relevant for compiling (and so the compilers don't report or notice them at all).

Are there similar tools for LaTeX?

Of course, there are some packages, like onlyamsmath or nag, that check for outdated macros or packages, and refcheck, which checks for unused labels. But are there packages and/or tools that check for or hint at

  • unreachable code: \if\else\fi constructions where one or more paths can never be reached?
  • inefficient loops, like \foreach's that could be simplified?
  • unused macro definitions?
  • suggesting the use of \newcommand* instead of \newcommand where appropriate?
  • suspicious lack of possible brackets or whitespace? Like:
    1. a^b c – clear
    2. a^{bc} – clear
    3. a^bc – suspicious: renders like 1. but maybe 2. was intended?
  • suspicious empty lines (paragraphs)? For example between text and equation?
  • suspicious or missing end line comments?
  • whatever else you can think of that is very often wrong or inefficient or unclear to the human eye?

Best Answer

The short answer is that it's not possible.

There are some tools that do some of these things, but they cannot really analyse the LaTeX document, so any advice they give should only be taken as a hint; it might be wrong.

The big difference between LaTeX and the languages you mention, like C and Java, is that the syntax of LaTeX cannot be analysed statically: even the basic lexical analysis and tokenisation of the input depends on run-time behaviour.

\section[abc}

This looks like a syntax error that you would expect a static analysis to pick up, but the document might be

\documentclass{article}

\ifodd\time\catcode`[1\fi
\begin{document}

\section[abc}

aa
\end{document}

which means that whether it is a valid document depends on whether the number of minutes since midnight is odd or even. This is obviously an extreme case, but not as extreme as you may think. Lots of packages do similar things that change the analysis of the document; think of babel shorthands, for example. The fact that babel has been loaded can be statically detected by inspecting the preamble, but determining which language is in force at any point really requires running a full LaTeX interpreter.
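To make the babel point concrete, here is a small sketch. It assumes the ngerman and english options of babel (my choice of languages, not something from the question). As far as I know, the same two characters "a then mean different things depending on the language in force at that point, which a checker reading the source cannot know without tracking every language switch.

```latex
\documentclass{article}
\usepackage[ngerman,english]{babel}
\begin{document}

\selectlanguage{ngerman}
"a  % ngerman shorthand: should typeset an a with umlaut

\selectlanguage{english}
"a  % no such shorthand here: a quote character followed by a

\end{document}
```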

Even if it were possible, I'd question whether some of your items really should be flagged.

  • unreachable code: \if\else\fi constructions where one or more paths can never be reached?

The difficulty here is determining which tokens are in fact tests. Mostly you do not see TeX primitives such as \if, but tokens defined via \newif, which are harder for a checker to recognise. It could perhaps assume that every token starting with \if... is an if token in this sense, but for example LaTeX's \ifthenelse starts with \if... and has a very different syntax.
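A minimal sketch of the two kinds of "if" a checker would have to tell apart. The switch name \ifdraft is made up for illustration. The first branch looks unreachable here, but any package or later line could say \drafttrue, so even this simple case is hard to decide from the source alone.

```latex
\documentclass{article}
\usepackage{ifthen}

\newif\ifdraft   % defines \ifdraft, \drafttrue and \draftfalse
\draftfalse      % (this is also the initial state after \newif)

\begin{document}

% A \newif switch: primitive-style syntax, closed by \fi
\ifdraft Draft text.\else Final text.\fi

% \ifthenelse: also starts with \if, but is a macro taking
% three brace-delimited arguments, with no \else or \fi
\ifthenelse{\equal{a}{b}}{equal}{different}

\end{document}
```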

  • inefficient loops, like \foreach's that could be simplified?

\foreach is simply a macro so almost by definition any particular use of it can be simplified by expanding out the macro. But that may not be seen as simplification...
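As a rough illustration, assuming the pgffor package, the loop below and the line written out by hand after it should typeset the same thing. Whether the second form counts as "simpler" is a matter of taste, which is the problem with flagging it.

```latex
\documentclass{article}
\usepackage{pgffor}
\begin{document}

% Loop form
\foreach \x in {1,2,3} {\x\ }

% The same output written out by hand
1\ 2\ 3\ 

\end{document}
```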

  • unused macro definitions?

LaTeX and all its packages are macro definitions and most documents don't use most of the commands defined, so there are typically thousands of unused macros in any given document.

  • suggesting the use of \newcommand* instead of \newcommand where appropriate?

I'm not sure how this could be done unless you record every use of the macro in a given document and note that its arguments never contain \par in that document.
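For reference, the difference between the two forms only shows up when an argument contains a paragraph break. A small sketch (the macro names \longnote and \shortnote are made up for illustration):

```latex
\documentclass{article}

\newcommand{\longnote}[1]{[Note: #1]}    % argument may contain \par
\newcommand*{\shortnote}[1]{[Note: #1]}  % argument may not contain \par

\begin{document}

\longnote{first paragraph

second paragraph}   % accepted

\shortnote{a single paragraph}   % accepted

% \shortnote{first paragraph
%
% second paragraph}
% would stop with a "Paragraph ended before \shortnote was complete" error

\end{document}
```

A checker would have to see every call of \longnote, including calls generated by other macros, before it could safely suggest the starred form.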

  • suspicious lack of possible brackets or whitespace? Like:
    1. a^b c - clear
    2. a^{bc} - clear
    3. a^bc - suspicious: renders like 1. but maybe 2. was intended?

I'd disagree with this check. Form 2 is the standard LaTeX syntax. If you decide to allow form 1, then you should allow form 3 as well, without comment. It's a central part of the design of TeX's math mode syntax that white space is not significant other than for terminating command names.
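The three forms side by side, as a small test document. The first and third lines give identical output, because the space in math mode is ignored and ^ without braces takes only the next token as its argument.

```latex
\documentclass{article}
\begin{document}

$a^b c$    % form 1: superscript b, then c on the baseline

$a^{bc}$   % form 2: superscript bc

$a^bc$     % form 3: same output as form 1

\end{document}
```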

  • suspicious empty lines (paragraphs)? For example between text and equation?

TeX goes to some trouble to distinguish whether the text following a display starts a new paragraph or not, and LaTeX emulates this behaviour for all its list environments. So unless the static analyser is interpreting the sentences and suggesting that the text should not be the start of a paragraph, it should not be commenting on blank lines.
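A sketch of the two cases (the sentences are placeholder text). Both are legitimate input, and only the meaning of the text says which one the author wanted.

```latex
\documentclass{article}
\begin{document}

Energy and mass are related by
\[ E = mc^2 \]
where $c$ is the speed of light. % no blank line: same paragraph, no indentation

Energy and mass are related by
\[ E = mc^2 \]

The next topic starts here. % blank line: new paragraph, indented

\end{document}
```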

  • suspicious or missing end line comments?

Yes, so long as it can recognise the start of LaTeX3 (expl3) syntax, or similar packages that change the rules so that % is not necessary.
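A sketch of what such a check would have to handle. The macro names are made up, and the expl3 and xparse packages are loaded only so that this also works on older installations. The first definition picks up spaces from its line ends, the second avoids them with %, and the third needs no % at all because spaces and line ends are ignored between \ExplSyntaxOn and \ExplSyntaxOff.

```latex
\documentclass{article}
\usepackage{expl3,xparse}

% Line ends inside the definition become spaces in the output
\newcommand*{\boxedA}[1]{
  \fbox{#1}
}

% The usual fix: end-of-line comments
\newcommand*{\boxedB}[1]{%
  \fbox{#1}%
}

% expl3 syntax: no % needed
\ExplSyntaxOn
\NewDocumentCommand \boxedC { m }
  {
    \fbox {#1}
  }
\ExplSyntaxOff

\begin{document}
X\boxedA{a}X \quad X\boxedB{a}X \quad X\boxedC{a}X
\end{document}
```

Only the first of the three should show gaps around the box, so a checker that warned about the missing % in the third definition would be giving wrong advice.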

  • whatever else you can think of that is very often wrong or inefficient or unclear to the human eye?

Getting a human to proofread the document is a good idea; human eyes are still better at this than machines :-)
