[TeX/LaTeX] How to use chktexrc to control warnings on dashes

chktex punctuation syntax-checker

I am trying to configure chktex with a local .chktexrc. Per the chktex manual:

You should also take a look at the “chktexrc” file. As it is self-documenting, you
should be able to get the meaning of each keyword by simply reading the file.
In fact, since not all options are described in this documentation it is necessary
that you read the “chktexrc” file in order to understand them all.

Unfortunately, either I'm being dense or they overestimated how self-documenting the section on dashes is. Maybe a little bit of both.

When writing about Plato and Aristotle, you often have references that look like this: 24c–e. Since 24c to 24e is a range, I want an en dash, not a hyphen. However, in its default settings, chktex does not want an en dash there, so I get a warning.

I know that you can suppress the warning for a single line or for the whole file using a LaTeX comment, but I prefer to use a local chktexrc for all my configuration. (Don't repeat myself, and all that.)

The relevant section of the default configuration file is below, and I'm hoping someone can help me understand it. My key questions are these:

  • What do the numbers 1, 2, and 3 represent? (I thought they represented the three cases for type of character on either side of the dashes. 1 = letters on left and right, 2 = numbers on left and right, and 3 = space on left and right. That suggests that I wanted to add 1 to NUMDASH, but that does not suppress the warning.)
  • Why does the comment for NUMDASH say Between words when the example is clearly between numbers?
  • I was able to suppress the warning by putting 1 2 3 in the HYPHDASH setting. Why does that work? (This question is basically the first question but put the other way around.)
#####################################################################
#
# Here,  you  can  specify the length of various dashes.  We sort the
# dash according to which type of characters that are on the left and
# right of it.  We are only conclusive if they are the same.
#
# We associate as follows:
#
#     Name        Type of character on each side
#     HyphDash    Alphabetic (foo-bar)
#     NumDash     Numeric (2--3)
#     WordDash    Space (like this --- see?)
#
# Below you specify how many dashes which are legal in each case.  We
# define 0 as  a magic constant which always generates an error.  You
# may specify more than one legal dash-length.
#
# Let's look at an example.  You use the following dash-syntax:
#
#     foo-bar
#     2--3
#     like this---see?
#
#
# HYPHDASH { 1 3 }        # Either a hyphen, or inter-word
# NUMDASH { 2 }           # Between words
# WORDDASH { 0 }          # We never use this
#

HyphDash
{
    1 3
}

NumDash
{
    2
}

WordDash
{
    3
}

Here's a MWE if you want to check something with chktex.

\documentclass[12pt,letterpaper]{article}

\begin{document}

Plato references are given in Stephanus numbers. They can look like the following: \textit{Apology} 24c--e.

\end{document}

Best Answer

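The numbers are dash lengths: the number of consecutive - characters that are permitted in the given context. In your example, 24c--e, the characters on either side of the dashes are c and e, both alphabetic, so chktex treats it as a HyphDash context (where you would normally put a hyphen), not a NumDash one. That is why adding a value to NumDash has no effect. To accept an en dash there, allow two dashes in HyphDash: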

HyphDash
{
    2
}

This will suppress the warning.

The 'Between words' comment can therefore be read as 'in a numeric context, a between-words dash (i.e., a dash of length 2) is allowed'. The HyphDash example likewise reads 'in an alphabetic context, either a hyphen (length 1) or an inter-word dash (length 3) is allowed'.

This can be seen in the source code of ChkTeX, in the file FindErrs.c, comments mine:

// count number of - characters from current buffer position
TmpPtr = BufPtr;
SKIP_AHEAD(TmpPtr, TmpC, TmpC == '-');
TmpCount = TmpPtr - BufPtr + 1;
/* some lines skipped */

// if the character before the dash(es) is a space and the character after is a space
// then we use the WordDash context
if (LATEX_SPACE(*PrePtr) && LATEX_SPACE(*TmpPtr))
   wl = &WordDash;
// if the dash(es) are between digits then we are in the NumDash context
if (isdigit((unsigned char)*PrePtr) && isdigit((unsigned char)*TmpPtr))
   wl = &NumDash;
// if the dash(es) are between alphabetic characters then we are in the HyphDash context
if (isalpha((unsigned char)*PrePtr) && isalpha((unsigned char)*TmpPtr))
   wl = &HyphDash;
// if we are in any of these three contexts
if (wl)
{
   Errored = TRUE;
   // loop the list of numbers for this context from chktexrc
   FORWL(i, *wl)
   {
      // convert to (long) int
      Len = strtol(wl->Stack.Data[i], NULL, 0);
      // no error if the number of dashes found is the current list item
      // i.e., the actual number is in the list
      if (TmpCount == Len)
      {
         Errored = FALSE;
         break;
      }
   }
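
To try the same logic outside ChkTeX, here is a minimal standalone sketch in C. It is not ChkTeX's code: the names dash_context, classify_dash and dash_allowed are made up for illustration, isspace stands in for the LATEX_SPACE macro, and the scan starts at the first dash, so no + 1 is needed when counting.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; this only mirrors the excerpt above, it is not ChkTeX's API. */
enum dash_context { CTX_NONE, CTX_WORD, CTX_NUM, CTX_HYPH };

/* Classify the run of '-' that starts at text[pos] by the characters on
 * either side of it, and store the length of the run in *count. */
enum dash_context classify_dash(const char *text, size_t pos, size_t *count)
{
    size_t end = pos;
    while (text[end] == '-')
        end++;
    *count = end - pos;

    /* Assumption: a run at the very start or end of the string has no context. */
    if (pos == 0 || text[end] == '\0')
        return CTX_NONE;

    unsigned char before = (unsigned char)text[pos - 1];
    unsigned char after = (unsigned char)text[end];

    if (isspace(before) && isspace(after))
        return CTX_WORD;   /* WordDash: space on both sides */
    if (isdigit(before) && isdigit(after))
        return CTX_NUM;    /* NumDash: digit on both sides */
    if (isalpha(before) && isalpha(after))
        return CTX_HYPH;   /* HyphDash: letter on both sides */
    return CTX_NONE;       /* mixed neighbours: no check is made */
}

/* Return 1 if count is one of the allowed dash lengths, 0 otherwise.
 * A list holding only 0 never matches, because a run has at least one dash. */
int dash_allowed(const int *allowed, size_t n, size_t count)
{
    for (size_t i = 0; i < n; i++)
        if (allowed[i] >= 0 && (size_t)allowed[i] == count)
            return 1;
    return 0;
}
```

For the string "Apology 24c--e." the run of dashes sits between c and e, so classify_dash reports CTX_HYPH with a count of 2. dash_allowed rejects that count against the default HyphDash list { 1, 3 } and accepts it against { 1, 2, 3 }.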

Note that the global chktexrc file is always loaded, even when you specify a different file on the command line. So if the global file has 1 3 and the local file has 2, then chktex --localrc myrcfile mytexfile.tex will use 1 2 3. To ignore the global file, use chktex -g0 --localrc myrcfile mytexfile.tex.
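
The accumulation of the two lists can be modelled as a plain append. This is only a sketch of the behaviour described above, not ChkTeX's actual resource-file parser, and merge_dash_lists is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: copy the global list, then the local list, into out.
 * Returns the number of entries written (at most cap). */
size_t merge_dash_lists(const int *global, size_t ng,
                        const int *local, size_t nl,
                        int *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < ng && n < cap; i++)
        out[n++] = global[i];   /* entries from the global chktexrc */
    for (size_t i = 0; i < nl && n < cap; i++)
        out[n++] = local[i];    /* entries added by the local file */
    return n;
}
```

With a global list { 1, 3 } and a local list { 2 }, the merged list holds 1, 3 and 2, so dash lengths 1, 2 and 3 are all accepted. Running with -g0 corresponds to passing an empty global list.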
