Hooks – Adding a Command Hook to \appendix with Cleveref

I'm trying to set a hook on the \appendix command using the new hook management system. However, when cleveref is loaded, the hook fails with the error "Illegal parameter number in definition of \appendix." at begindocument, hence presumably when the hook is being set by ltcmdhooks. The message appears exactly 46 times, the same number in different documents; I think it is the number of ## in the definition made by cleveref.

The use of equivalent etoolbox code does not result in the same issue.

The MWE below is not really the "real use case", but is enough to reproduce the problem:

\documentclass{book}

\usepackage{cleveref}

\AddToHook{cmd/appendix/before}{\label{appendix}}

% 'etoolbox' works though
% \usepackage{etoolbox}
% \preto{\appendix}{\label{appendix}}{}{}

\begin{document}

Hello world!

\appendix

\end{document}

I'm a little at a loss at what to do with this. But understanding better what the culprit is and particularly why ltcmdhooks fails where etoolbox does not, so that I get a better grasp on what I can do with the former, would be much welcome.

Edit: I'd like to add one dimension to the problem here, if that's OK. My actual use case is that I'm trying to do some setup when the \appendix starts, in a package meant for publication. Now, I cannot hope to be sure of what \appendix may contain, since it is redefined in so many places in one or another package or class. I was expecting to keep the hook and, if things went sour in one or another use case, to be able to instruct users to manually remove the hook and reestablish a working condition. But adding and removing the hook still breaks \appendix:

\documentclass{book}

\usepackage{cleveref}

\AddToHook{cmd/appendix/before}[package-1]{\label{appendix}}
\RemoveFromHook{cmd/appendix/before}[package-1]

\begin{document}

Hello world!

\appendix

\end{document}

Would there be a safe way to disable the hook before it gets applied at begindocument?

Best Answer

There are roughly two ways to patch a command: via \scantokens, and via expansion+redefinition. There's a (not so) brief explanation of both at the end of this answer. When ltcmdhooks can detect the type of command, so that it knows exactly the <parameter text> of the command, it patches by expansion+redefinition, so it has no restriction on the catcode settings in force when the macro was defined. In the case of \appendix, it takes no arguments, so it can be treated as a token list and expanded, then redefined with the added material.

For example, here's a simple sketch of how it works:

\def\appendix{%
  \typeout{This starts the appendix.}}

\def\append#1{%
  \expandafter\appendaux\expandafter{#1}#1}
\def\appendaux#1#2#3{%
  \def#2{#1#3}}

\append\appendix{\typeout{I added this.}}

\appendix

However, what I did not anticipate when I wrote that code is the case where the original definition of \appendix contains ## (try this definition in the code above):

\def\appendix{%
  \typeout{This starts the appendix. ##BOOM!}}

When \appendix is defined like that, TeX's definition scanner sees two consecutive catcode-6 # tokens (##) and replaces them with a single catcode-6 parameter token # in the definition of \appendix; so far so good. However, when you expand the command, TeX also returns a single catcode-6 #, and when you then try to redefine the command you have:

\def\appendix{%
  \typeout{This starts the appendix. #BOOM!}%
  \typeout{I added this.}}

which contains an illegal parameter (#B), and the definition errors.

I have changed ltcmdhooks to handle this case (there's a brief explanation below), but meanwhile you can use \ActivateGenericHook (or \ProvideHook in LaTeX 2021-06-01) to tell ltcmdhooks that you have already patched the command, so it won't try patching, then you do the patching manually using etoolbox:

\documentclass{book}

\usepackage{cleveref}

\usepackage{etoolbox}
\IfFormatAtLeastTF{2021-11-15}%
  {\ActivateGenericHook}% LaTeX >= 2021-11-15
  {\ProvideHook}%         LaTeX 2021-06-01
    {cmd/appendix/before}
\pretocmd\appendix
  {\UseHook{cmd/appendix/before}}
  {}{\FAILED}

\AddToHook{cmd/appendix/before}{\label{appendix}}

\begin{document}

Hello world!

\appendix

\end{document}

Why the above works

The interface for ltcmdhooks in \AddToHook is supposed to work as follows:

If an end user writes \AddToHook{cmd/name/before}{code}, and the hook cmd/name/before doesn't exist yet (which implies that the command \name doesn't have that hook "installed"), then the code tries to patch that hook in the command.

If the end user writes \AddToHook{cmd/name/before}{code}, and the hook cmd/name/before already exists, this (probably) means that the command \name already has that hook, so it just adds the code to the hook, and leaves the command be.

This means that a package author may want to fine-tune the position of the cmd/name/before hook (for example, \def\name{<some initialization>\UseHook{cmd/name/before}<definition>}), then we don't want ltcmdhooks patching the command again (it would be wrong to add the same hook twice), so we tell ltcmdhooks that the hook already exists by saying \ActivateGenericHook{cmd/name/before}, then patching is no longer attempted.
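
As a rough sketch of that package-author scenario (the command \name and the \typeout messages are placeholders of mine, and LaTeX 2021-11-15 or later is assumed for \ActivateGenericHook):

\documentclass{article}

% Declare the hook first so that ltcmdhooks will not try to patch \name.
\ActivateGenericHook{cmd/name/before}

% \name is a placeholder command; the hook sits after its initialization.
\newcommand\name{%
  \typeout{some initialization}%
  \UseHook{cmd/name/before}%
  \typeout{rest of the definition}}

% User code: this lands in the existing hook, and \name itself is left alone.
\AddToHook{cmd/name/before}{\typeout{code from the hook}}

\begin{document}
\name
\end{document}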

This works for your case because you tell ltcmdhooks that patching is no longer needed and then add the hook to the command manually. See section 3, Package Author Interface, of the ltcmdhooks documentation.

So in essence, you, as the package author, are appropriating the \appendix command, by adding the hook yourself (exactly where ltcmdhooks would add it), and then telling ltcmdhooks to not patch it by using \ActivateGenericHook.

If instead of \appendix you were adding hooks to \UniqueCommandFromMyPackage, then you could use \NewHook instead of \ActivateGenericHook (the effect would be identical), because there would be no possibility of a name conflict.

How LaTeX2ε handles this case now

The problem: Turns out in the described case we're in a dead-end. When you write a definition like

\def\foo#1{#1##X}

TeX stores its <replacement text> as a token list containing:

out_param 1, par_token #, letter X

(out_param 1 is #1 to be replaced by the actual parameter when the macro is expanded, par_token # is a catcode 6 #, and letter X is a catcode 11 X).

Then, when you expand \foo with #1 (par_token #, character 1), TeX replaces out_param 1 and you have:

par_token #, character 1, par_token #, letter X

which is equivalent to typing #1#X. If you plug that back into a new definition of \foo you'll have:

\def\foo#1{#1#X}

which is obviously wrong (and thus the Illegal parameter number error). And at this point you have no way to tell what was an actual parameter when the macro was defined, and what was a single parameter token.
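
The dead end can be reproduced directly. The following is only a sketch of what a naive expansion+redefinition patch amounts to (it is not code from ltcmdhooks): it expands \foo{#1} before \def reads the new replacement text.

\def\foo#1{#1##X}
% The \expandafter chain expands \foo{#1} first, so TeX ends up
% carrying out \def\foo#1{#1#X}:
\expandafter\def\expandafter\foo\expandafter#\expandafter1\expandafter{\foo{#1}}
% ! Illegal parameter number in definition of \foo.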

Half solution: There is one very simple case that can be easily detected and solved (which coincidentally is the one in your question): a macro without parameters. Since the macro takes no argument, any loose ## in its definition cannot possibly be confused with a parameter, so we can treat such macros as token lists (in the expl3 sense) and do something akin to \tl_put_right:Nn, and the problem is solved.
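
A minimal sketch of that idea with e-TeX primitives (my own illustration, not the kernel's actual code): inside \edef, \unexpanded keeps the already-expanded body untouched, so a lone # coming from the old definition is stored back as a plain parameter-character token rather than being read as the start of a parameter.

\def\appendix{%
  \typeout{This starts the appendix. ##OK}}

% \expandafter expands \appendix once, then \unexpanded protects the
% old body plus the added material from any further expansion.
\edef\appendix{%
  \unexpanded\expandafter{\appendix\typeout{I added this.}}}

\appendix

This only works because \appendix takes no arguments; with parameters, the lone # would again be indistinguishable from a real #1.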

Another relatively simple case is when the macro has no ## in its definition. In this case we don't have to worry about confusing parameters, so we treat the macro normally (this was the case implemented initially). LaTeX uses a rather simple loop to check whether a macro has a parameter token in its definition (\__hook_if_has_hash:nTF): it looks at every token in the definition and compares it with #.

The other half: When the macro falls into the general case of having both parameters and parameter tokens in its definition (like \foo above), then we have to manually re-double every parameter token in the definition, so that it can be re-made. To do that, instead of expanding \foo with #1, LaTeX expands it with \c_@@_hash_tl, so \foo{\c_@@_hash_tl} becomes a definition like:

\foo#1{\c_@@_hash_tl 1#X}

then we loop through the replacement text of the macro (inside the braces), double every #, and replace every \c_@@_hash_tl by a single #, which then gives:

\foo#1{#1##X}

and then we can do the definition normally (phew!)


Patching with \scantokens

(wordier description here)

Suppose a macro defined with

\long\def\mycmd[#1]#2{\typeout{#1//#2}}

To append some code to it via \scantokens, you first do \meaning\mycmd to get a string like:

\long macro:[#1]#2->\typeout {#1//#2}

(with usual \detokenize catcodes: all 12 except spaces, which are catcode 10), then you use a delimited macro to separate the <prefixes>, the <parameter text>, and the <replacement text>, roughly like this:

\def\split#1{\expandafter\splitaux\meaning#1\relax}
\expanded{%
  \noexpand\def\noexpand\splitaux#1\detokenize{macro:}#2->#3\relax}{%
    \def\prefixes{#1}%
    \def\parameter{#2}%
    \def\replacement{#3}}

(I'm using \def\prefixes{#1}, etc. for the sake of understandability, but in reality you would inject everything expandably instead; see the definition of \__kernel_prefix_arg_replacement:wN in expl3-code.tex, and \etb@patchcmd in etoolbox.sty if you're feeling brave).

At this point you have every part of the definition separately, as three strings. Now you can either append or prepend some code to \replacement (or replace some part of it, as is done in \patchcmd), or in rarer cases change \prefixes or \parameter. To reconstruct the definition you need:

<prefixes>\def\mycmd<parameter text>{<replacement text>}

but the three parts you have are still catcode 12 tokens, which are no good. Here comes the \scantokens part: you rescan those strings back to "normal" tokens:

\expanded{%
  \noexpand\scantokens{%
  % <prefixes>\def         \mycmd<parameter text>{<replacement text>}
    \prefixes \def\noexpand\mycmd\parameter      {\replacement      <added material>}%
  }%
}

which, after \expanded does its job, becomes:

\scantokens{%
  \long\def\mycmd[#1]#2{\typeout {#1//#2}<added material>}%
}%

then \scantokens does its thing and turns everything into tokens using the current catcode settings, and then the definition is carried out normally.

The advantage of this method is that you can do virtually any manipulation in any part of the definition.

The disadvantages are a few:

  • You need to know what catcodes were in force when the definition was first made, otherwise you can't patch safely (when patching you usually need to verify that a simple round trip through \meaning and \scantokens doesn't change the meaning of the macro);
  • If the macro was created with some combination of \edef and \detokenize to forcibly make some catcode 12 tokens, you will probably not be able to patch that macro (for example, \splitaux as defined above in this answer cannot ever be patched with \patchcmd because it contains letters (for example m) of both catcodes 11 and 12);
  • If the <parameter text> of the macro contains the characters ->, you won't be able to patch the macro.
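
In practice you rarely write this by hand, because etoolbox packages the approach. A small sketch using its documented \ifpatchable* and \apptocmd commands (the \typeout messages are placeholders of mine):

\documentclass{article}
\usepackage{etoolbox}

\long\def\mycmd[#1]#2{\typeout{#1//#2}}

% Ask etoolbox whether \mycmd can be patched with \apptocmd/\pretocmd.
\ifpatchable*{\mycmd}
  {\typeout{\string\mycmd\space can be patched}}
  {\typeout{\string\mycmd\space cannot be patched safely}}

% Append code to the replacement text; the last two arguments are the
% success and failure branches.
\apptocmd{\mycmd}
  {\typeout{added material}}
  {}
  {\typeout{patch failed}}

\begin{document}
\mycmd[a]{b}
\end{document}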

Patching with expansion+redefinition

This method is much simpler, but requires previous knowledge of how the macro was defined. This can be done in a few cases, namely when you know exactly what the <parameter text> of the macro is. The cases known by the kernel are when the macro was defined with \DeclareRobustCommand, or with ltcmd (\NewDocumentCommand or \NewExpandableDocumentCommand), or with \newcommand with an optional argument, or when the macro takes no argument.

Suppose the same macro from before, but defined with:

\newcommand\mycmd[2][default]{\typeout{#1//#2}}

(it will have an internal macro called \\mycmd, but for the sake of simplicity let's call it \mycmd as well), then we know for sure its <parameter text> is [#1]#2. Knowing what arguments the macro expects, we can feed it #1, #2, ... as arguments, so for \mycmd we would do:

\mycmd[#1]{#2}

which would then expand to the <replacement text> of the macro, with the first parameter (#1) replaced by the token pair # (catcode 6) and 1 (catcode 12), that is, the parameter token # followed by the character 1. The patching scheme would be something like:

\expanded{%
  \def\noexpand\mycmd[#1]#2{%
    \unexpanded\expandafter{\mycmd[#1]{#2}<added material>}%
  }%
}

then after the \expanded is done you are left with:

\def\mycmd[#1]#2{\typeout{#1//#2}<added material>}

which is exactly what you had with the \scantokens approach, except that you didn't turn tokens into a string, so catcodes don't matter at all here.

The advantages of this method are roughly the disadvantages of the \scantokens method:

  • catcodes don't matter at all;
  • you can patch complicated macros (including the \splitaux macro from before) using this method given you know exactly what its <parameter text> is;
  • the <parameter text> of the macro may contain any token your heart desires (as long as you know what token it is); and
  • this method doesn't need a sanity check to ensure that the macro can be patched correctly.

The disadvantage is the requirement for the method to work: you need to know exactly what the <parameter text> is.
