[Tex/LaTex] Trimming whitespace around text (like LTRIM, RTRIM and TRIM)

macrosspacing

When working in Excel with text strings, it is convenient to use the LTRIM, RTRIM and TRIM functions which removes white space around text string. What would be an efficient way of duplicating this in LaTeX?

For example, say you programmatically produce variables

\def\firstname{FirstName}% First name
\def\lastname{LastName}% Last name
\edef\fullname{\firstname\ \lastname}% Full name
\fullname% Display full name

yet should also accommodate for when \firstname or \lastname may be empty. Without testing whether they are empty, something like

\def\firstname{FirstName}% First name
\def\lastname{}% Last name (none)
\edef\fullname{\firstname\ \lastname}% Full name
\trim{\fullname}% Display full name

would take care of this. My first thought was to define

\def\trim#1{\ignorespaces#1\unskip}

but this would most certainly not work in a general setting, since this does not take care of an empty group. Moreover, \unskip would only take care of the last skip, of which there may be more than one.

In particular, is it possible to define \trim such that it will take care of

  • \hspaces and \hskips?
  • \ 's?
  • empty groups {} and perhaps non-printing tokens like \relax?

enter image description here

\documentclass{article}
\begin{document}
Without \verb|\trim|:\par\medskip
\def\firstname{FirstName}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\fullname}

\def\firstname{FirstName}\def\lastname{}
\edef\fullname{\firstname\ \lastname}\fbox{\fullname}

\def\firstname{}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\fullname}

\bigskip

With \verb|\trim|:\par\medskip
\def\trim#1{\ignorespaces#1\unskip}
\def\firstname{FirstName}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\trim{\fullname}}

\def\firstname{FirstName}\def\lastname{}
\edef\fullname{\firstname\ \lastname}\fbox{\trim{\fullname}}

\def\firstname{}\def\lastname{LastName}
\edef\fullname{\firstname\ \lastname}\fbox{\trim{\fullname}}

\end{document}

Best Answer

Trimming all of the explicit spaces around input is certainly doable. There are a number of approaches about this problem: I would go with the one Bruno Le Floch wrote for expl3 as \tl_trim_spaces:n. That can be used by doing

\usepackage{expl3}
\ExplSyntaxOn
\cs_new_eq:NN \trimspaces \tl_trim_spaces:n
\ExplSyntaxOff

Alternatively, the implementation can be included directly in the source and thus avoid any dependency:

\documentclass{article}
\makeatletter
\long\def\trim@spaces#1{%
  \@@trim@spaces{\q@mark#1}%
}
\def\@tempa#1{%
  \long\def\@@trim@spaces##1{%
    \@@trim@spaces@i##1\q@nil\q@mark#1{}\q@mark
      \@@trim@spaces@ii
      \@@trim@spaces@iii
      #1\q@nil
      \@@trim@spaces@iv
      \q@stop
  }%
  \long\def\@@trim@spaces@i##1\q@mark#1##2\q@mark##3{%
    ##3%
    \@@trim@spaces@i
    \q@mark
    ##2%
    \q@mark#1{##1}%
  }%
  \long\def\@@trim@spaces@ii\@@trim@spaces@i\q@mark\q@mark##1{%
    \@@trim@spaces@iii
    ##1%
  }%
  \long\def\@@trim@spaces@iii##1#1\q@nil##2{%
    ##2%
    ##1\q@nil
    \@@trim@spaces@iii
  }%
  \long\def\@@trim@spaces@iv##1\q@nil##2\q@stop{%
    \unexpanded\expandafter{\@gobble##1}%
  }%  
}
\@tempa{ }
\def\test{ foo }
\edef\test{\expandafter\trim@spaces\expandafter{\test}}
\show\test

This will remove all of the spaces from the ends of the input, even is you do something tricky like \edef\test{ \space foo \space} to start with (so there are multiple spaces at both ends). (If you are happy to limit yourself to this case, then xparse offers the \TrimSpaces post-processor for arguments using this method.)

The way the above works is that there are two loops: one for spaces at the start of the input (\@@trim@spaces@i), a second for those as the end (\@@trim@spaces@iii). First, \@@trim@spaces sets things up such that the correct markers are in place. In the 'leading' step, \@@trim@spaces@i matches an argument consisting of \q@mark followed by a space (the space itself is discarded). If there are more spaces then #1 and #3 will be empty and #2 will be the remaining input, meaning that \@@trim@spaces@i will be called again with the remaining input. On the other hand, if there are no spaces left in the input then #2 matches the empty input set up by \@@trim@spaces, #1 is the user input with all leading spaces removed and #3 is \@@trim@spaces@ii. The latter stops the loop and hands off to \@@trim@spaces@iii (a \q@mark is left on the front of the user input to prevent any loss of braces: see later). In this second loop, and spaces at the end of the input will appear just before \q@nil. This pattern is matched by the argument to \@@trim@spaces@iii. If there was a trailing space in the input then #1 is the user input with the space removed (but still with a leading \q@mark) and #2 is \@@trim@spaces@iii, leading to a loop. However, when the trailing spaces are exhausted, #2 is \@@trim@spaces@iv and #1 is the \q@mark <user input>\q@nil\@@trim@spaces@iii. The \q@nil\@@trim@spaces@iii is removed by the argument patter for \@@trim@spaces@iv before the leading \q@amrk is stripped off by \@gobble (with the \unexpanded preventing further expansion).

Note that the above uses e-TeX to allow it to prevent further expansion inside an \edef or similar. If the extensions are not available, change the last auxiliary to

  \long\def\@@trim@spaces@iv##1\q@nil##2\q@stop{%
    \@gobble##1%
  }%

with the proviso that this will mean that you do have to be cautious what is passed through.

A second thing to note is that there are some 'special' tokens in the above, for example \q@nil, that are used to match the macro argument patterns and so can't be in the input. That really should be okay with 'text', but you could use something even more obscure like \catcode`\Q=3 then Q (math shift catcode) if you wanted to.

Removing the other items requested would mean searching for all of them separately. That sounds quite tricky in the case of \hspace/\hskip as presumably the spacing could be given in any valid units, even before we worry about things like

\def\foo{10 pt }
\hskip\foo

As you may know, dealing with group tokens is tricky at the best of times, so finding an empty group could also be hard. (I guess you'd need to use a loop: grab each token in the input, see if it's empty and if it's not add it to the 'keep' pile.)

Moreover, I think that this sort of input is pretty unlikely in real input. Trimming explicit spaces make sense, but I am not convinced about the other items (unless there is some particular case here where there is a good chance of picking up the other items).