This is essentially the same as @gernot's answer, but it reduces the number of \expandafter and \csname tokens, which makes the code more readable, imho.
The approach uses two steps of processing. The first step reads only the first argument (the others are curried) and creates the macro names from it, resulting in two new arguments. The second step then grabs all the arguments.
\documentclass[]{article}
\usepackage{tikz}
\makeatletter
\newcommand\newABC[1]
{%
% #1: name for macro and box
% #2: before (curried)
% #3: raise (curried)
% #4: after (curried)
% #5: contents (curried)
\expandafter\newABC@\csname #1\expandafter\endcsname\csname #1box\endcsname
}
\newcommand\newABC@[6]
{%
% #1: macro
% #2: box-macro
% #3: before
% #4: raise
% #5: after
% #6: contents
\newsavebox#2%
\sbox#2{\mathalpha{\hspace{#3pt}\raisebox{#4pt}{#6}\hspace{#5pt}}}%
\newcommand#1{\usebox#2}%
}
\makeatother
\newABC{ehh}{1}{1}{.5}
{%
\begin{tikzpicture}
\node at (0,0){\(h\)};
\end{tikzpicture}%
}
\begin{document}
A\ehh b
\end{document}
As requested, a few explanations of the code:
\csname expands all following tokens until it finds an \endcsname, and the result is turned into a control sequence.
\expandafter steps over the next token (regardless of which kind of token; an opening brace, for instance, is a token as well and could be stepped over this way) and expands the token after that one once (if that token isn't expandable, nothing happens).
So \expandafter\stuff\csname foo\endcsname will result in \stuff being stepped over and \csname being expanded once. Within a single step of expansion, \csname expands all following tokens until it finds an \endcsname and uses everything in between as the name of a control sequence. In this case it finds foo (letters don't expand further), so once \csname is done, the \expandafter is removed from the input stream and \stuff is put back; the input now contains \stuff\foo.
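The single \expandafter ... \csname pattern described above can be tried in a minimal document. This is only a sketch: the definition of \stuff is an assumption made for illustration (the explanation leaves \stuff unspecified), and here it just prints the name of the control sequence it receives.

```latex
\documentclass{article}
% \stuff is a hypothetical stand-in: it prints the name of the
% control sequence it receives as its argument.
\newcommand\stuff[1]{\texttt{\string#1}}
\begin{document}
% \expandafter steps over \stuff and expands \csname once, so
% \stuff receives the single token \foo as its argument.
\expandafter\stuff\csname foo\endcsname
\end{document}
```

This should typeset the name \foo. Without the \expandafter, \stuff would grab the token \csname itself as its argument instead of the finished control sequence.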
We can utilize the fact that \csname expands everything until it finds an \endcsname to build two control sequences at once. In the following, the next thing TeX will evaluate is preceded by |> and the tokens TeX has stepped over and stored to be put back are preceded by || (this is the same style the unravel package would use, though my steps might not be the same ones that package would show):
|> \expandafter\stuff\csname foo\expandafter\endcsname\csname bar\endcsname
will first step over \stuff, so the input will look like this (this is less than one step of expansion, more a step of processing):
|| \expandafter\stuff
|> \csname foo\expandafter\endcsname\csname bar\endcsname
Now \csname will start grabbing/expanding tokens, and because of the \expandafter, the \endcsname will not be found yet; instead TeX steps over it and expands what follows:
|| \expandafter\stuff
|| \csname foo\expandafter\endcsname
|> \csname bar\endcsname
Now the second \csname grabs/expands tokens until it finds \endcsname and turns the found string into a control sequence:
|| \expandafter\stuff
|| \csname foo\expandafter\endcsname
|| \csname bar
|> \endcsname
and
|| \expandafter\stuff
|| \csname foo\expandafter\endcsname
|| \bar
Now the second \csname is done with its one step of expansion, so the second \expandafter is removed and the token that followed it is put back. The next step of processing looks like this:
|| \expandafter\stuff
|| \csname foo
|> \endcsname\bar
The first \csname finally finds its \endcsname, and this becomes
|| \expandafter\stuff
|| \foo\bar
Now the first \csname has also had its step of expansion from \expandafter, so that \expandafter is removed and \stuff is put back, and this eventually becomes
|> \stuff\foo\bar
and now \stuff can do stuff.
Even though the above was visualized as many small steps of processing, in terms of expansion steps this all happens in a single step: \expandafter expands the \csname once, and that one expansion fully expands everything up to the matching \endcsname.
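To see the two-names-at-once trick from the walkthrough in a compilable form, here is a small sketch. The two-argument definition of \stuff is an assumption made for illustration; it only prints the names of the two control sequences it is handed.

```latex
\documentclass{article}
% Hypothetical \stuff: prints the names of both control sequences.
\newcommand\stuff[2]{\texttt{\string#1\ and \string#2}}
\begin{document}
% The inner \expandafter makes the first \csname skip over its
% \endcsname until \csname bar\endcsname has been turned into \bar.
\expandafter\stuff\csname foo\expandafter\endcsname\csname bar\endcsname
\end{document}
```

This should typeset "\foo and \bar", showing that \stuff sees two finished control sequences as its arguments. It is the same pattern that \newABC uses to hand the macro and the box name to \newABC@.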
Best Answer
Each byte of the UTF-8 encoding is a separate token in pdflatex; however, you can recognise the leading token, which tells you how many bytes are needed. This version covers the one- and two-byte cases.
If you want to handle the rest of the input rather than discard everything after the first letter, you can make a small change so that you pass in a command to apply to the remaining text. If you pass in \gobble, it extracts the first letter as before. If you pass in \firstofx\gobble, it also extracts the first letter of the remaining text, so you get two letters: