[TeX/LaTeX] Macro to extract (typeset) ‘plain text’ from a command

Tags: bookmarks, hyperref, macros, strings, text-manipulation

Sorry to come back with a similar post title (I had a post with this same title, then renamed it to "PDF Metadata – Macro to extract (typeset) 'plain text' from a command?"), but I hope the question can now be narrowed down a bit…

Let's say you have something like

\def\mystring{Test of math $a^{b}$, \textbf{bold} and some  {\color{lightgray}coloring}}

I would like to know of (and use) a macro that accepts \mystring and returns a "plain" text version of it. I would be quite satisfied with the math being dropped; however, I'd like the letters extracted wherever text would otherwise be typeset. Let's say we call such a macro \getPlainText; then I'd be very satisfied with something that behaves like this:

* \typeout{ \getPlainText{\mystring} }
Test of math  , bold and some coloring

Basically, I think I could use this in a couple of ways (mostly in connection with hyperref):

Within PDF metadata

In hyperref – PDF Metadata – Macro to extract (typeset) 'plain text' from a command?, the answer for correctly displaying "LaTeX-formatted" PDF metadata is to use \texorpdfstring, like this:

\newcommand{\myauthors}{Author 1 \\ Author 2 \\ \texorpdfstring{\color{lightgray}Author 3}{Author 3}}
...
\hypersetup{..., pdfauthor=\myauthors, ...} 

Here, I'd much prefer a syntax using \getPlainText, like this:

\newcommand{\myauthors}{Author 1 \\ Author 2 \\ {\color{lightgray}Author 3}}
...
\hypersetup{..., pdfauthor=\texorpdfstring{\myauthors}{\getPlainText{\myauthors}}, ...} 

Within PDF TOC bookmarks related to sections titles with math

In mathmode – Equations in section heading/title, the answer for avoiding problems with math in section titles (which hyperref chokes on) is again to use \texorpdfstring, like this:

\section{The values of \texorpdfstring{$\beta$}{TEXT} %
 for which values are defined}

Here I'd much rather use something like this in the preamble:

\makeatletter
\let\oldsection\section
\renewcommand{\section}{\@ifstar
                     \mysectionStar%
                     \mysectionNoStar%
}
\newcommand{\mysectionStar}[1]{% \section* takes no optional argument
\typeout{AAAA}% debug
\oldsection*{#1}}
\newcommand{\mysectionNoStar}[1]{%
\typeout{BBBB}% debug
\oldsection[\getPlainText{#1}]{#1}}
\makeatother

… and so avoid having to add \texorpdfstring manually to every section.

So, is there anything like this out there?

Best Answer

I don't completely understand what you're getting at here: why would you want plain-text versions of your section titles in the table of contents (which is where the optional argument of \section ends up)?

And the bookmarks which are automatically generated by hyperref are already plain text, aren't they?

Do you have a concrete example where bookmark generation goes wrong? If so, it would be worth looking at that case specifically.

That said, the macro \pdfstringdef, which generates the bookmark text, is the closest you will get to your \getPlainText from within TeX.
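Note that \pdfstringdef works by assignment (it defines a macro holding the result) rather than by pure expansion, so something like \typeout{\getPlainText{\mystring}} cannot work exactly as written in the question. A minimal sketch of a wrapper that converts first and then writes the result to the log could look like this; the names \getPlainText and \gpt@result are hypothetical and not part of hyperref:

```latex
\documentclass{article}
\usepackage{hyperref}

\makeatletter
% Hypothetical helper (not provided by hyperref):
% \getPlainText{<text>} converts <text> with \pdfstringdef
% and writes the resulting PDF string to the terminal and log.
\newcommand{\getPlainText}[1]{%
  \pdfstringdef\gpt@result{#1}% assignment: defines \gpt@result
  \typeout{\gpt@result}%        then print the stored string
}
\makeatother

\def\mystring{Test of math $a^{b}$, \textbf{bold}}
\getPlainText{\mystring}

\begin{document}
\end{document}
```

Since the result is a PDF string, characters that are special in PDF strings may come out escaped, so this is best treated as a debugging aid rather than a general text-extraction tool.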

Let's experiment with your example:

\documentclass{article}

\usepackage{color}
\usepackage{hyperref}

\def\mystring{Test of math $a^{b}$, \textbf{bold} and some  {\color{lightgray}coloring}}

\pdfstringdef\myplainstring{\mystring}

\typeout{"\myplainstring"}

\begin{document}
\end{document}

This produces the output

Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `math shift' on input line 11.


Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `superscript' on input line 11.


Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `math shift' on input line 11.


Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `\@ifnextchar' on input line 11.

"Test of math ab, bold and some lightgraycoloring"

You see that it handles the math reasonably well (the $ and ^ are removed, leaving "ab"), but stumbles over \color: \color looks for an optional argument with \@ifnextchar, which is not expandable, so hyperref removes \@ifnextchar and the color name "lightgray" leaks into the string. This happens because \color is not explicitly handled by \pdfstringdef.

You can, however, add an explicit handler like this:

\pdfstringdefDisableCommands{\def\color#1{}}

You can call this several times to "declare" handlers for other commands, or declare several at once. The definitions stored by \pdfstringdefDisableCommands are executed by \pdfstringdef itself, locally and before it expands its argument, so they affect only the PDF strings and not the typeset text. With the handler above, we now get

Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `math shift' on input line 11.


Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `superscript' on input line 11.


Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):
(hyperref)                removing `math shift' on input line 11.

"Test of math ab, bold and some coloring"

Because \color now has an expandable definition inside \pdfstringdef, it is removed cleanly together with its color argument.
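One limitation: \def\color#1{} only copes with the \color{<name>} form. With \color[gray]{0.5} it would swallow just the [ and leave gray]0.5 in the string. A sketch of a slightly more robust handler uses a TeX parameter delimited by the next opening brace (#1#{), which stays expandable: #1 absorbs an optional [model] (or nothing) and \@gobble then drops the brace group. The \textcolor line and the switch to xcolor (so that lightgray is a defined color) are my own assumptions, in case you use those too:

```latex
\documentclass{article}
\usepackage{xcolor}
\usepackage{hyperref}

\makeatletter
\pdfstringdefDisableCommands{%
  % #1 is delimited by the next "{", so it takes an optional [model]
  % (or nothing); \@gobble then removes the {spec} group.
  \def\color#1#{\@gobble}%
  % Assumed extra handler: keep only the last argument of \textcolor.
  \def\textcolor#1#{\@secondoftwo}%
}
\makeatother

\pdfstringdef\testA{some {\color{lightgray}coloring}}
\pdfstringdef\testB{some {\color[gray]{0.5}coloring}}
\pdfstringdef\testC{some \textcolor[rgb]{1,0,0}{coloring}}
\typeout{"\testA" / "\testB" / "\testC"}

\begin{document}
\end{document}
```

All three strings should come out without the color specification. The delimited-parameter trick works here precisely because it needs no lookahead with \@ifnextchar.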

On the whole, this still means some manual work to 'fix' all the commands you're using in section titles, but at least you don't need to clutter the sections themselves with \texorpdfstring.
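Applied to the two use cases from the question, a minimal sketch might look like the following. It assumes that hyperref passes the value of pdfauthor through \pdfstringdef, just as it does for bookmark text. The \\ and \beta mappings are illustrative choices, not hyperref defaults, and hyperref will still warn about and remove the $ signs:

```latex
\documentclass{article}
\usepackage{xcolor}
\usepackage{hyperref}

% Handlers used only while hyperref builds PDF strings
% (bookmarks, document info); the typeset text is unaffected.
\makeatletter
\pdfstringdefDisableCommands{%
  \def\color#1#{\@gobble}% drop the optional [model] and the {spec}
  \def\\{, }%             illustrative choice: line break -> ", "
  \def\beta{beta}%        illustrative choice: spell out the letter
}
\makeatother

% No spaces around \\, so the PDF string is intended to read
% "Author 1, Author 2, Author 3"
\newcommand{\myauthors}{Author 1\\Author 2\\{\color{lightgray}Author 3}}
\hypersetup{pdfauthor=\myauthors}

\begin{document}
\section{The values of $\beta$ for which values are defined}
Some text.
\end{document}
```

Neither \myauthors nor the \section title needs \texorpdfstring here; the handlers are declared once in the preamble.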

I don't think a more general solution is possible from within TeX at all, as it's nearly impossible to write a meta-interpreter in TeX.