[TeX/LaTeX] Parsing Underscores in URLs from Mendeley

biber, biblatex, bibliographies, texlive, urls

Introduction

Mendeley's desktop software can export references to a BibTeX-compatible file for use in TeX-based documents.

However, when exporting, Mendeley escapes all special characters, in particular underscores. For the most part this is fine, but the escaping is also applied to the url field of each bibliography entry, where it produces broken links. The escaping step can be disabled, but doing so causes many more problems than it solves.

The Problem

The desired result is a hyperlink equivalent to:

http://www.example.com/content/news/sep_09/12.html

However, the bibliography uses a url value of

http://www.example.com/content/news/sep{\_}09/12.html

which is read verbatim from the external .bib file. Because biblatex hands url fields to \url verbatim, the braces and backslash are kept in the link target and end up percent-encoded, so the link is now:

http://www.example.com/content/news/sep%7B%5C_%7D09/12.html
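The same behaviour should be visible without biblatex at all. Below is a minimal sketch using only hyperref, with the example URL from above: the first \url keeps the exported {\_} literally in both the printed text and the link target, while the second, with a bare underscore, is what the bibliography ought to produce.

\documentclass{article}
\usepackage{hyperref}

\begin{document}
% As exported by Mendeley: \url reads its argument verbatim,
% so the braces and backslash become part of the link
\url{http://www.example.com/content/news/sep{\_}09/12.html}

% The intended link: \url accepts a raw underscore without any escaping
\url{http://www.example.com/content/news/sep_09/12.html}
\end{document}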

What I have tried

I have considered using Biber to preprocess the file, as demonstrated here by npdoty. The MWE below contains what I think is the equivalent, adapted from a related substitution in moewe's answer here.

Regex in use: {\\_}

I have also tried: {\\\_}, \{\\\_\}

The potential answers

  • Is my problem as simple as having an incorrect regex?
  • Is Biber doing some parsing when it reads the file, so that I am no longer matching {\_}?
  • Is there a Biber/biblatex switch that makes URLs LaTeX-escaped rather than verbatim?

MWE

\RequirePackage{filecontents}
\begin{filecontents*}{\jobname.bib}

@misc{enc2015,
    author = {{Example News Company}},
    title = {{Daily News for September 9 2015}},
    url = {http://www.example.com/content/news/sep{\_}09/12.html},
    urldate = {2016-03-08},
    year = {2015}
}
\end{filecontents*}

\documentclass{article}

% ---- Bibliography Settings ----
\usepackage{csquotes} % Context-sensitive quotes; recommended by biblatex with babel
\usepackage[english]{babel}
\usepackage[backend=biber,style=numeric]{biblatex}
\usepackage{hyperref}

\addbibresource{\jobname.bib}

\DeclareSourcemap{
  \maps{
    \map{
      \step[fieldsource=url,
            match=\regexp{\{\textbackslash\textbackslash\textunderscore\}},
            replace=\regexp{\textunderscore}]
    }
  }
}

\begin{document}
\nocite{enc2015}
\printbibliography
\end{document}

Footnotes

In case Mendeley sees this, the relevant bug report/suggestion is Mendeley Forums: Suggestion #2088193.

Best Answer

After further investigation and head-scratching, I have reached a working solution.

Updated MWE

\RequirePackage{filecontents}
\begin{filecontents*}{\jobname.bib}

@misc{enc2015,
    author = {{Example News Company}},
    title = {{Daily News for September 9 2015}},
    url = {http://www.example.com/content/news/sep{\_}09/12.html},
    urldate = {2016-03-08},
    year = {2015}
}
\end{filecontents*}

\documentclass{article}

% ---- Bibliography Settings ----
\usepackage{csquotes} % Context-sensitive quotes; recommended by biblatex with babel
\usepackage[english]{babel}
\usepackage[backend=biber,style=numeric]{biblatex}
\usepackage{hyperref}

\addbibresource{\jobname.bib}

\DeclareSourcemap{ % Applied by Biber while it reads the .bib file, not by LaTeX
    \maps[overwrite, datatype=bibtex]{
        \map{ % Replaces '{\_}', '{_}' or '\_' with just '_'
            \step[fieldsource=url,
                  match=\regexp{\{\\\_\}|\{\_\}|\\\_},
                  replace=\regexp{\_}]
        }
        \map{ % Replaces '{$\sim$}', '$\sim$' or '{~}' with just '~'
            \step[fieldsource=url,
                  match=\regexp{\{\$\\sim\$\}|\{\~\}|\$\\sim\$},
                  replace=\regexp{\~}]
        }
        \map{ % Replaces '{\&}' with just '&' (\x{26} is the ampersand)
            \step[fieldsource=url,
                  match=\regexp{\{\\\x{26}\}},
                  replace=\regexp{\x{26}}]
        }
    }
}

\begin{document}
\nocite{enc2015}
\printbibliography
\end{document}

Reasoning

Compared with the question, this version escapes the characters directly, i.e. \\ and \_ instead of \textbackslash and \textunderscore. I had originally tried this without success, so I must have done something slightly differently this time.

My theory is that \regexp passes the backslashes through to Biber's (Perl) regular expressions, where \t means the TAB character, so the pattern ends up searching for <tab>extbackslash and <tab>extunderscore instead of the \ and _ characters.
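If that theory holds, one way to sidestep it (an untested sketch) is the hex-escape trick the answer already uses for &: write the backslash and underscore as \x{5C} and \x{5F}, so the only escapes left in the pattern are hex codes that cannot be mistaken for \t. This map is a hypothetical alternative to the first \map in the updated MWE, not a tested replacement for it.

\DeclareSourcemap{
    \maps[datatype=bibtex]{
        \map{ % Hypothetical: replaces '{\_}' with '_' using hex escapes only
            % \x{5C} is a backslash and \x{5F} an underscore (assumed to be
            % treated as literal characters by Biber's Perl regex engine)
            \step[fieldsource=url,
                  match=\regexp{\{\x{5C}\x{5F}\}},
                  replace=\regexp{\x{5F}}]
        }
    }
}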

Summary

For anyone having trouble with escaped characters in Mendeley URLs, the following snippet (which requires Biber and biblatex) can be used to unescape those pesky _, ~ and & characters.

In the version of Mendeley I am using (v1.16.1), these characters are escaped as {\_} and {~}. I have also included support for the older \_ and $\sim$ escape sequences, which I found reported in other questions and elsewhere online while searching for a solution.

% This snippet must be in the preamble.
\usepackage[english]{babel} % untested with other languages
\usepackage[backend=biber]{biblatex}
\usepackage{hyperref} % for clickable urls

\addbibresource{library.bib} % Mendeley BibTeX library

\DeclareSourcemap{
    \maps[overwrite, datatype=bibtex]{
        \map{ % Replaces '{\_}', '{_}' or '\_' with just '_'
            \step[fieldsource=url,
                  match=\regexp{\{\\\_\}|\{\_\}|\\\_},
                  replace=\regexp{\_}]
        }
        \map{ % Replaces '{$\sim$}', '$\sim$' or '{~}' with just '~'
            \step[fieldsource=url,
                  match=\regexp{\{\$\\sim\$\}|\{\~\}|\$\\sim\$},
                  replace=\regexp{\~}]
        }
        \map{ % Replaces '{\&}' with just '&' (\x{26} is the ampersand)
            \step[fieldsource=url,
                  match=\regexp{\{\\\x{26}\}},
                  replace=\regexp{\x{26}}]
        }
    }
}
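To check the snippet against every escape form it targets, a hypothetical test entry can replace the filecontents* block of the updated MWE, with \nocite{enc2015} changed to \nocite{escapetest}. The key escapetest and the URL are made up for illustration. If all three maps fire as intended, the printed URL should have no braces, backslashes or $\sim$ left over; any survivor points to the map (or the alternative within it) that did not match.

\begin{filecontents*}{\jobname.bib}
@misc{escapetest,
    author = {{Example News Company}},
    title = {{Escape test}},
    url = {http://www.example.com/a{\_}b/c{_}d/e\_f/{~}g/{$\sim$}h/$\sim$i/?x=1{\&}y=2},
    year = {2016}
}
\end{filecontents*}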

Hope this helps someone else dealing with this problem.

In recent versions of Biber and biblatex, this can be extended to other fields likely to contain problem characters, such as doi, using the foreach option of \map:

% Place inside \maps[datatype=bibtex]{...} within \DeclareSourcemap
\map[overwrite, foreach={url,doi}]{ % Replaces '{\_}', '{_}' or '\_' with just '_'
    \step[fieldsource=\regexp{$MAPLOOP},
          match=\regexp{\{\\\_\}|\{\_\}|\\\_},
          replace=\regexp{\_}]
}
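Putting the pieces together, the complete sourcemap applied to both url and doi might look like the sketch below. It assumes a Biber/biblatex version recent enough to support foreach and reuses the three patterns from the summary unchanged; this combined form is untested.

\DeclareSourcemap{
    \maps[datatype=bibtex]{
        % Each \map runs once per field listed in foreach;
        % $MAPLOOP stands for the current field name (url, then doi)
        \map[overwrite, foreach={url,doi}]{ % '{\_}', '{_}' or '\_' -> '_'
            \step[fieldsource=\regexp{$MAPLOOP},
                  match=\regexp{\{\\\_\}|\{\_\}|\\\_},
                  replace=\regexp{\_}]
        }
        \map[overwrite, foreach={url,doi}]{ % '{$\sim$}', '{~}' or '$\sim$' -> '~'
            \step[fieldsource=\regexp{$MAPLOOP},
                  match=\regexp{\{\$\\sim\$\}|\{\~\}|\$\\sim\$},
                  replace=\regexp{\~}]
        }
        \map[overwrite, foreach={url,doi}]{ % '{\&}' -> '&' (\x{26} is the ampersand)
            \step[fieldsource=\regexp{$MAPLOOP},
                  match=\regexp{\{\\\x{26}\}},
                  replace=\regexp{\x{26}}]
        }
    }
}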