[TeX/LaTeX] Correct arXiv hyperlinks with BibLaTeX and Mendeley

biblatex, hyperref

I use Mendeley to generate the bibliography.bib file and biblatex to process it. I noticed a problem with arXiv references for which the field primaryClass is defined. Consider the following example:

\documentclass[12pt,a4paper]{article}
\usepackage[style=numeric,backend=biber]{biblatex}

\begin{filecontents}{bibliography.bib}
@misc{Witten1994,
archivePrefix = {arXiv},
arxivId = {hep-th/9411102},
author = {Witten, Edward},
eprint = {9411102},
month = {nov},
primaryClass = {hep-th},
title = {Monopoles and Four-Manifolds},
year = {1994}
}
\end{filecontents}

\addbibresource{bibliography.bib}

\usepackage{hyperref}
\begin{document}

\cite{Witten1994}

\printbibliography

\end{document}

The bibliography entry was copied from the file generated by Mendeley. The code produces the following:

[Image: Example]

However, when I click on the link, it takes me to https://arxiv.org/abs/9411102, which does not exist. The correct hyperlink should be https://arxiv.org/abs/hep-th/9411102. My questions are:

  1. How can I correct the hyperlink?

  2. How can I get arXiv:hep-th/9411102, which is the correct arxivId, instead of arXiv: 9411102 [hep-th]?

Only solutions which do not require me to modify the bibliography file are acceptable.

Thank you very much for your help!

Update 1: This is the non-modified output generated by Mendeley.

@misc{Witten1994,
abstract = {Recent developments in the understanding of {\$}N=2{\$} supersymmetric Yang-Mills theory in four dimensions suggest a new point of view about Donaldson theory of four manifolds: instead of defining four-manifold invariants by counting {\$}SU(2){\$} instantons, one can define equivalent four-manifold invariants by counting solutions of a non-linear equation with an abelian gauge group. This is a ``dual'' equation in which the gauge group is the dual of the maximal torus of {\$}SU(2){\$}. The new viewpoint suggests many new results about the Donaldson invariants.},
archivePrefix = {arXiv},
arxivId = {hep-th/9411102},
author = {Witten, Edward},
eprint = {9411102},
file = {:home/ylpu/Dropbox/Knihovna/files/Witten - 1994 - Monopoles and Four-Manifolds.pdf:pdf},
month = {nov},
primaryClass = {hep-th},
title = {{Monopoles and Four-Manifolds}},
url = {https://arxiv.org/pdf/hep-th/9411102.pdf http://arxiv.org/abs/hep-th/9411102},
year = {1994}
}

Update 2: I changed article to misc in the code above because the image was generated using misc and not article.

Discussion about @article vs. @online/@misc:

The options in Mendeley are either Generic or Journal Article; they produce @misc and @article, respectively. As far as I know there is no option for @online. I include below another example, where [1] is a citation of a journal article, and [2] and [3] are citations of the same article from arXiv using @misc and @article, respectively. I personally prefer @misc for arXiv articles because I do not see any reason for "In: (Nov. 1994)". On the other hand, I like "In:" for journal articles because it marks the beginning of the publication data – journal, volume, date, page.

[Image: @article vs @misc]

Best Answer

The correct way to give pre-2007 arXiv identifiers in biblatex is to set eprinttype = {arxiv} and eprint = {<class>/<id>}. So ideally your example would be:

@online{Witten1994,
  author     = {Witten, Edward},
  title      = {Monopoles and Four-Manifolds},
  date       = {1994-11},
  eprint     = {hep-th/9411102},
  eprinttype = {arxiv},
}

Note how I used the actually existing entry type @online instead of the non-existent @generic.

Papers that use the new, post-2007 scheme have the class in eprintclass and only the numerical identifier in the eprint field:

@online{wassenberg,
  author       = {Wassenberg, Jan and Sanders, Peter},
  title        = {Faster Radix Sort via Virtual Memory and Write-Combining},
  date         = {2010-08-17},
  version      = 1,
  eprinttype   = {arxiv},
  eprintclass  = {cs.DS},
  eprint       = {1008.2849v1},
}

This is consistent with the link always going to https://arxiv.org/abs/<eprint>.
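
To check the two schemes side by side, the two entries above can be dropped into a small test document. This is only a sketch: the file name eprint-demo.bib is made up for the illustration, and the [overwrite] option assumes a LaTeX release from 2019-10 or later.

```latex
\documentclass{article}
\usepackage[style=numeric,backend=biber]{biblatex}

% Hypothetical file name used only for this test.
% [overwrite] assumes a LaTeX kernel from 2019-10 or later.
\begin{filecontents}[overwrite]{eprint-demo.bib}
@online{Witten1994,
  author     = {Witten, Edward},
  title      = {Monopoles and Four-Manifolds},
  date       = {1994-11},
  eprint     = {hep-th/9411102},
  eprinttype = {arxiv},
}
@online{wassenberg,
  author       = {Wassenberg, Jan and Sanders, Peter},
  title        = {Faster Radix Sort via Virtual Memory and Write-Combining},
  date         = {2010-08-17},
  version      = 1,
  eprinttype   = {arxiv},
  eprintclass  = {cs.DS},
  eprint       = {1008.2849v1},
}
\end{filecontents}
\addbibresource{eprint-demo.bib}

\usepackage{hyperref}
\begin{document}

% List both entries without citing them in the text.
\nocite{*}
\printbibliography

\end{document}
```

With the standard styles, the first entry should show arXiv: hep-th/9411102 and the second arXiv: 1008.2849v1 [cs.DS]; in both cases the link target is built from the eprint field alone.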

For backwards compatibility, archivePrefix is an alias for eprinttype, so archivePrefix = {arxiv} is the same as eprinttype = {arxiv}. Furthermore, primaryclass is an alias for eprintclass.

This explains what you are seeing: arxivId is not a known field and is therefore ignored, archivePrefix becomes eprinttype, and primaryClass becomes eprintclass, so you get the output that

  eprinttype   = {arxiv},
  eprintclass  = {hep-th},
  eprint       = {9411102},

would produce:

arXiv: 9411102 [hep-th]

with a link to http://arxiv.org/abs/9411102


Your .bib entry seems to use an unintelligible mess of old and new formats together with fields unknown to biblatex. This should definitely be fixed in the .bib file. If Mendeley produces this mess automatically, complain to the Mendeley developers. If some other automatic citation export tool produced it (see "Software-generated bibliographic entries: common errors and other mistakes to check before use"), complain to the provider of the data. (At least if they claim to offer biblatex-compatible data. Otherwise they can hide behind the fact that there is no consensus amongst traditional BibTeX .bst files on how to handle arXiv eprints, if they handle them at all.)

You can try to get Biber to repair some of the damage on the fly. The following works under the assumption that all old-style arXiv identifiers are presented as in your question. In particular, arxivId must contain the full path that follows https://arxiv.org/abs/ in the form <class>/<identifier>, and archivePrefix or eprinttype must be arxiv. The primaryClass is discarded if a / is found in arxivId, which indicates a pre-2007 identifier.

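\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite=true]{
      % Read arxivId and copy its value into eprint, replacing the old value.
      \step[fieldsource=arxivId]
      \step[fieldset=eprint, origfieldval]
      % Continue only if eprint now contains a slash (old-style identifier).
      \step[fieldsource=eprint, match={/}, final]
      % The class is already part of eprint, so drop primaryClass.
      \step[fieldset=primaryClass, null]
    }
  }
}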

I didn't do anything for new identifiers since I don't know how these look in your .bib file.
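
For completeness, here is a sketch of how the source map above could be placed in the example document from the question. It goes in the preamble, after biblatex is loaded. The file name mendeley-demo.bib is an assumption made for this illustration, chosen so that a real Mendeley export is not overwritten; the [overwrite] option assumes a LaTeX release from 2019-10 or later.

```latex
\documentclass[12pt,a4paper]{article}
\usepackage[style=numeric,backend=biber]{biblatex}

% Source map from the answer: repair old-style arXiv identifiers on the fly.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite=true]{
      \step[fieldsource=arxivId]
      \step[fieldset=eprint, origfieldval]
      \step[fieldsource=eprint, match={/}, final]
      \step[fieldset=primaryClass, null]
    }
  }
}

% Hypothetical file name, so that the real Mendeley export stays untouched.
\begin{filecontents}[overwrite]{mendeley-demo.bib}
@misc{Witten1994,
archivePrefix = {arXiv},
arxivId = {hep-th/9411102},
author = {Witten, Edward},
eprint = {9411102},
month = {nov},
primaryClass = {hep-th},
title = {Monopoles and Four-Manifolds},
year = {1994}
}
\end{filecontents}

\addbibresource{mendeley-demo.bib}

\usepackage{hyperref}
\begin{document}

\cite{Witten1994}

\printbibliography

\end{document}
```

The mapping is applied when Biber reads the .bib file, so Biber has to be run again after the map is added. The entry should then appear as arXiv: hep-th/9411102, with the link pointing to the abs/hep-th/9411102 page.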