[Tex/LaTex] How to automatically generate BibTeX data

bibtex

I have a folder on my computer and it contains a (high) number of math papers of many different authors. Some of them are published, some of them are on the arXiv, some are never going to be published or uploaded to the arXiv.

My question is the following: is there a script or some kind of automated way for producing a big BibTeX file containing the bibliographic data of all the papers in that folder?

Some observations:

  1. Such a script should basically look for the data on arXiv, Mathscinet, Google scholar mainly. It's clear that not all the entries in my folder will admit a complete bibliographic information (e.g. the papers not intended for publication) so the script would probably return some errors or some partial information. Everything should be, in any case, checked by a human being.

  2. It is possible that some software like Bibdesk, Jabref, Zotero or Mendeley offer some service like this. I tried to look for this kind of feature but I haven't found anything explicitly about it. I actually have Bibdesk and I like it a lot, so a solution involving Bibdesk would be really appreciated.

  3. It would be nice if the script had a feature to produce citation keys in a standard way. For instance the paper "Blabla and blable in Mathematics by Auth1 and Auth2" would have a citation key "auth1_auth2:blabla_and_blable_in_mathematics".

  4. A partial solution seems to be that suggested here: http://www.math.tamu.edu/~comech/tools/bibget/. I haven't checked if this works well; if it does one could combine this script with a similar one that would search the arXiv.

Thank you in advance

Best Answer

Disclaimer: This solution is not perfect, but might be a good start. The parsing of PDFs was written by me because, I had a huge set of PDFs not available in my BibTeX database. There might be other alternatives such as Papers and Mendeley which are said to have a good PDF parsing and BibTeX export. I am one of the authors of JabRef and like open source development.

JabRef is an MIT-licensed open-source BibTeX and BibLaTeX bibliographic manager actively developed on GitHub. It offers the functionality to import bibliographic data from PDFs.

Adjust the JabRef key generation pattern to fit your needs

JabRef offers a BibTeX key generation and offers different patterns described at https://help.jabref.org/en/BibtexKeyPatterns. In your case, the closest match is [authors]:[camel].

  1. Open the preferences Options Preferences

  2. Navigate to "BibTeX key generator" BibTeX key generator preferences

  3. Change the default pattern to [authors]:[camel]. authors camel

  4. Click "OK"

Link the PDFs to your bib file

  1. Create or open a .bib file.

  2. Go to "Quality" -> "Find unlinked files". menu entry for starting find linked files

  3. The "Find unlinked files" dialog opens. Find unlinked files dialog

  4. Choose a directory using the "Browse" button.

  5. Click on "Scan directory".

  6. In "Select files", the files not yet contained in the database are shown. scan result

  7. To create entries for all files, click on "Apply".

  8. For each file, an import dialog is shown metadata dialog

    The dialog shows the XMP metadata stored in the PDF in the area "XMP-metadata". If this data fits your needs, select "Create entry based on XMP data". Typically, the XMP-metadata is not good enough. Choose "Create entry based on content".

  9. Click on "OK" to start the import

  10. A dialog asking for the link is opened Link to file dialog You can choose "Leave file in its current directory" to keep the file where it is. Typically, this is that what one wants. In case you choose "Move file to file directory", you can also choose to rename the file to the generated BibTeX key.

  11. Press OK to link the file to the BibTeX entry

  12. This happens for each file. After that, the "Find unlinked files" dialog is shown. Just click on "Close" to close it.

  13. The entry editor with the last imported entry is shown entry editor

  14. You can now save the file and are finished.

  15. Optional: Click on "General" to see the linked file General tab

  16. Optional: Click on "BibTeX source" to see the BibTeX source BibTeX source

  17. Optional: You have to shrink it to see the entry in the entry table Enlarge the JabRef window and use the mouse at the upper border of the entry editor Shrunk entry editor

  18. Optional: Press Esc to show the entry preview Entry preview

Further information

PDFs for which it works

The importer based on the content has been written for IEEE and LNCS formatted papers. Other formats are not (yet) supported. In case a DOI is found on the first page, the DOI is used to generate the BibTeX information.

The next development step is to extract the title of the PDF, use the "Lookup DOI" and then the Get BibTeX data from DOI functionality from JabRef to fetch the BibTeX data.

We are also thinking about replacing the code completely by using another library. This is much effort and there is no timeline for that.

Better filenames

JabRef also offers to change the filenames. You can adapt the pattern at Preferences -> Import Preferences -> Import

Select "Choose pattern" and choose "bibtexkey - title" enter image description here This results in the setting ´\bibtexkey\begin{title} - \format[RemoveBrackets]{\title}\end{title}`.

This makes the filenames start with the bibtey key followed by the full title. In the concrete case, \bibtexkey only may be the better option as the described bibtey key already contains the title.

Tested JabRef versions

One has to use a recent version of JabRef. With JabRef 3.6, this feature did not work: https://github.com/JabRef/jabref/issues/2214

Mr.DLib

JabRef used to have support for Mr.DLib, which returned back a full BibTeX entry or a PDF. Due to unclear copyright situation of a used library, this service was removed. Further, Mr.DLib changes its focus and will provide literature recommendations. See https://github.com/JabRef/jabref/pull/2189.

Related Questions: