[TeX/LaTeX] Statistics about your BibTeX database

Does anybody know how to do some data mining on a BibTeX database? For example, statistics such as the number of papers per journal, author, or year. Is there a tool or a website for this?

Best Answer

I don't know of such a tool, but it should be possible to process BibTeX data in statistical software (such as R) or even in Excel.

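The bibtex package on CRAN can parse BibTeX files into R bibentry objects. Converting the result to a data frame should be possible field by field, and from there you can analyze whatever you want. Called without arguments, read.bib() reads a small example file bundled with the package (shown below); pass the path of your own .bib file to read that instead. See also this related question on Stack Overflow.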

> rref <- read.bib()
> rref
R Development Core Team (2009). _R: A Language and Environment for
Statistical Computing_. R Foundation for Statistical Computing, Vienna,
Austria. ISBN 3-900051-07-0, <URL: http://www.R-project.org>.
> str(rref)
List of 1
 $ :Class 'bibentry'  hidden list of 1
  ..$ :List of 7
  .. ..$ title       : chr "R: A Language and Environment for Statistical Computing"
  .. ..$ author      :Class 'person'  hidden list of 1
  .. .. ..$ :List of 5
  .. .. .. ..$ given  : chr "R Development Core Team"
  .. .. .. ..$ family : NULL
  .. .. .. ..$ role   : NULL
  .. .. .. ..$ email  : NULL
  .. .. .. ..$ comment: NULL
  .. ..$ organization: chr "R Foundation for Statistical Computing"
  .. ..$ address     : chr "Vienna, Austria"
  .. ..$ year        : chr "2009"
  .. ..$ note        : chr "{ISBN} 3-900051-07-0"
  .. ..$ url         : chr "http://www.R-project.org"
  .. ..- attr(*, "bibtype")= chr "Manual"
  .. ..- attr(*, "key")= chr "R"
 - attr(*, "class")= chr "bibentry"
 - attr(*, "strings")= Named chr(0) 
  ..- attr(*, "names")= chr(0) 
> rref$url
[1] "http://www.R-project.org"
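A minimal sketch of that field-by-field conversion, assuming a recent R and the bibtex package. A three-entry sample file is written to a temporary path so the example runs on its own; point `bib_path` at your own .bib file instead. The sample entries and the helper `field_or_na` are made up for illustration. The internal layout of bibentry objects has changed since the 2009 `str()` output above; the `unclass()` step assumes the current layout, a plain list of entries whose fields are named list elements and whose key and entry type are attributes.

```r
library(bibtex)  # CRAN package that provides read.bib()

# Sample database so the sketch runs on its own; replace bib_path
# with the path to your own .bib file.
bib_path <- tempfile(fileext = ".bib")
writeLines(c(
  "@article{a1, author = {Ann Smith}, title = {One}, journal = {J. Stat}, year = {2008}}",
  "@article{a2, author = {Bob Jones and Ann Smith}, title = {Two}, journal = {J. Stat}, year = {2009}}",
  "@book{b1, author = {Carl Brown}, title = {Three}, publisher = {Pub}, year = {2009}}"
), bib_path)

refs <- read.bib(bib_path)

# unclass() turns the bibentry object into a plain list of entries
# (assumes the current bibentry layout).
entries <- unclass(refs)

# Hypothetical helper: a field's value as one string, or NA if missing.
field_or_na <- function(entry, name) {
  value <- entry[[name]]
  if (is.null(value)) NA_character_ else paste(value, collapse = " ")
}

# One row per entry, one column per field of interest.
df <- data.frame(
  key     = vapply(entries, function(e) attr(e, "key"), character(1)),
  type    = vapply(entries, function(e) attr(e, "bibtype"), character(1)),
  year    = vapply(entries, field_or_na, character(1), name = "year"),
  journal = vapply(entries, field_or_na, character(1), name = "journal"),
  stringsAsFactors = FALSE
)

print(table(df$year))                       # papers per year
print(table(df$journal, useNA = "ifany"))   # papers per journal (NA = no journal field)

# One name per author, so a co-authored paper counts once for each author.
authors <- unlist(lapply(entries, function(e) {
  a <- e[["author"]]
  if (is.null(a)) character(0) else format(a, include = c("given", "family"))
}))
print(sort(table(authors), decreasing = TRUE))  # papers per author
```

The same pattern extends to other fields such as `publisher` or `booktitle`: add a column, then use `table()` or `aggregate()` on it.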

If you prefer Excel, you might want to convert your bib file to XML first. I haven't tried that, though.
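A simpler alternative to the XML route, swapped in here rather than taken from the answer, is to build the table in R and save it as CSV, which Excel opens directly. This is a minimal sketch assuming the bibtex package and a recent R; the sample file, the helper `get_field`, the chosen columns and the output name `bibliography.csv` are all placeholders.

```r
library(bibtex)  # provides read.bib()

# Placeholder sample file; point bib_path at your own .bib file instead.
bib_path <- tempfile(fileext = ".bib")
writeLines(c(
  "@article{a1, author = {Ann Smith}, title = {One}, journal = {J. Stat}, year = {2008}}",
  "@book{b1, author = {Carl Brown}, title = {Two}, publisher = {Pub}, year = {2009}}"
), bib_path)

# Plain list of entries (assumes the current bibentry layout).
entries <- unclass(read.bib(bib_path))

# Hypothetical helper: one field from every entry, "" where it is missing.
get_field <- function(name) {
  vapply(entries, function(e) {
    v <- e[[name]]
    if (is.null(v)) "" else paste(v, collapse = " ")
  }, character(1))
}

out <- data.frame(
  key     = vapply(entries, function(e) attr(e, "key"), character(1)),
  year    = get_field("year"),
  journal = get_field("journal"),
  title   = get_field("title"),
  stringsAsFactors = FALSE
)

# Write a CSV file that Excel can open.
csv_path <- file.path(tempdir(), "bibliography.csv")
write.csv(out, csv_path, row.names = FALSE)
```

Once the file is open in Excel, a pivot table on the `year` or `journal` column gives per-year or per-journal counts.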