[TeX/LaTeX] How exactly does alpha.bst sort references?

bibtex sorting

Consider the following .bib entries:

@Article{joe1,
  author =   {Joe Schmoe},
  title =    {Article One},
  journal =  {Some Journal},
  year =     2010}

@Article{joe2,
  author =   {Joe Schmoe},
  title =    {Article Two},
  journal =  {Some Journal},
  year =     1999}

@Article{jj,
  author =   {Jack Smith and Jill Alder},
  title =    {Article Three},
  journal =  {Some Journal},
  year =     2007}

The alpha.bst style assigns these the keys [Sch10], [Sch99], and [SA07] respectively.

If they were sorted according to the usual alphabetization by the first author's last name, then [Sch10] and [Sch99] would come before [SA07]. In fact, however, alpha.bst puts [SA07] before [Sch10] and [Sch99]. I guess the thinking is that someone will usually be looking through the reference list for a particular key that was cited in the paper, so the references should be alphabetized by the key rather than by the author name. This is at least logical.

However, if the references were actually sorted by key, then [Sch10] would also come before [Sch99]. But in fact, it's the other way round! What is it doing? Is it sorting on the alphabetic part of the key first and then on the full year (1999 < 2010) even though only two digits of the year are included in the key? If so, can anyone explain why?

Best Answer

Where did alpha.bst come from? We have to go back to the time when scientific papers were typewritten and then typeset by publishers or printed as camera ready copy. It's not so long ago, actually. In my field of activity, author-year citations were not used, nor footnote citations. In typeset papers one could find numeric citations in alphabetical order or alphabetic citations in various formats.

In other fields, numeric citations in order of appearance were possible, but this method is well suited only to fields where nobody actually browses the bibliography at the end of a paper: in a long bibliography, finding a given paper in a list ordered by appearance is too hard.

Numeric citations in alphabetical order were not really suited to typescripts, though they were not difficult to produce from a handwritten preliminary version. The most practical method was the alphabetic one. How to make the keys was left to the individual, although some standards were followed. We used

[MO1], [MO2]

for citing two papers by Menini and Orsatti; but it could have been

[MO2], [MO3]

in another paper, because [MO1] was not cited there and it was easier to just reuse the same keys across papers. Keys composed of an alphabetic part and the year of publication would have been better, but the same problem would have arisen with two papers by the same author or authors in the same year.

The alpha.bst style implements this method. In the second half of the 20th century, citing research papers of the 19th century was quite rare, so a key formed with the first three letters of a single author's surname, or the initials of multiple authors' surnames, followed by the last two digits of the publication year was practical. Cases like

Alpher, Bethe and Gamow

or

Aimée, Bardot and Girardot

were treated individually, so one method is as good as another. What does alpha.bst do in such a case? It just ignores the fact that the same initials refer to a different set of authors. The example document

\begin{filecontents*}{\jobname.bib}
@article{abg1,
  author={Alpher, R. and Bethe, H. and Gamow, G.},
  title={The Origin of Chemical Elements},
  journal={Phys. Rev.},
  volume={73},
  number={7},
  pages={803-804},
  year=1948,
}
@article{abg2,
  author={Aim{\'{e}}e, A. and Bardot, B. and Girardot, A.},
  title={French Cinema},
  journal={J. Fol. Berg.},
  volume={1255},
  pages={9043-10324},
  year=1968,
}
\end{filecontents*}

\documentclass{article}
\begin{document}
We cite \cite{abg1} and \cite{abg2}.

\bibliographystyle{alpha}
\bibliography{\jobname}
\end{document}

will show

We cite [ABG48] and [ABG68]

but if the year in the cinema entry is changed to 1948, the output would be

We cite [ABG48b] and [ABG48a]

because Aimée comes before Alpher in alphabetical order.
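The labelling behaviour shown above can be modelled in a few lines. This is only a minimal sketch, not a port of alpha.bst: the function names (`strip_accents`, `alpha_part`, `assign_labels`) are made up for illustration, accent stripping stands in for what BibTeX does when it builds sort strings, and special cases such as long author lists or missing years are ignored. It assumes that clashing labels receive their a/b suffixes in bibliography order.

```python
import unicodedata
from collections import Counter


def strip_accents(text):
    """Drop diacritics so that 'Aimée' compares as 'Aimee' (illustrative helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def alpha_part(surnames):
    """Three letters of a lone surname, or the surname initials of several authors."""
    if len(surnames) == 1:
        return surnames[0][:3]
    return "".join(name[0] for name in surnames)


def assign_labels(entries):
    """entries: list of (cite_key, surnames, year). Returns {cite_key: label}."""

    def sort_key(entry):
        _, surnames, year = entry
        names = " ".join(strip_accents(n).lower() for n in surnames)
        # Assumption: letters plus full year first, then the author names.
        return (alpha_part(surnames).lower() + str(year), names)

    ordered = sorted(entries, key=sort_key)
    bases = [alpha_part(s) + str(y)[-2:] for _, s, y in ordered]
    clashes = Counter(bases)   # how often each undecorated label occurs
    seen = Counter()           # how many suffixes were already handed out
    labels = {}
    for (cite_key, _, _), base in zip(ordered, bases):
        suffix = ""
        if clashes[base] > 1:
            suffix = chr(ord("a") + seen[base])
            seen[base] += 1
        labels[cite_key] = base + suffix
    return labels


entries = [
    ("abg1", ["Alpher", "Bethe", "Gamow"], 1948),
    ("abg2", ["Aimée", "Bardot", "Girardot"], 1948),
]
print(assign_labels(entries))  # {'abg2': 'ABG48a', 'abg1': 'ABG48b'}
```

With the cinema entry dated 1968 instead, the two base labels differ and no suffix is added, which matches the first output above.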

The change of century poses a problem: how to interpret keys like

[XY00] [XY11] [XY32]

with just two digits? Well, [XY32] can't refer to a paper published in 2032 (at least for now), while citing something published in 1900 or 1911 would be unlikely.

So what are the rules? The ordering of the bibliography should make it easy to browse. The reader finds a key in the text and then scans the column of keys in the bibliography. Having [Sch10] before [SA07] would make the latter hard to find. Of course, with a short bibliography, any ordering is as good as none; but in a long bibliography, perhaps with a page break between the two entries, [SA07] would be difficult to find.

Hence, the first-level ordering is alphabetical, by the alphabetic part of the key. The second-level ordering is by the year part, on the assumption that readers implicitly reinsert the century, so they'll look for a paper from 1999 before one from 2010.
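Applied to the three entries in the question, the two-level ordering can be sketched as follows. Again this is a simplified model under stated assumptions rather than the actual alpha.bst code: `alpha_part`, `label` and `sort_key` are hypothetical names, and the sort key is taken to be the lowercased alphabetic part followed by the full four-digit year.

```python
def alpha_part(surnames):
    """Three letters of a lone surname, or the surname initials of several authors."""
    if len(surnames) == 1:
        return surnames[0][:3]
    return "".join(name[0] for name in surnames)


def label(surnames, year):
    """Printed key: alphabetic part plus the last two digits of the year."""
    return alpha_part(surnames) + str(year)[-2:]


def sort_key(surnames, year):
    """Assumed ordering: letters first, then the full four-digit year."""
    return (alpha_part(surnames).lower(), year)


refs = [
    (["Schmoe"], 2010),
    (["Schmoe"], 1999),
    (["Smith", "Alder"], 2007),
]

ordered = sorted(refs, key=lambda r: sort_key(*r))
print([label(*r) for r in ordered])       # ['SA07', 'Sch99', 'Sch10']

# Sorting on the printed two-digit label instead would put 2010 before 1999:
print(sorted(label(*r) for r in refs))    # ['SA07', 'Sch10', 'Sch99']
```

The second `print` shows the ordering the question expected from a pure sort by key; the difference comes entirely from comparing the full year rather than its last two digits.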

Is the system ambiguous? Maybe. Is the system practical? Maybe. Should it be used nowadays when producing citations with a numeric or author-year system is easy? No.