Probability – Graphs Resembling the Math Genealogy Graph and Family Concentration

I was talking with a non-mathematician the other week at a workshop about the fact that many mathematicians, like myself, are indexed in the math genealogy database. We talked a little about how many people tend to have family trees linking back to a few influential/well-known mathematicians (Newton/Gauss/Euler/etc.). I looked online later, and this casual observation seems to have been examined more closely (in this paper): about 65 percent of the 200k+ nodes of the network fit within 24 families.

I have some hypotheses for the cause of this concentration and expect that sociological factors play a large part, but I still wondered whether a random graph model with similar properties to the genealogy network would explain this effect at least partially.

For simplicity maybe it would be best to assume a forest (collection of trees) structure on the random model $G$ (ignoring the case where someone has more than one advisor). Some potentially useful properties are below:

It seems reasonable that the maximum out-degree of $G$ increases as one descends through the generations of the tree (corresponding to time), since more Ph.D.'s tend to be awarded now than before.
Also, more Ph.D. students now do not go on to direct Ph.D. students themselves (since the number of math Ph.D.'s is about as high as ever, but there are only a select number of Ph.D.-awarding institutions at which one can supervise Ph.D.'s). I think these first two conditions can be emulated by adjusting the distribution for $G$ at each generation.
There should also be a much lower probability of someone getting a Ph.D. supervised by someone who is not a mathematician (a disconnected node being generated) than of the standard case of a descendant being generated in the graph $G$.
There should probably be a small number of nodes to start with, but this seems less important.
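To make the properties above concrete, here is a minimal simulation sketch, not a standard named model. Everything in it is an assumption chosen for illustration: the Poisson student counts, the schedule by which the probability of supervising shrinks with the generation index while the number of students per supervisor grows, the fixed mean of 1.2 students per person, the 2 percent rate of new roots per existing node, and the names `simulate_forest` and `top_share`. It tracks only the family label of each node, which is enough to measure how much of the forest sits in the largest families.

```python
import math
import random
from collections import Counter


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_forest(generations=30, founders=20, mean_offspring=1.2,
                    immigration=0.02, seed=0):
    """Grow a random forest; return a Counter {family label: number of nodes}."""
    rng = random.Random(seed)
    current = list(range(founders))   # family label of each node in this generation
    family_size = Counter(current)    # every founder counts as one node
    next_label = founders
    for g in range(generations):
        # Hypothetical schedule: fewer people supervise in later generations,
        # but those who do have more students, so the mean stays fixed.
        p_supervise = 0.5 / (1 + 0.05 * g)
        students_if_supervising = mean_offspring / p_supervise
        children = []
        for label in current:
            if rng.random() < p_supervise:
                n = poisson(rng, students_if_supervising)
                children.extend([label] * n)
                family_size[label] += n
        # New roots (advisor outside the graph): one independent chance per
        # existing node, i.e. a Binomial(len(current), immigration) count.
        new_roots = sum(rng.random() < immigration for _ in current)
        for _ in range(new_roots):
            children.append(next_label)
            family_size[next_label] += 1
            next_label += 1
        current = children
    return family_size


def top_share(family_size, k):
    """Fraction of all nodes that belong to the k largest families."""
    sizes = sorted(family_size.values(), reverse=True)
    return sum(sizes[:k]) / sum(sizes)


sizes = simulate_forest()
print(f"{len(sizes)} families, {sum(sizes.values())} nodes")
for k in (1, 5, 24):
    print(f"top {k} families hold {top_share(sizes, k):.1%} of all nodes")
```

Varying `seed`, `mean_offspring`, and `immigration` gives a rough feel for how sensitive the top-family share is to these made-up choices; the sketch says nothing about which values would match the real genealogy data.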

Altogether, there are two questions that I have.

Question 1: Is there a standard random graph model that emulates these properties of the math genealogy graph?

Question 2: If so, does that model have concentration of the network in a small number of families, if the network is allowed to generate for a sufficient length of time?

I don't have enough intuition about random graphs to answer the second question with some very simple random graph models, so I would be interested if anyone can point out results for a different setting than the one outlined here.

Best Answer

Precisely this question was the starting point of the Galton-Watson theory of branching processes. To quote the opening paragraph of their 1875 paper On the Probability of the Extinction of Families:

The decay of the families of men who occupied conspicuous positions in past times has been a subject of frequent remark, and has given rise to various conjectures. It is not only the families of men of genius or those of the aristocracy who tend to perish, but it is those of all with whom history deals, in any way, even of such men as the burgesses of towns, ...
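
For context, the basic result of that theory can be stated briefly. The notation here ($p_k$, $f$, $m$, $q$) is introduced only for this summary and is not taken from the quoted paper. Suppose each individual independently has $k$ children with probability $p_k$, and write

$$f(s) = \sum_{k \ge 0} p_k s^k, \qquad m = f'(1) = \sum_{k \ge 0} k\, p_k .$$

Then the probability $q$ that the family of a single founder eventually dies out is the smallest solution in $[0,1]$ of

$$q = f(q),$$

and, excluding the degenerate case $p_1 = 1$, one has $q = 1$ exactly when $m \le 1$. When $m > 1$, a founder's line still dies out with probability $q < 1$, which is positive whenever $p_0 > 0$, so in a growing population it is plausible for many founding lines to vanish while the survivors account for most of the nodes. This is the plain model with the same offspring distribution in every generation and no new roots, so it only partially matches the setting in the question.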
