Solved – Understanding the Behrens–Fisher problem

fiducial-inference

This section of this article says:

Ronald Fisher in 1935 introduced fiducial inference in order to apply it to this problem. He referred to an earlier paper by W. V. Behrens from 1929. Behrens and Fisher proposed to find the probability distribution of
$$ T \equiv {\bar x_1 - \bar x_2 \over \sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
where $\bar x_1$ and $\bar x_2$ are the two sample means, and $s_1$ and $s_2$ are their standard deviations. [ . . . ] Fisher approximated the distribution of this by ignoring the random variation of the relative sizes of the standard deviations,
$$ {s_1 / \sqrt{n_1} \over \sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$

I find that I am disinclined to believe this. (Hence, Wikipedia is fallible!) At some point in the next couple of weeks I'm going to read what Fisher and Behrens and Bartlett wrote about this in the 1930s. For now, I'm looking at Fisher's book Statistical Methods and Scientific Inference. As with Edwin Jaynes, I'm getting the impression that the fact that he was occasionally an idiot in no way alters the fact that he was a great genius, but he didn't always express himself in the way that was best for communicating with lesser mortals. On page 97, Fisher writes about Bartlett:

[…]the reference set […] has not been limited to the subset having the ratio $s_1/s_2$ observed, but was eagerly seized upon by M. S. Bartlett, as though it were a defect in the test of significance of the composite hypothesis, that in special cases the criterion of rejection is less frequently attained by chance than in others. On reflexion I do not think one should expect anything else,[…]

Thus it seems to me that Fisher did not intend to "ignore" the "random variation of" the ratio $s_1/s_2$ as a means of approximation, but rather, he thought one should condition on $s_1/s_2$. This does seem like "conditioning on an ancillary statistic", which Fisher employed so successfully in other contexts.
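To spell out what conditioning on the ratio would mean here, the following is my own sketch of the standard textbook form of the argument for normal samples, not a quotation from Fisher. The angle $\theta$ and the variables $t_1$, $t_2$ are notation introduced for this sketch. Write $t_i = (\bar x_i - \mu_i)/(s_i/\sqrt{n_i})$ for $i = 1, 2$, and define $\theta$ by $\tan\theta = (s_1/\sqrt{n_1})/(s_2/\sqrt{n_2})$. Then, under $\mu_1 = \mu_2$,
$$ T = {t_1 s_1/\sqrt{n_1} - t_2 s_2/\sqrt{n_2} \over \sqrt{s_1^2/n_1 + s_2^2/n_2}} = t_1 \sin\theta - t_2 \cos\theta, \qquad \sin\theta = {s_1 / \sqrt{n_1} \over \sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$
On the conditioning reading, $\theta$ is held at its observed value while $t_1$ and $t_2$ are treated as independent Student $t$ variables with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, so the reference distribution of $T$ depends on the data only through the observed ratio $s_1/s_2$.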

If I recall correctly, I first heard of Bartlett when I read about this in the Encyclopedia of Statistical Science, which said simply that Bartlett was the first to show that fiducial intervals are not the same thing as confidence intervals, by showing that the fiducial intervals that Fisher had derived in this problem did not have constant coverage rates. That statement didn't leave me with the impression that there was some controversy about this.
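A rough way to see what a non-constant coverage rate would look like is a small simulation. The following is a minimal sketch, assuming normal samples and assuming that the fiducial reference distribution for $T$ is that of $t_1\sin\theta - t_2\cos\theta$, with $\theta$ fixed at its observed value and $t_1$, $t_2$ independent Student $t$ variables (the usual textbook description of the Behrens–Fisher distribution). The function names, sample sizes, and Monte Carlo settings are mine and purely illustrative.

```python
import numpy as np


def bf_critical_value(theta, df1, df2, level, rng, draws=4000):
    """Monte Carlo estimate of the `level` quantile of |t1*sin(theta) - t2*cos(theta)|.

    Assumption of this sketch: t1 and t2 are independent Student t variables
    and theta is held fixed at its observed value.
    """
    t1 = rng.standard_t(df1, size=draws)
    t2 = rng.standard_t(df2, size=draws)
    return np.quantile(np.abs(t1 * np.sin(theta) - t2 * np.cos(theta)), level)


def coverage(n1, n2, sigma1, sigma2, level=0.95, reps=2000, seed=0):
    """Estimate how often |T| stays below the critical value when mu1 == mu2."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        # Two normal samples with equal means and possibly unequal variances.
        x1 = rng.normal(0.0, sigma1, n1)
        x2 = rng.normal(0.0, sigma2, n2)
        se1 = x1.std(ddof=1) / np.sqrt(n1)
        se2 = x2.std(ddof=1) / np.sqrt(n2)
        # tan(theta) = se1 / se2, so sin(theta) = se1 / sqrt(se1^2 + se2^2).
        theta = np.arctan2(se1, se2)
        crit = bf_critical_value(theta, n1 - 1, n2 - 1, level, rng)
        t_stat = (x1.mean() - x2.mean()) / np.hypot(se1, se2)
        hits += int(abs(t_stat) <= crit)
    return hits / reps


if __name__ == "__main__":
    # Hypothetical settings: same sample sizes, different true variance ratios.
    for sigma2 in (1.0, 10.0):
        print(sigma2, coverage(5, 5, 1.0, sigma2))
```

Comparing the two printed estimates shows how the long-run coverage behaves under different true variance ratios. If the Encyclopedia's description of Bartlett's result is right, the estimates need not agree with each other or with the nominal 95%, although simulation noise has to be taken into account before reading anything into a small difference.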

So here's my question: Which is closer to the truth: the Wikipedia article or my suspicion?

  • Fisher, R. A. (1935) "The fiducial argument in statistical inference", Annals of Eugenics, 8, 391–398.

Best Answer

I may have mentioned this on the site once before. I will try to find a link to a post where I discussed this. Around 1977, when I was a graduate student at Stanford, we had a Fisher seminar that I enrolled in. A number of Stanford professors and visitors participated, including Brad Efron and visitors Seymour Geisser and David Hinkley. Jimmie Savage had just at that time published an article with the title "On Rereading R. A. Fisher", in the Annals of Statistics, I think. Since you are so interested in Fisher, I recommend you find and read this paper.

Motivated by the paper, the seminar was designed to reread many of Fisher's famous papers. My assignment was the article on the Behrens–Fisher problem. My feeling is that Fisher was vain and stubborn but never foolish. He had great geometric intuition and at times had difficulty communicating with others. He had a very cordial relationship with Gosset but harsh disagreements with Karl Pearson (maximum likelihood vs. method of moments) and with Neyman and Egon Pearson (significance testing via fiducial inference vs. the Neyman–Pearson approach to hypothesis testing). Although the fiducial argument is generally considered to be Fisher's only big flaw and has been discredited, the approach is not totally dead, and there has been new research on it in recent years.

I think that fiducial inference was Fisher's way to try to be an "objective Bayesian". I am sure he thought long and hard about the statistical foundations. He didn't accept the Bayesian approach, but he also did not see the idea of basing inference on the possible samples that you didn't draw as making sense either. He believed that inference should be based only on the data at hand. This idea is a lot like Bayesian inference, in that the Bayesians draw inference based solely on the data (the likelihood) and the parameters (the prior distribution). Fisher, in my view, was thinking a lot like Jeffreys, except that he wanted inference to be based on the likelihood and wanted to dispense with the prior altogether. That is what led to fiducial inference.

A Link to the Savage article

The Biography by Fisher's daughter Joan Fisher Box

R. A. Fisher: An Appreciation, Hinkley and Fienberg, editors

A book by Erich Lehmann about Fisher and Neyman and the birth of Classical Statistics

This is a link to an earlier post that I commented on that you also posted. Behrens–Fisher problem

In conclusion, I think I need to address your short question. If the statement you quoted, "Fisher approximated the distribution of this by ignoring the random variation of the relative sizes of the standard deviations", is what you are referring to, I think that is totally false. Fisher never ignored variation. I reiterate that I think the fiducial argument was grounded in the idea that the observed data and the likelihood function should be the basis of inference, not the other samples that you could have gotten from the population distribution. So I would side with you on this one. With respect to Bartlett, as I recall from my study of this so many years ago, he and Fisher also had heated debates on this, and Bartlett made a good case and held his own in the debate.
