Solved – Why is maximum likelihood estimation considered to be a frequentist technique

Tags: frequentist, maximum-likelihood

Frequentist statistics, for me, is synonymous with trying to make decisions that are good for all possible samples. That is, a frequentist decision rule $\delta$ should always try to minimize the frequentist risk, which depends on a loss function $L$ and the true state of nature $\theta_0$:

$$R_\mathrm{freq}=\mathbb{E}_{\theta_0}\big(L(\theta_0,\delta(Y))\big)$$
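As a concrete illustration of this definition (my own sketch, not part of the question): the frequentist risk of a decision rule can be approximated by Monte Carlo, averaging the loss over repeated samples drawn under the true parameter. Here I use squared-error loss and the sample mean as $\delta$, for i.i.d. $N(\theta_0, 1)$ data, where the exact risk is $1/n$.

```python
import numpy as np

# Approximate R_freq(theta0, delta) = E_{theta0}[ L(theta0, delta(Y)) ]
# by simulating many samples Y under the true parameter theta0.
rng = np.random.default_rng(0)

theta0 = 2.0            # true state of nature (assumed for the example)
n, n_reps = 30, 100_000  # sample size, number of repeated samples

def delta(y):
    """Decision rule: the sample mean, applied row-wise."""
    return y.mean(axis=-1)

def loss(theta, d):
    """Squared-error loss L(theta, d)."""
    return (theta - d) ** 2

# Repeated sampling from the true model N(theta0, 1)
samples = rng.normal(loc=theta0, scale=1.0, size=(n_reps, n))
risk = np.mean(loss(theta0, delta(samples)))

print(risk)  # close to the exact risk Var(sample mean) = 1/n ≈ 0.0333
```

The key point is that the expectation is taken over hypothetical repetitions of the data, not over the parameter — the parameter stays fixed at $\theta_0$.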

How is maximum likelihood estimation connected to the frequentist risk? Given that it is the point estimation technique most used by frequentists, there must be some connection. As far as I know, maximum likelihood estimation predates the concept of frequentist risk, but there must still be some connection; why else would so many people claim that it is a frequentist technique?

The closest connection that I have found is that

"For parametric models that satisfy weak regularity conditions, the maximum likelihood estimator is approximately minimax" (Wasserman 2006, p. 201).

An acceptable answer would either link maximum likelihood point estimation more strongly to the frequentist risk or provide an alternative formal definition of frequentist inference that shows MLE is a frequentist inference technique.

Best Answer

You apply a relatively narrow definition of frequentism and MLE - if we are a bit more generous and define

  • Frequentism: goal of consistency, (asymptotic) optimality, unbiasedness, and controlled error rates under repeated sampling, independent of the true parameters

  • MLE = point estimate + confidence intervals (CIs)

then it seems pretty clear that MLE satisfies all frequentist ideals. In particular, CIs in MLE, like p-values, control the error rate under repeated sampling, and do not give a 95% probability region for the true parameter value, as many people think; hence they are through and through frequentist.
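This repeated-sampling property is easy to check by simulation (an illustrative sketch of my own, with arbitrary example parameters): a 95% Wald interval built from the MLE covers the fixed true parameter in roughly 95% of repeated samples, which is a statement about the procedure's long-run behavior, not a probability statement about $\theta$ itself.

```python
import numpy as np

# Simulate repeated sampling and count how often the 95% Wald CI
# (MLE ± 1.96 * standard error) covers the fixed true mean.
rng = np.random.default_rng(1)
theta0, sigma = 5.0, 2.0   # true parameter values (assumed for the example)
n, n_reps = 50, 20_000

covered = 0
for _ in range(n_reps):
    y = rng.normal(theta0, sigma, size=n)
    mle = y.mean()                    # MLE of the normal mean
    se = y.std(ddof=0) / np.sqrt(n)   # plug-in standard error
    lo, hi = mle - 1.96 * se, mle + 1.96 * se
    covered += (lo <= theta0 <= hi)

print(covered / n_reps)  # empirical coverage, roughly 0.95
```

Note that in every repetition it is the interval that moves while $\theta_0$ stays fixed — the frequentist reading of "95%".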

Not all of these ideas were already present in Fisher's foundational 1922 paper "On the mathematical foundations of theoretical statistics", but the ideas of optimality and unbiasedness are, and Neyman later added the idea of constructing CIs with fixed error rates. Efron (2013), "A 250-year argument: Belief, behavior, and the bootstrap", summarizes in his very readable history of the Bayesian/frequentist debate:

The frequentist bandwagon really got rolling in the early 1900s. Ronald Fisher developed the maximum likelihood theory of optimal estimation, showing the best possible behavior for an estimate, and Jerzy Neyman did the same for confidence intervals and tests. Fisher’s and Neyman’s procedures were an almost perfect fit to the scientific needs and the computational limits of twentieth century science, casting Bayesianism into a shadow existence.

Regarding your narrower definition: I mildly disagree with your premise that minimization of frequentist risk (FR) is the main criterion for deciding whether a method follows frequentist philosophy. I would say that minimizing FR being a desirable property follows from frequentist philosophy, rather than preceding it. Hence, a decision rule / estimator does not have to minimize FR to be frequentist, and minimizing FR does not by itself imply that a method is frequentist; but, when in doubt, a frequentist would prefer minimization of FR.

If we look at MLE specifically: Fisher showed that MLE is asymptotically optimal (broadly equivalent to minimizing FR), and that was certainly one reason for promoting MLE. However, he was aware that optimality does not hold for finite sample sizes. Still, he was happy with this estimator because of other desirable properties such as consistency, asymptotic normality, invariance under parameter transformations, and, let's not forget, ease of calculation. Invariance in particular is stressed abundantly in the 1922 paper; from my reading, I would say that maintaining invariance under parameter transformation, and the ability to get rid of priors in general, were among his main motivations for choosing MLE. If you want to understand his reasoning better, I really recommend the 1922 paper; it's beautifully written and he explains his reasoning very well.
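The invariance property is worth seeing concretely (my own sketch, with an assumed normal example): if $\hat\theta$ maximizes the likelihood, then $g(\hat\theta)$ maximizes the likelihood of the reparameterized model. Here, for $N(0, \sigma^2)$ data, the closed-form MLE of $\sigma^2$ is the mean of $y^2$, and numerically maximizing the likelihood in $\sigma$ instead lands on its square root.

```python
import numpy as np

# Demonstrate MLE invariance: sigma_hat = sqrt(sigma^2_hat) for N(0, sigma^2).
rng = np.random.default_rng(2)
y = rng.normal(0.0, 3.0, size=10_000)  # true sigma = 3 (assumed for the example)

# Closed-form MLE in the sigma^2 parameterization: mean of y^2
var_hat = np.mean(y**2)

# Numerical MLE in the sigma parameterization, via a fine grid search
ss = np.sum(y**2)
grid = np.linspace(2.0, 4.0, 20_001)
loglik = -len(y) * np.log(grid) - ss / (2 * grid**2)
sd_hat = grid[np.argmax(loglik)]

print(abs(sd_hat - np.sqrt(var_hat)) < 1e-3)  # True: the two MLEs agree
```

A Bayesian posterior, by contrast, is not invariant in this sense: reparameterizing changes the prior density via the Jacobian, so the posterior mode of $\sigma$ is generally not the square root of the posterior mode of $\sigma^2$.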