Logistic Regression – Bias of Maximum Likelihood Estimators for Logistic Regression

bias, logistic, maximum likelihood, unbiased-estimator

I would like to understand a couple of facts about maximum likelihood estimators (MLEs) for logistic regression.

  1. Is it true that, in general, the MLE for logistic regression is biased? I would say "yes". I know, for example, that the sample size is related to the asymptotic bias of MLEs.

    Do you know any elementary examples of this phenomenon?

  2. If the MLE is biased, is it true that the covariance matrix of the MLEs is the inverse of the Hessian of the log-likelihood function?

    edit: I have come across this formula quite often, always without proof; it seems quite an arbitrary choice to me.

Best Answer

Consider the simple binary logistic regression model, with a binary dependent variable, a constant, and a single binary regressor $T$: $$\Pr(Y_i=1\mid T_i) = \Lambda (\alpha + \beta T_i)$$ where $\Lambda$ is the logistic cdf, $\Lambda(u) = \left[1+\exp\{-u\}\right]^{-1}$.

In logit form we have $$\ln \left(\frac{\Pr(Y_i=1\mid T_i)}{1-\Pr(Y_i=1\mid T_i)}\right) = \alpha + \beta T_i$$

You have a sample of size $n$. Denote by $n_1$ the number of observations with $T_i=1$ and by $n_0$ those with $T_i=0$, so that $n_1+n_0=n$. Consider the following estimated conditional probabilities:

$$\hat \Pr(Y=1\mid T=1)\equiv \hat P_{1|1} = \frac 1{n_1}\sum_{T_i=1}y_i$$

$$\hat \Pr(Y=1\mid T=0)\equiv \hat P_{1|0} = \frac 1{n_0}\sum_{T_i=0}y_i$$

Then this very basic model provides closed form solutions for the ML estimator:

$$\hat \alpha = \ln\left(\frac{\hat P_{1|0}}{1-\hat P_{1|0}}\right),\qquad \hat \beta = \ln\left(\frac{\hat P_{1|1}}{1-\hat P_{1|1}}\right)-\ln\left(\frac{\hat P_{1|0}}{1-\hat P_{1|0}}\right)$$
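
To make this concrete, here is a minimal Python sketch (the simulated data, the variable names, and the use of numpy are my own choices, not part of the model above) that computes the group proportions and then the closed-form ML estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = -0.5, 1.0          # true parameters, chosen arbitrarily

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def logit(p):
    return np.log(p / (1.0 - p))

# Simulate a sample: binary regressor T and binary outcome Y
n = 200
T = rng.integers(0, 2, size=n)
Y = (rng.random(n) < logistic(alpha + beta * T)).astype(int)

# Estimated conditional probabilities: group means of Y
P_1given1 = Y[T == 1].mean()     # \hat P_{1|1}
P_1given0 = Y[T == 0].mean()     # \hat P_{1|0}

# Closed-form ML estimates (assumes both proportions lie strictly in (0, 1))
alpha_hat = logit(P_1given0)
beta_hat = logit(P_1given1) - logit(P_1given0)
print(alpha_hat, beta_hat)
```

Fitting the same data with any standard logistic-regression routine (e.g. statsmodels' `Logit`) should reproduce these two numbers, since the model is saturated in the two cells of $T$.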

BIAS

Although $\hat P_{1|1}$ and $\hat P_{1|0}$ are unbiased estimators of the corresponding probabilities, the MLEs are biased, since the non-linear logarithmic function gets in the way: the expected value of a non-linear function of an unbiased estimator is not, in general, equal to the function applied to the expected value, so $E[\hat \alpha]\neq \alpha$ and $E[\hat \beta]\neq \beta$ in finite samples. Imagine what happens in more complicated models, with a higher degree of non-linearity.

But asymptotically the bias vanishes, since the probability estimates are consistent. Heuristically inserting the $\lim$ operator inside the expected value and the logarithm, we have $$\lim_{n\rightarrow \infty}E[\hat \alpha] = E\left[\ln\left(\lim_{n\rightarrow \infty}\frac{\hat P_{1|0}}{1-\hat P_{1|0}}\right)\right] = E\left[\ln\left(\frac{P_{1|0}}{1-P_{1|0}}\right)\right] =\alpha$$

and likewise for $\beta$.
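
If an elementary numerical illustration helps, a small Monte Carlo with the closed-form estimator makes the finite-sample bias visible. The true parameters, sample size, and number of replications below are arbitrary choices for the sketch, and replications in which a group is empty or a group proportion equals 0 or 1 (so that the MLE does not exist) are simply discarded:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = -0.5, 1.0
n, n_rep = 30, 50_000            # small n, so the finite-sample bias is visible

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def logit(p):
    return np.log(p / (1.0 - p))

alpha_hats, beta_hats = [], []
for _ in range(n_rep):
    T = rng.integers(0, 2, size=n)
    if (T == 1).sum() == 0 or (T == 0).sum() == 0:
        continue                 # need observations in both groups
    Y = (rng.random(n) < logistic(alpha + beta * T)).astype(int)
    P11, P10 = Y[T == 1].mean(), Y[T == 0].mean()
    if P11 in (0.0, 1.0) or P10 in (0.0, 1.0):
        continue                 # MLE does not exist for this sample
    alpha_hats.append(logit(P10))
    beta_hats.append(logit(P11) - logit(P10))

print("Monte Carlo mean of alpha_hat:", np.mean(alpha_hats), " true alpha:", alpha)
print("Monte Carlo mean of beta_hat: ", np.mean(beta_hats), " true beta: ", beta)
```

Increasing `n` should shrink the gap between the Monte Carlo means and the true values, in line with the asymptotic argument above.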

VARIANCE-COVARIANCE MATRIX OF MLE
In the simple case above, which yields closed-form expressions for the estimator, one could, at least in principle, go on and derive its exact finite-sample distribution and then calculate its exact finite-sample variance-covariance matrix. In general, however, the MLE has no closed-form solution, so we resort to a consistent estimate of the asymptotic variance-covariance matrix, which is indeed (the negative of) the inverse of the Hessian of the log-likelihood function of the sample, evaluated at the MLE. There is no "arbitrary choice" here at all: the formula follows from asymptotic theory and the asymptotic properties of the MLE (consistency and asymptotic normality), which tell us that, for $\theta_0 = (\alpha, \beta)$, $${\sqrt n}(\hat \theta-\theta_0)\rightarrow_d N\left(0, -(E[H])^{-1}\right)$$

where $H$ is the Hessian of the log-likelihood of a single observation, so that $E[H]$ is consistently estimated by $\frac 1n \hat H$, with $\hat H$ the Hessian of the sample log-likelihood evaluated at the MLE. Approximately, and for (large) finite samples, this leads us to

$$\operatorname{Var}(\hat \theta) \approx -\frac 1n(E[H])^{-1}\approx -\frac 1n\left(\frac 1n\hat H\right)^{-1}=-\hat H^{-1}$$
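
As a sketch of how this is used in practice (again with simulated data and my own variable names), one can build the Hessian of the sample log-likelihood for this model, $\hat H = -\sum_i \hat\Lambda_i(1-\hat\Lambda_i)x_i x_i'$ with $x_i=(1,T_i)'$, evaluate it at the MLE, and take minus its inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = -0.5, 1.0

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def logit(p):
    return np.log(p / (1.0 - p))

# Simulated sample
n = 500
T = rng.integers(0, 2, size=n)
Y = (rng.random(n) < logistic(alpha + beta * T)).astype(int)

# Closed-form MLE from the group proportions
# (assumes both proportions lie strictly in (0, 1))
P11, P10 = Y[T == 1].mean(), Y[T == 0].mean()
theta_hat = np.array([logit(P10), logit(P11) - logit(P10)])   # (alpha_hat, beta_hat)

# Hessian of the sample log-likelihood at the MLE:
# H_hat = - sum_i Lambda_i (1 - Lambda_i) x_i x_i',  with x_i = (1, T_i)'
X = np.column_stack([np.ones(n), T])
Lam = logistic(X @ theta_hat)
H_hat = -(X * (Lam * (1.0 - Lam))[:, None]).T @ X

# Estimated variance-covariance matrix of the MLE: minus the inverse Hessian
cov_hat = -np.linalg.inv(H_hat)
print(cov_hat)                       # estimated Var-Cov of (alpha_hat, beta_hat)
print(np.sqrt(np.diag(cov_hat)))     # the corresponding standard errors
```

The standard errors reported by a standard routine (e.g. statsmodels' `Logit`) for the same data should coincide with these up to numerical precision, since they are computed from the same matrix.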