Logistic Regression Tests – Why Use the Wald Test in Logistic Regression?

Tags: likelihood-ratio, logistic, wald test

Some statistical software packages use the Wald statistic when reporting on regression coefficients; R and Stata, for example, report Wald tests by default.

The logistic regression article on Wikipedia says, unfortunately without reference:

“Rather than the Wald method, the recommended method to calculate the p-value for logistic regression is the likelihood-ratio test (LRT)”

How are the Wald and likelihood-ratio test statistics calculated for a logistic regression coefficient (independent variable)? I ask as a reminder of how they are calculated and to highlight their differences.
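As a quick reminder of the standard forms, for a single coefficient $\beta_j$ the Wald statistic uses only the fitted model's estimate and standard error, while the LRT compares the maximized log-likelihoods of the models with and without the term:

$$W = \left(\frac{\hat\beta_j}{\widehat{\operatorname{SE}}(\hat\beta_j)}\right)^2 \;\sim\; \chi^2_1 \ \text{(approximately, under } H_0\!:\beta_j=0\text{)}, \qquad \mathrm{LR} = 2\left(\ell_{\text{full}} - \ell_{\text{reduced}}\right) \;\sim\; \chi^2_1 \ \text{(approximately)},$$

where $\ell_{\text{full}}$ and $\ell_{\text{reduced}}$ are the maximized log-likelihoods with and without $\beta_j$ in the model.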

From the Wald test page on Wikipedia:

  • [T]he Wald test is not invariant to a reparametrization, while the Likelihood ratio tests will give exactly the same answer whether we work with R, log R or any other monotonic transformation of R.

So in the context of logistic regression, if you logged a regressor, its p-value would be different from the p-value for the unlogged regressor (is this correct?). Would the p-value also change if the LRT were used?

From the same Wald test page:

  • The other reason is that the Wald test uses two approximations (that we know the standard error, and that the distribution is χ2), whereas the likelihood ratio test uses one approximation (that the distribution is χ2).

Although the Wald and likelihood-ratio tests are asymptotically equivalent, in logistic regression we are often in a pre-asymptotic setting, so this is not a reason to view them as equivalent.
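For illustration, here is a minimal simulated sketch (arbitrary made-up data) of how the two p-values can disagree when the sample is small:

```r
# Small simulated logistic regression: compare the Wald and LRT p-values
# for the same coefficient. Data and seed are arbitrary.
set.seed(1)
n <- 30
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))

fit  <- glm(y ~ x, family = binomial)   # full model
fit0 <- glm(y ~ 1, family = binomial)   # model with x dropped

# Wald p-value, as reported by summary(): z = estimate / standard error
wald_p <- summary(fit)$coefficients["x", "Pr(>|z|)"]

# LRT p-value: difference in deviance between the nested models
lrt_stat <- deviance(fit0) - deviance(fit)
lrt_p    <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

c(wald = wald_p, lrt = lrt_p)   # the two p-values generally differ in small samples
```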

Thus it seems that, in the logistic setting, the Wald test's disadvantages outweigh its advantages, and the likelihood-ratio test is better.

My guess is that the Wald test is used by logistic regression routines because of its computational efficiency, which mattered more when software such as R and Stata was first created. Since then, through backwards compatibility and a reluctance to change the semantics of their logistic functions, the Wald statistic has remained the default. Is there any evidence that this is the case?

Should I change the default from Wald to likelihood ratio? As a lesser question, is there an easy way to do this in R?

Best Answer

In logistic regression (& other generalized linear models with canonical link functions), the coefficient estimates $\hat{\vec\theta}$ are arrived at by Fisher scoring: iterating $$\vec\theta_{k+1} = \vec\theta_k + \mathcal{I}^{-1}(\vec\theta_k)U(\vec\theta_k)$$ where $\mathcal{I}$ is the Fisher information & $U$ the score, until convergence. When you're done, you're left with the estimated covariance matrix $\mathcal{I}^{-1}(\hat{\vec\theta})$ of the coefficient estimates; the square roots of its diagonal elements are the standard errors you need for Wald tests of each coefficient. So you get Wald tests almost for free, just by fitting the model; but likelihood-ratio tests require fitting a new model for each coefficient you want to test, & with a large sample size & many predictors they'd take a good while longer to conduct. (This is also true more generally: whether you're using observed information (the negative Hessian of the log-likelihood) rather than expected information, or finding maximum-likelihood estimates with an algorithm that doesn't involve calculating the Hessian at all, it's quicker to evaluate the Hessian numerically once than to fit lots of extra models.)
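As a rough illustration in R (using the built-in mtcars data purely as an example), the Wald statistics fall out of the single fitted model, while each LRT needs a second fit:

```r
# Illustrative only: Wald tests come from one fit, LRTs need refits.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Wald: estimates and standard errors from the estimated covariance matrix
# of the coefficients; no extra model fitting needed.
b  <- coef(fit)
se <- sqrt(diag(vcov(fit)))
wald_z <- b / se
wald_p <- 2 * pnorm(abs(wald_z), lower.tail = FALSE)   # matches summary(fit)

# LRT for one coefficient (say wt): refit without it and compare deviances.
fit_no_wt <- update(fit, . ~ . - wt)
lrt_stat  <- deviance(fit_no_wt) - deviance(fit)
lrt_p     <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
```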

If the point of logistic regression were always to test whether each & every coefficient is equal to zero, then there'd be an argument for statistical software's defaulting to the likelihood-ratio test when displaying a summary of the fitted model. But as that isn't always, or even often, the point (& especially as, with some models, many of the hypotheses tested may well be of no interest at all; see What value does one set, in general, for the null hypothesis of β0 in a linear regression model?), it makes sense to provide the Wald tests & leave the analyst to choose which, if any, further tests to conduct, & what method to use†. (It would also make sense to provide no tests, & force the analyst to think about which, if any, tests to conduct, &c.)


† I don't know of any R function to conduct LRTs for all coefficients of a model individually—it wouldn't be hard to write one—but both stats:::drop1 & car:::Anova conduct them for a default set of null hypotheses more likely to be of interest.
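For example (assuming the car package is installed), calls along these lines give likelihood-ratio tests for dropping each term:

```r
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# LRT for dropping each term in turn
drop1(fit, test = "LRT")

# Type-II likelihood-ratio tests from the car package
car::Anova(fit, test.statistic = "LR")
```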


NB invariance to reparametrization means only that the LRT for, say, $H_0: \beta_7 = 0$ is the same as the LRT for $H_0: \frac{1}{1+\mathrm{e}^{-\beta_7}}=\frac{1}{2}$ (which isn't the case for the Wald test). Replacing a regressor $x_7$ with $\log x_7$, on the other hand, would be fitting a substantively different model, so both the Wald & LRT p-values would change.
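A minimal sketch of that point, assuming a delta-method standard error for the transformed hypothesis:

```r
# The Wald statistic depends on the parametrization; the LRT is the same
# nested-model comparison however the parameter is expressed.
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
fit0 <- glm(am ~ 1,  data = mtcars, family = binomial)

b  <- unname(coef(fit)["wt"])
se <- sqrt(vcov(fit)["wt", "wt"])

z_beta <- b / se                          # Wald z for H0: beta = 0
z_exp  <- (exp(b) - 1) / (exp(b) * se)    # delta-method Wald z for H0: exp(beta) = 1
lrt    <- deviance(fit0) - deviance(fit)  # unchanged under either parametrization

c(z_beta = z_beta, z_exp = z_exp, lrt = lrt)
```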
