In Foundations of Linear and Generalized Linear Models, Agresti makes a comment on page 131 about likelihood ratio, Wald, and Score testing of regression parameters.
For the best-known GLM, the normal linear model, the three types of
inference provide identical results.
I tried this out in R to see what would happen. My own likelihood ratio test gave a different p-value than the Wald-based default printout from "summary()", so something about my interpretation of Agresti's comment must be incorrect.
set.seed(2020)
N <- 100
x <- rbinom(N, 1, 0.5)
err <- rnorm(N)
y <- 0.5*x + err
G0 <- glm(y ~ 1, family = "gaussian")  # intercept-only model
G1 <- glm(y ~ x, family = "gaussian")  # model with x as a predictor
test_stat <- summary(G0)$deviance - summary(G1)$deviance
df <- dim(summary(G1)$coefficients)[1] - dim(summary(G0)$coefficients)[1]
p.value <- 1 - pchisq(test_stat, df)
p.value                         # my likelihood ratio test p-value
summary(G1)$coefficients[2, 4]  # Wald p-value from summary()
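(As a cross-check on my deviance-difference version, the LR statistic can also be computed directly from the fitted log-likelihoods. This is a sketch using the same simulated data; note that for a Gaussian GLM the deviance is the residual sum of squares, so the log-likelihood-based statistic is N*log(RSS0/RSS1) rather than RSS0 - RSS1, and the two differ when the error variance is estimated.)

```r
set.seed(2020)
N <- 100
x <- rbinom(N, 1, 0.5)
y <- 0.5*x + rnorm(N)
G0 <- glm(y ~ 1, family = "gaussian")  # intercept-only model
G1 <- glm(y ~ x, family = "gaussian")  # model with x as a predictor

# LR statistic with the error variance profiled out (ML estimate):
lr_stat <- as.numeric(2 * (logLik(G1) - logLik(G0)))
# equivalently N * log(deviance(G0) / deviance(G1)) for a Gaussian GLM
pchisq(lr_stat, df = 1, lower.tail = FALSE)  # LR test p-value
```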
However, when I ran a simulation with many repetitions to check long-run performance, the results were about the same.
set.seed(2020)
N <- 100      # sample size
R <- 1000     # number of simulations
alpha <- 0.05
lrt_r <- wld_r <- rep(0, R)
for (i in 1:R) {
  x <- rbinom(N, 1, 0.5)
  err <- rnorm(N)
  y <- 0.5*x + err
  G0 <- glm(y ~ 1, family = "gaussian")  # intercept-only model
  G1 <- glm(y ~ x, family = "gaussian")  # model with x as a predictor
  test_stat <- summary(G0)$deviance - summary(G1)$deviance
  df <- dim(summary(G1)$coefficients)[1] - dim(summary(G0)$coefficients)[1]
  lr <- 1 - pchisq(test_stat, df)        # likelihood ratio test p-value
  wd <- summary(G1)$coefficients[2, 4]   # Wald test p-value
  # check if the p-values warrant rejection at the level of alpha
  if (lr <= alpha) {lrt_r[i] <- 1}
  if (wd <= alpha) {wld_r[i] <- 1}
}
# Check the power of each test
sum(lrt_r)/R*100  # 70.4%
sum(wld_r)/R*100  # 69.9%
This is close enough to suggest to me that the difference is due to a finite number of repetitions and/or something about that particular 2020 seed (though seeds 1 and 7 also give likelihood ratio testing slightly higher power, which I find suspicious).
Is that what's going on in Agresti's quote, that the three methods may not give identical results on any particular data set but will have the same long-run performance on many samples drawn from the same population?
(I didn't address score testing here, and I am content to prioritize Wald versus likelihood ratio testing.)
Reference
Agresti, Alan. Foundations of Linear and Generalized Linear Models. John Wiley & Sons, 2015.
Best Answer
Exact equivalence only holds if the error variance is known; see Exact equivalence of LR and Wald in linear regression under known error variance. Otherwise, the Wald, likelihood ratio, and Lagrange multiplier (score) statistics are related via $W\geq LR\geq LM$ in a normal likelihood framework, and equivalence obtains only asymptotically, as illustrated by the slightly revised version of your code below.
Notice that the above-mentioned ranking assumes that the error variance estimates are based on the ML estimate $1/n\sum_ie_i^2$ rather than the unbiased estimate $1/(n-k)\sum_ie_i^2$. The t-statistic retrieved from lm uses the latter, so it is not exactly correct that the squared t-statistic equals the Wald statistic; hence, as in the numerical example below where we have summary(G1)$coefficients[2,3]^2 < test_stat, the ranking need not emerge. We would obtain the likelihood-based Wald statistic from summary(G1)$coefficients[2,3]^2*N/(N-2), for which the ranking would again be satisfied.
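(The revised code referred to above was not captured in this post. The following is a sketch of what such an illustration could look like, under the assumption that all three statistics use the ML variance estimate; the simulation setup copies the question's code, and the final line is this editor's consistency check, not part of the original answer.)

```r
set.seed(2020)
N <- 100
x <- rbinom(N, 1, 0.5)
y <- 0.5*x + rnorm(N)
G0 <- glm(y ~ 1, family = "gaussian")  # restricted (intercept-only) model
G1 <- glm(y ~ x, family = "gaussian")  # unrestricted model

RSS0 <- deviance(G0)  # for a Gaussian GLM the deviance is the RSS
RSS1 <- deviance(G1)

# Classical trinity with the error variance estimated by ML (divide by N):
W  <- N * (RSS0 - RSS1) / RSS1  # Wald
LR <- N * log(RSS0 / RSS1)      # likelihood ratio
LM <- N * (RSS0 - RSS1) / RSS0  # Lagrange multiplier (score)

c(W = W, LR = LR, LM = LM)      # ranking W >= LR >= LM holds in finite samples

# The likelihood-based Wald statistic equals the squared t-statistic rescaled
# from the unbiased variance estimate RSS1/(N-2) to the ML estimate RSS1/N:
all.equal(W, summary(G1)$coefficients[2, 3]^2 * N / (N - 2))
```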