Solved – Testing the difference between two parameter estimates in binomial GLM

Tags: binomial-distribution, generalized-linear-model, r, standard-error

There are some related posts on this issue, but none that I could find actually demonstrate the mechanics of how to accomplish the task. I want to compare two parameter estimates in a binomial GLM (but I expect the answer to this question will work for any GLM).

My model is of the form:

y ~ b0 + b1x1 + b2x2 + e(binom)

My question: How do you test for a difference between b1 and b2?

Running this model using glm() in R provides parameter estimates and standard errors for all b. So comparing them should be easy, but I'm just having a hard time figuring out exactly what to do.

Here are some example data to work with:

set.seed(1987)
y <- rbinom(n = 100, size = 1, prob = 0.5) # binomial response variable
set.seed(1988)
x1a <- rnorm(n = 100, mean = 2, sd = 3)
set.seed(1988)
x1b <- rnorm(n = 100, mean = 0, sd = 3)
x1 <- ifelse(y == 0, x1a, x1b) # negative relationship between y and x1
set.seed(1990)
x2a <- rnorm(n = 100, mean = 2, sd = 5)
set.seed(1990)
x2b <- rnorm(n = 100, mean = 15, sd = 5)
x2 <- ifelse(y == 0, x2a, x2b) # a strong, positive relationship between y and x2

Run this model and return relevant output:

an1 <- glm(y ~ x1 + x2, family = binomial)
s_an1 <- summary(an1)

s_an1$coefficients
paste("residual df = ", an1$df.residual)
paste("null df = ", an1$df.null)

And get these estimates:

              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -4.2711210  1.0451565 -4.086585 4.377686e-05
x1          -0.1638442  0.1433595 -1.142891 2.530841e-01
x2           0.6130596  0.1378431  4.447519 8.686793e-06

residual df = 97
null df = 99

Notice that x2 is a significant, positive predictor of y, and x1 is a non-significant, negative predictor of y. As we all know, however, this is not good evidence that the parameter estimates are different.

(How) Can I use the standard errors to compare the estimates?

I'll point out that there appears to be a lot of information on the web about how to calculate confidence intervals, which are nice, but I want:

1) a parameter estimate for the difference (yes, I know — it's just the difference between them),

2) a standard error (probably sqrt(se.x1^2 + se.x2^2)),

3) a t-value (or equivalent distribution statistic), and

4) a p-value.

In my real-world scenario, x1 and x2 are measured on the same scale, but they don't have the same means or standard deviations (as in the example data above; see the sd arguments to the random number generators). Is standardization of some kind necessary in this case?

Help much appreciated, cheers!

Best Answer

You can simulate parameter estimates using the mvtnorm package with mean vector coef(an1) and covariance matrix vcov(an1) and then summarise them. Or you could bootstrap.
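
For example, a minimal sketch of the simulation route (the 10,000 draws and the seed are arbitrary choices here; an1 is the fitted model from the question):

library(mvtnorm)

# Draw parameter vectors from the approximate sampling distribution of the
# estimates: multivariate normal with mean coef(an1) and covariance vcov(an1).
set.seed(1)
sims <- rmvnorm(10000, mean = coef(an1), sigma = vcov(an1))
colnames(sims) <- names(coef(an1))

diffs <- sims[, "x1"] - sims[, "x2"]  # simulated values of b1 - b2
mean(diffs)                           # point estimate of the difference
sd(diffs)                             # approximate standard error
quantile(diffs, c(0.025, 0.975))      # 95% interval for b1 - b2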

However, it's probably easier just to use the multcomp package to examine the contrast. In your model that would be:

library(multcomp)
cont <- glht(an1, linfct="x1 - x2 = 0")
summary(cont) ## estimate, standard error, z-statistic and p-value
confint(cont) ## confidence interval
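
For completeness, the four quantities the question asks for can also be computed by hand from coef() and vcov(); note that the standard error of the difference needs the covariance between the two estimates, not just sqrt(se.x1^2 + se.x2^2). A sketch that should agree with the glht() output above:

b <- coef(an1)
V <- vcov(an1)

est <- b[["x1"]] - b[["x2"]]                                    # 1) estimate of the difference
se  <- sqrt(V["x1", "x1"] + V["x2", "x2"] - 2 * V["x1", "x2"])  # 2) standard error of the difference
z   <- est / se                                                 # 3) Wald z-statistic
p   <- 2 * pnorm(-abs(z))                                       # 4) two-sided p-value
c(estimate = est, se = se, z = z, p = p)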