Logistic Regression – How to Calculate the Hat Matrix for Logistic Regression in R?

deviancelogisticr

I want to calculate the hat matrix directly in R for a logit model. According to Long (1997) the hat matrix for logit models is defined as:

$$H = VX(X'VX)^{-1} X'V$$

X is the vector of independent variables, and V is a diagonal matrix with $\sqrt{\pi(1-\pi)}$ on the diagonal.

I use the optim function to maximize the likelihood and derive the hessian. So I guess my question is: how do i calculuate $V$ in R?

Note: My likelihood function looks like this:

loglik <-  function(theta,x,y){
y <- y
x <- as.matrix(x)
beta <- theta[1:ncol(x)]
loglik <- sum(-y*log(1 + exp(-(x%*%beta))) - (1-y)*log(1 + exp(x%*%beta)))
return(-loglik)
}

And i feed this to the optim function as follows:

logit <- optim(c(1,1),loglik, y = y, x = x, hessian = T)

Where x is a matrix of independent variables, and y is a vector with the dependent variable.

Note: I know that there are canned procedures for doing this, but I need to do it from scratch

Best Answer

For logistic regression $\pi$ is calculated using formula

$$\pi=\frac{1}{1+\exp(-X\beta)}$$

So diagonal values of $V$ can be calculated in the following manner:

pi <- 1/(1+exp(-X%*%beta))
v <- sqrt(pi*(1-pi))

Now multiplying by diagonal matrix from left means that each row is multiplied by corresponding element from diagonal. Which in R can be achieved using simple multiplication:

VX <- X*v 

Then H can be calculated in the following way:

H <- VX%*%solve(crossprod(VX,VX),t(VX))

Note Since $V$ contains standard deviations I suspect that the correct formula for $H$ is

$$H=VX(X'V^2X)^{-1}X'V$$

The example code works for this formula.