Solved – Information out of the hat matrix for logistic regression


It is clear to me, and well explained on multiple sites, what information the values on the diagonal of the hat matrix give for linear regression.

The hat matrix of a logistic regression model is less clear to me. Does it give the same information as the hat matrix in linear regression? This is the definition of the hat matrix I found in another CV thread (source 1):

$$H = V^{1/2} X (X'VX)^{-1} X' V^{1/2}$$

with $X$ the design matrix of predictor values and $V$ a diagonal matrix with entries $\pi(1-\pi)$, so that the diagonal of $V^{1/2}$ holds $\sqrt{\pi(1-\pi)}$.

Is it, in other words, also true here that the diagonal hat value of an observation reflects only the position of its covariates in the covariate space, and has nothing to do with the observed outcome of that observation?

This is written in the book "Categorical Data Analysis" by Agresti:

The greater an observation’s leverage, the greater its potential influence on the fit. As in ordinary regression, the leverages fall between 0 and 1 and sum to the number of model parameters. Unlike ordinary regression, the hat values depend on the fit as well as the model matrix, and points that have extreme predictor values need not have high leverage.

So given this description, it seems we cannot use it the way we use it in ordinary linear regression?

Source 1: How to calculate the hat matrix for logistic regression in R?

Best Answer

Let me change the notation a bit and write the hat matrix as $$H = V^{\frac{1}{2}}X(X'VX)^{-1}X'V^{\frac{1}{2}}$$ where $V$ is a diagonal matrix with general elements $v_j = m_j \pi (x_j) \left[1 - \pi (x_j) \right]$, and $m_j$ denotes the number of individuals sharing the same covariate value $x = x_j$. The $j^{th}$ diagonal element ($h_j$) of the hat matrix is then $$h_j = m_j \pi (x_j) \left[1 - \pi (x_j) \right] x'_j (X'VX)^{-1}x_j$$ and the sum of the $h_j$ gives the number of parameters, as in linear regression. Now to your question:
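As a quick numerical check, here is a minimal Python sketch (an illustrative choice on my part, using numpy and statsmodels on simulated data, and treating every observation as its own covariate pattern, i.e. $m_j = 1$) that computes the diagonal $h_j$ from the formula above and verifies that the leverages sum to the number of parameters:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))   # design matrix: intercept + 2 predictors
beta = np.array([-0.5, 1.0, -2.0])             # assumed true coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

fit = sm.Logit(y, X).fit(disp=0)
pi_hat = fit.predict(X)                        # estimated probabilities pi(x_j)
v = pi_hat * (1 - pi_hat)                      # diagonal of V, with m_j = 1

# h_j = v_j * x_j' (X'VX)^{-1} x_j  -- only the diagonal of H is needed
XtVX_inv = np.linalg.inv(X.T @ (v[:, None] * X))
h = v * np.einsum("ij,jk,ik->i", X, XtVX_inv, X)

print(h.sum())            # approx. 3.0: the number of model parameters
print(h.min(), h.max())   # all leverages lie between 0 and 1
```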

The interpretation of the leverage values in the hat matrix depends on the estimated probability $\pi$. If $0.1 < \pi < 0.9$, you can interpret the leverage values in a similar fashion as in the linear regression case, i.e. being further away from the mean gives you higher values. At the extreme ends of the probability distribution, however, these leverage values might no longer measure distance in the same sense. This is shown in the figure below, taken from Hosmer and Lemeshow (2000):

[Figure from Hosmer and Lemeshow (2000): leverage as a function of the estimated probability.]

In this case the most extreme values in the covariate space can yield the smallest leverage, contrary to the linear regression case. The reason is that leverage in linear regression is a monotonic function of the distance from the covariate mean, which is not true for the non-linear logistic regression. The above formulation of the diagonal elements does contain a monotonically increasing part representing distance from the mean, namely the $x'_j (X'VX)^{-1}x_j$ term, which you can look at if you are only interested in distance per se. The majority of diagnostic statistics for logistic regression utilize the full leverage $h_j$, however, so this separate monotonic part is rarely considered alone.
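A hypothetical one-predictor simulation can illustrate this (again Python on simulated data; the steep slope is an assumed choice made to push fitted probabilities toward 0 and 1): the distance term $x'_j (X'VX)^{-1}x_j$ peaks at the ends of the covariate range, while the full leverage $h_j$ typically peaks well inside it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 201)
X = sm.add_constant(x)
y = rng.binomial(1, 1 / (1 + np.exp(-2.5 * x)))   # steep true curve (assumed)

fit = sm.Logit(y, X).fit(disp=0)
pi_hat = fit.predict(X)
v = pi_hat * (1 - pi_hat)                          # shrinks toward 0 at extreme x
XtVX_inv = np.linalg.inv(X.T @ (v[:, None] * X))
dist = np.einsum("ij,jk,ik->i", X, XtVX_inv, X)    # distance part, grows with |x|
h = v * dist                                       # full leverage h_j

print("distance part peaks at x =", x[np.argmax(dist)])  # an endpoint of the range
print("full leverage peaks at x =", x[np.argmax(h)])     # typically interior
```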

If you want to dig deeper into this topic, have a look at the paper by Pregibon (1981), who derived the logistic hat matrix, and the book by Hosmer and Lemeshow (2000).