Solved – .hat in regression output

leveragerregression

The augment() function in the broom package for R creates a dataframe of predicted values from a regression model. Columns created include the fitted values, the standard error of the fit and Cook's distance. They also include something with which I'm not familar and that is the column .hat.

Can anyone explain what this value is, and is it different between linear regression and logistic regression?

library(broom)
data(mtcars)

m1 <- lm(mpg ~ wt, data = mtcars)

head(augment(m1))

          .rownames  mpg    wt  .fitted   .se.fit     .resid       .hat   .sigma      .cooksd
1         Mazda RX4 21.0 2.620 23.28261 0.6335798 -2.2826106 0.04326896 3.067494 1.327407e-02
2     Mazda RX4 Wag 21.0 2.875 21.91977 0.5714319 -0.9197704 0.03519677 3.093068 1.723963e-03
3        Datsun 710 22.8 2.320 24.88595 0.7359177 -2.0859521 0.05837573 3.072127 1.543937e-02
4    Hornet 4 Drive 21.4 3.215 20.10265 0.5384424  1.2973499 0.03125017 3.088268 3.020558e-03
5 Hornet Sportabout 18.7 3.440 18.90014 0.5526562 -0.2001440 0.03292182 3.097722 7.599578e-05
6           Valiant 18.1 3.460 18.79325 0.5552829 -0.6932545 0.03323551 3.095184 9.210650e-04

Best Answer

Those would be the diagonal elements of the hat-matrix which describe the leverage each point has on its fitted values.

If one fits $\vec{Y} = \mathbf{X} \vec{\beta} + \vec{\epsilon}$ then $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$.

In this example:

$$\begin{pmatrix}Y_1\\ \vdots\\ Y_{32}\end{pmatrix} = \begin{pmatrix} 1 & 2.620\\ \vdots\\ 1 & 2.780 \end{pmatrix} \cdot \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix} + \begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_{32}\end{pmatrix}$$

Then calculating this $\mathbf{H}$ matrix results in:

library(MASS)
wt <- mtcars[,6]

X <- matrix(cbind(rep(1,32),wt), ncol=2)

X%*%ginv(t(X)%*%X)%*%t(X)

Where this last matrix is a $32\times 32$ matrix and contains these hat values on the diagonal.

Hat matrix on Wikpedia

Fun fact

It is called the hat matrix since it puts the hat on $\vec{Y}$: $$ \hat{\vec{Y}} = \mathbf{H}\vec{Y} $$

Related Solutions

Solved – R glmer.nb output. How to get $\hat{\theta}$

Looking at the source code, this isn't something that is trivially exposed. Essentially, glmer.nb is (from github sources):

glmer.nb <- function(..., interval = log(th)+c(-3,3), verbose=FALSE)
{
  g0 <- glmer(..., family=poisson)
  th <- est_theta(g0)
  if(verbose) cat("th := est_theta(glmer(..)) =", format(th),"\n")
  g1 <- update(g0, family = negative.binomial(theta=th))
  ## if (is.null(interval)) interval <- log(th)+c(-3,3)
  optTheta(g1, interval=interval, verbose=verbose)
}

Which essentially is:

fit a standard Poisson GLMM
estimate $\theta$ using maximum likelihood estimation given this standard Poisson fit. This uses MASS::theta.ml()
refit the model using this value of $\theta$ and the negative.binomial() family function which needs a fixed $\theta$,
~~last line does an optimization to find $\theta$ given the updated fit, and that function throws an error if the original estimate from 2. and the optimised one from 4. don't match.~~

the last line performs 1-d optimisation on the interval log(theta) + c(-3, 3) via optimize() to find the value of $\theta$ that minimises the deviance of the negative binomial GLMM.

That said, you might try:

lme4:::getNBdisp(fit)

where fit is the model returned by glmer.nb. This is just a guess based on reading the code but it is used by the authors in tests etc.

Hat Matrix and Leverages – Classical Multiple Regression Techniques

The hat matrix, $\bf H$, is the projection matrix that expresses the values of the observations in the independent variable, $\bf y$, in terms of the linear combinations of the column vectors of the model matrix, $\bf X$, which contains the observations for each of the multiple variables you are regressing on.

Naturally, $\bf y$ will typically not lie in the column space of $\bf X$ and there will be a difference between this projection, $\bf \hat Y$, and the actual values of $\bf Y$. This difference is the residual or $\bf \varepsilon=Y-X\beta$:

The estimated coefficients, $\bf \hat\beta_i$ are geometrically understood as the linear combination of the column vectors (observations on variables $\bf x_i$) necessary to produce the projected vector $\bf \hat Y$. We have that $\bf H\,Y = \hat Y$; hence the mnemonic, "the H puts the hat on the y."

The hat matrix is calculated as: $\bf H = X (X^TX)^{-1}X^T$.

And the estimated $\bf \hat\beta_i$ coefficients will naturally be calculated as $\bf (X^TX)^{-1}X^T$.

Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself. However, the points farther away at the extreme of the regressor values will have more leverage. Here is an example of an extremely asymptotic point (in red) really pulling the regression line away from what would be a more logical fit:

So, where is the connection between these two concepts: The leverage score of a particular row or observation in the dataset will be found in the corresponding entry in the diagonal of the hat matrix. So for observation $i$ the leverage score will be found in $\bf H_{ii}$. This entry in the hat matrix will have a direct influence on the way entry $y_i$ will result in $\hat y_i$ ( high-leverage of the $i\text{-th}$ observation $y_i$ in determining its own prediction value $\hat y_i$):

Since the hat matrix is a projection matrix, its eigenvalues are $0$ and $1$. It follows then that the trace (sum of diagonal elements - in this case sum of $1$'s) will be the rank of the column space, while there'll be as many zeros as the dimension of the null space. Hence, the values in the diagonal of the hat matrix will be less than one (trace = sum eigenvalues), and an entry will be considered to have high leverage if $>2\sum_{i=1}^{n}h_{ii}/n$ with $n$ being the number of rows.

The leverage of an outlier data point in the model matrix can also be manually calculated as one minus the ratio of the residual for the outlier when the actual outlier is included in the OLS model over the residual for the same point when the fitted curve is calculated without including the row corresponding to the outlier: $$Leverage = 1-\frac{\text{residual OLS with outlier}}{\text{residual OLS without outlier}}$$ In R the function hatvalues() returns this values for every point.

Using the first data point in the dataset {mtcars} in R:

    fit = lm(mpg ~ wt, mtcars) # OLS including all points
    X = model.matrix(fit) # X model matrix
    hat_matrix = X%*%(solve(t(X)%*%X)%*%t(X)) # Hat matrix
    diag(hat_matrix)[1] # First diagonal point in Hat matrix
    fitwithout1 = lm(mpg ~ wt, mtcars[-1,]) # OLS excluding first data point.
    new = data.frame(wt=mtcars[1,'wt']) # Predicting y hat in this OLS w/o first point.
    y_hat_without = predict(fitwithout1, newdata=new) # ... here it is.
    residuals(fit)[1] # The residual when OLS includes data point.
    lev = 1 - (residuals(fit)[1]/(mtcars[1,'mpg'] -  y_hat_without)) # Leverage
    all.equal(diag(hat_matrix)[1],lev) #TRUE

Best Answer

Fun fact

Related Solutions

Solved – R glmer.nb output. How to get $\hat{\theta}$

Hat Matrix and Leverages – Classical Multiple Regression Techniques

Related Question