If you take a look at the code (simply type plot.lm, without parentheses, or edit(plot.lm) at the R prompt), you'll see that Cook's distances are defined at line 44, via the cooks.distance() function. To see what it does, type stats:::cooks.distance.glm at the R prompt. There you will see that it is defined as

(res/(1 - hat))^2 * hat/(dispersion * p)

where res are the Pearson residuals (as returned by the influence() function), hat is the vector of hat values (the diagonal of the hat matrix), p is the number of parameters in the model, and dispersion is the dispersion considered for the current model (fixed at one for logistic and Poisson regression, see help(glm)). In sum, Cook's distance is computed as a function of the leverage of each observation and its standardized residual. (Compare with stats:::cooks.distance.lm.)
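As a quick sanity check, here is a minimal sketch reproducing that formula by hand; the logistic fit on mtcars is just an assumed example, not anything fixed by the discussion above:

fit  <- glm(am ~ wt, data = mtcars, family = binomial)  # assumed example fit
infl <- influence(fit)
res  <- infl$pear.res                   # Pearson residuals
hat  <- infl$hat                        # hat (leverage) values
p    <- fit$rank                        # number of model parameters
disp <- summary(fit)$dispersion         # fixed at 1 for binomial/Poisson
all.equal((res/(1 - hat))^2 * hat/(disp * p), cooks.distance(fit))  # TRUE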
For a more formal reference, you can follow the references in the plot.lm() function, namely

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Moreover, regarding the additional information displayed in the graphic, we can look further and see that R uses

plot(xx, rsp, ...)                       # line 230
panel(xx, rsp, ...)                      # line 233
cl.h <- sqrt(crit * p * (1 - hh)/hh)     # line 243
lines(hh, cl.h, lty = 2, col = 2)
lines(hh, -cl.h, lty = 2, col = 2)
where rsp is labeled as Std. Pearson resid. in the case of a GLM, and Std. residuals otherwise (line 172); in both cases, however, the formula used by R is (lines 175 and 178)

residuals(x, "pearson") / (s * sqrt(1 - hii))

where hii is the vector of hat values (the diagonal of the hat matrix) returned by the generic function lm.influence(). This is the usual formula for standardized residuals:

$$rs_j=\frac{r_j}{s\sqrt{1-\hat h_j}}$$

where $j$ denotes the $j$th observation. See, e.g., Agresti, Categorical Data Analysis, §4.5.5.
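As a hedged sketch (same assumed mtcars logistic fit as above), this agrees with what rstandard() returns for Pearson-type residuals:

fit <- glm(am ~ wt, data = mtcars, family = binomial)   # assumed example fit
hii <- lm.influence(fit, do.coef = FALSE)$hat           # hat values
s   <- sqrt(summary(fit)$dispersion)                    # 1 for binomial/Poisson
rs  <- residuals(fit, "pearson") / (s * sqrt(1 - hii))  # std. Pearson residuals
all.equal(rs, rstandard(fit, type = "pearson"))         # TRUE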
The next lines of R code draw a smoother (add.smooth=TRUE in plot.lm() by default, see getOption("add.smooth")) and contour lines (not visible in your plot) for critical values of Cook's distance (see the cook.levels= option).
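For instance (a sketch with an assumed logistic fit), you can make those elements explicit when calling the plot method; which = 5 selects the residuals-vs-leverage panel, and c(0.5, 1.0) are the default cook.levels:

fit <- glm(am ~ wt, data = mtcars, family = binomial)   # assumed example fit
plot(fit, which = 5, add.smooth = TRUE, cook.levels = c(0.5, 1.0))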
Those are the diagonal elements of the hat matrix, which describe the leverage each point has on its fitted value.
If one fits $\vec{Y} = \mathbf{X} \vec{\beta} + \vec{\epsilon}$ then $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$.
In this example:
$$\begin{pmatrix}Y_1\\ \vdots\\ Y_{32}\end{pmatrix} = \begin{pmatrix}1 & 2.620\\ \vdots & \vdots\\ 1 & 2.780\end{pmatrix} \cdot \begin{pmatrix}\beta_0\\ \beta_1\end{pmatrix} + \begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_{32}\end{pmatrix}$$
Then calculating this $\mathbf{H}$ matrix in R:

library(MASS)                            # for ginv()
wt <- mtcars[, 6]                        # predictor: car weight
X  <- cbind(1, wt)                       # design matrix with intercept column
H  <- X %*% ginv(t(X) %*% X) %*% t(X)    # hat matrix
diag(H)                                  # leverages (hat values)

This last matrix is a $32\times 32$ matrix and contains the hat values on its diagonal.
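As a check (a sketch assuming the response is mpg, which the example above doesn't actually fix), these diagonal entries match R's built-in extractor:

fit <- lm(mpg ~ wt, data = mtcars)           # assumed model for this check
all.equal(diag(H), unname(hatvalues(fit)))   # TRUE: same leverages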
Hat matrix on Wikipedia
Fun fact
It is called the hat matrix since it puts the hat on $\vec{Y}$:
$$
\hat{\vec{Y}} = \mathbf{H}\vec{Y}
$$
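Numerically (continuing with the H computed above, and again assuming mpg as the response):

y <- mtcars$mpg                          # assumed response vector
all.equal(as.vector(H %*% y),            # H "puts the hat on" y ...
          unname(fitted(lm(mpg ~ wt, data = mtcars))))  # ... giving fitted values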
Cook's distance is given by the formula:

$$D_{i} = \frac{\sum_{j = 1}^{n} (\hat Y_j - \hat Y_{j(i)})^2}{p \, \mathrm{MSE}}$$

where $\hat Y_j$ is the $j$th fitted value from the full model, $\hat Y_{j(i)}$ is the $j$th fitted value when observation $i$ is left out of the fit, $p$ is the number of parameters in the model, and $\mathrm{MSE}$ is the mean squared error of the full model.
This means that Cook's distance measures the influence of each observation on the model, or "what would happen if this observation weren't in the model", and it is important because it is one way of detecting outliers that especially affect the regression line. If we don't look for and deal with potential outliers in our data, the fitted coefficients may not be the most representative or appropriate, leading to incorrect inference.
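A brute-force check of this definition (a sketch, again assuming the mpg-on-weight fit as an example): refit the model leaving each observation out in turn, apply the formula, and compare with cooks.distance():

fit <- lm(mpg ~ wt, data = mtcars)             # assumed example fit
n   <- nrow(mtcars)
p   <- length(coef(fit))
mse <- sum(residuals(fit)^2) / (n - p)         # full-model MSE
D <- sapply(seq_len(n), function(i) {
  fit_i  <- lm(mpg ~ wt, data = mtcars[-i, ])  # drop observation i
  yhat_i <- predict(fit_i, newdata = mtcars)   # predictions for all n points
  sum((fitted(fit) - yhat_i)^2) / (p * mse)
})
all.equal(unname(D), unname(cooks.distance(fit)))  # TRUE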
The hat values, by contrast, are the leverages: the diagonal elements of the hat matrix discussed above, which measure how much each observation influences its own fitted value. They are quite different from Cook's distance.