Solved – Cook’s distance vs. hat values

cooks-distance, leverage

What exactly does Cook's distance measure? And how is this different from what hat values measure?

I know hat values measure how distant a point is from its corresponding fitted point. I also know Cook's distance measures the influence of a point (whether it changes the fitted line), but what exactly does it measure?

In other words, what exactly is the difference between hat values and Cook's distance?

Best Answer

Cook's distance is given by the formula: $D_{i} = \frac{\sum_{j = 1}^{n} (\hat Y_j - \hat Y_{j(i)})^2}{p \cdot MSE}$

Where:

  • $\hat Y_j$ is the fitted value for the j-th observation;
  • $\hat Y_{j(i)}$ is the fitted value for the j-th observation when the i-th observation is left out of the data used to fit the model;
  • p is the number of parameters in the model;
  • MSE is the mean squared error of the model.

This means that Cook's distance measures the influence of each observation on the model, or "what would happen if this observation weren't in the data": it adds up the squared changes in all n fitted values when observation i is dropped and scales the total by p times the MSE. This matters because it is one way of detecting outliers that especially affect the regression line. When we don't look for and treat potential outliers in our data, the estimated coefficients of the model might not be the most representative or appropriate, leading to incorrect inference.
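As a rough illustration of the formula, the Python sketch below computes Cook's distance by refitting a simple linear regression without each observation in turn. The data, the helper names `fit_line` and `cooks_distance`, and the choice MSE = SSE / (n - p) are assumptions made for this example, not part of the original answer.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def cooks_distance(x, y):
    """Cook's distance of every point, computed by leave-one-out refits."""
    n, p = len(x), 2  # p = 2 parameters: intercept and slope
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    # Assumption: MSE is the residual sum of squares divided by (n - p).
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Refit without observation i, then predict at all n points.
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        fitted_i = [a_i + b_i * xj for xj in x]
        shift = sum((fj - fij) ** 2 for fj, fij in zip(fitted, fitted_i))
        distances.append(shift / (p * mse))
    return distances


# Made-up data: the last point is far out in x and well off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
d = cooks_distance(x, y)
print([round(v, 3) for v in d])
```

In this made-up data set the last observation pulls the line toward itself, so removing it changes every fitted value and its Cook's distance is far larger than the others.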

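The hat values, by contrast, are the diagonal elements $h_{ii}$ of the hat matrix $H = X(X^TX)^{-1}X^T$, the matrix that maps the observed responses to the fitted values ($\hat Y = HY$). A hat value measures leverage: how far an observation's predictor values lie from the center of the predictor data, regardless of its response. Cook's distance combines leverage with the residual $e_i$, since it can also be written as $D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$, so high leverage alone does not make a point influential; its residual must also be non-negligible.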
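To make the contrast concrete, here is a small Python sketch that treats a hat value as the leverage $h_{ii}$ in the usual regression-diagnostics sense, which for a simple linear regression equals $1/n + (x_i - \bar x)^2 / S_{xx}$, and computes Cook's distance from the identity $D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$. The data and the function names `hat_values` and `cooks_from_leverage` are hypothetical, and MSE is assumed to be SSE / (n - p).

```python
def hat_values(x):
    """Leverage h_ii for the simple regression y = a + b*x (uses x only)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def cooks_from_leverage(x, y):
    """Cook's distance via D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2."""
    n, p = len(x), 2  # p = 2 parameters: intercept and slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Assumption: MSE is the residual sum of squares divided by (n - p).
    mse = sum(e ** 2 for e in resid) / (n - p)
    h = hat_values(x)
    return [e ** 2 / (p * mse) * hi / (1 - hi) ** 2 for e, hi in zip(resid, h)]


# Made-up data: the last point is far out in x but close to the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 10.0]
h = hat_values(x)
d = cooks_from_leverage(x, y)
print("hat values:    ", [round(v, 3) for v in h])
print("Cook's distance:", [round(v, 3) for v in d])
```

Here the last point has by far the largest hat value because its x is far from the others, yet its Cook's distance stays small because it lies close to the fitted line. The hat values depend only on x, while Cook's distance also depends on y through the residuals.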
