Solved – Do points with high Cook’s distance necessarily have a high standardized residual, and vice-versa

cooks-distance, outliers, regression, residuals

I have two questions below:

  1. Could a data point be influential if its Cook's distance is large (greater than 4/(n-p-1)) while its standardised residual is less than 2? It seems to me that, for a data point to be called influential, its Cook's distance has to be larger than 4/(n-p-1) AND its standardised residual has to be greater than 2. Am I correct?

  2. Sometimes when we delete an influential point, the regression line doesn't change much, so we leave the data point in the model. But why didn't removing the influential point change the regression line much? Its large Cook's distance marks it as influential, and being influential should mean that the regression line changes a lot when the point is deleted; otherwise it shouldn't be called an influential point, should it?

Best Answer

1. A data point can still be considered influential if it has a large Cook's Distance, even if it has a low standardized residual.
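One way to see why is the standard decomposition of Cook's distance into a residual part and a leverage part. The notation here is an assumption, since the question does not define it: r_i is the (internally) standardized residual, e_i the raw residual, h_ii the leverage (the i-th diagonal element of the hat matrix), s the residual standard error, and p the number of estimated coefficients including the intercept.

```latex
D_i \;=\; \frac{r_i^{2}}{p}\cdot\frac{h_{ii}}{1-h_{ii}},
\qquad
r_i \;=\; \frac{e_i}{s\sqrt{1-h_{ii}}}
```

Because the factor h_ii/(1-h_ii) grows without bound as the leverage approaches 1, a point with |r_i| well below 2 can still have a very large D_i.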

The following image (taken from p214 of Andy Field’s Discovering Statistics Using IBM SPSS 3e) may help to clarify the difference between these two concepts.

[Image: scatterplot of eight data points with two fitted regression lines, one using all points and one omitting data point 8]

The red line depicts the regression model, while the dotted blue line represents the regression model if data point 8 is removed. Note that data point 8 has a very small residual statistic, as it’s very close to the red line. However, it would have a massive influence statistic (according to the textbook, it has a Cook’s distance of 227.14!) since the model changes radically when it is omitted.
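The same pattern can be reproduced numerically. The sketch below uses made-up data (not the data from Field's textbook): seven points with almost no trend, plus one point far out on the x-axis. It computes leverage, standardized residuals, and Cook's distance directly with NumPy, and refits the line without the last point. The variable names and the cutoff 4/(n-k-1), with k the number of predictors, are assumptions chosen to match the rule of thumb in the question.

```python
import numpy as np

# Hypothetical data: seven points with almost no trend, plus one
# high-leverage point (the last one) far out on the x-axis.
x = np.array([1, 2, 3, 4, 5, 6, 7, 25], dtype=float)
y = np.array([5, 3, 6, 2, 5, 3, 4, 14], dtype=float)


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta


X, beta = fit_line(x, y)
n, p = X.shape                      # p counts the intercept too

resid = y - X @ beta                # raw residuals
hat = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(hat)                    # leverage of each point
mse = resid @ resid / (n - p)       # residual variance estimate

# Internally standardized residuals and Cook's distance.
std_resid = resid / np.sqrt(mse * (1 - h))
cooks_d = std_resid**2 / p * h / (1 - h)

# Refit without the last point to see how much the line moves.
_, beta_without = fit_line(x[:-1], y[:-1])

k = p - 1                           # number of predictors
cutoff = 4 / (n - k - 1)            # rule of thumb from the question

print("standardized residual of last point:", std_resid[-1])
print("Cook's distance of last point:", cooks_d[-1], "cutoff:", cutoff)
print("slope with all points:", beta[1])
print("slope without last point:", beta_without[1])
```

With these made-up numbers the last point's standardized residual is roughly 1.7, below the usual threshold of 2, while its Cook's distance is roughly 22, far above the cutoff, and the fitted slope changes sign when the point is dropped.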

2. I don't think it's right to say "Sometimes when we delete an influential point, the regression line doesn't change much".

If the regression model didn't change much with the omission of the data point, then I don't think it's fair to say that the data point was influential.

Here's Wikipedia's definition of an influential data point:

[I]n regression analysis an influential point is one whose deletion has a large effect on the parameter estimates
