[Math] Low Leverage in Residuals, Logistic Regression

descriptive statisticsregressionregression analysisstatistics

I am doing an interpretation of logistic regression and I have an observation withh high residuals but low leverage.

I thought that means that it is an outlier(bad prediction) but not influenctial(if you drop it, things donĀ“t change very much).

The point is that the cook distance and DfBetas in this observation are higher than in the rest and if I do an analysis without the outlier things change considerably.

Do you know why? I mean Cook distance dfbetas etc depends on leverage

On the other hand I have the contrary result. Points with very high leverage and low residual dont change things if you drop them.

Best Answer

The concepts of "leverage" point and "influenctial" point are not equivalent. High leverage in regression analysis refers to observations that are outlying values of the independent variables. More technically, high leverage points can be defined as those having no neighbouring points in a $R^n$ space, where $n$ is the number of independent regression variables. When leverage points occur in regression analyses, the fitted model commonly passes relatively close to them. As a result, high leverage obsevations have the potential to yield large changes in the parameter estimates when they are deleted. This explains why influential points "often" have high leverage, and vice versa. However, it is common to find observations where this is not true.

A high leverage point is not necessarily an influenctial point. The better way to understand this is to imagine a point at the extreme ranges of independent variables, but that is well aligned to the fitting model obtainable by all other observations: this point would clearly have high leverage, but its deletion would have a small effect on the model. In other words, it would be an outlier for the independent variables, but not an influenctial point.

On the other hand, a low leverage point is not necessarily a non-influenctial point. A way to understand this is to imagine a point at the middle ranges of independent variables, but that is not aligned to the fitting model obtainable by all other observations (e.g., an outlier for the dependent variable). This point would have low leverage, but its deletion could have a considerable impact on the regression model.

Related Question