Solved – Outlier detection using outlierTest function

Tags: generalized linear model, outliers, r

I found an outlier using the outlierTest function in the car package. The output reports the externally Studentized residual together with the unadjusted and Bonferroni p-values.
This is the result.

[Image: outlierTest output]

Does this indicate that the 718th observation is an outlier?

The code to derive the result is as follows.

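library(car)  # provides outlierTest()

credit <- read.csv("german.csv", header = TRUE)

# Columns that hold categorical variables; convert them to factors
factor_cols <- c(1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21)
credit[factor_cols] <- lapply(credit[factor_cols], as.factor)

# Logistic regression of Creditability on all other variables
german_logit <- glm(Creditability ~ ., data = credit, family = binomial)

# Bonferroni outlier test on the Studentized residuals
german_outlier <- outlierTest(german_logit, n.max = 9999)
german_outlier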

If so, is it correct to delete the 718th observation?

I want to know which variable contains the outlier and what its value is, because I want to replace it with a proper value. Which function should I use?

Best Answer

A few points of clarification:

What do you mean by an outlier? Here, observation 718 is flagged because the value of its dependent variable in the glm model is unusual given the values of its independent variables. If you look at the dataset in a different way, for example with a bivariate analysis on another variable, the same observation may or may not get flagged as an outlier.
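To make "unusual given its independent variables" concrete, here is a minimal base-R sketch of the quantity behind such a test. It runs on simulated data (the data frame `toy` and its columns are hypothetical stand-ins, not the German credit data) and assumes, as the car documentation describes for generalized linear models, that the p-values are based on the standard normal distribution with a Bonferroni adjustment. It is an illustration of the idea, not a reproduction of outlierTest's exact output.

```r
# Simulated stand-in data; 'toy', 'x' and 'y' are hypothetical names.
set.seed(42)
toy <- data.frame(x = rnorm(300))
toy$y <- rbinom(300, 1, plogis(1 + 2 * toy$x))

fit <- glm(y ~ x, data = toy, family = binomial)

# Studentized residuals: how surprising each response is given its predictors
r <- rstudent(fit)

# The observation with the largest absolute Studentized residual
worst <- which.max(abs(r))

# Two-sided normal p-value, then a Bonferroni adjustment over all n observations
p_unadj <- 2 * pnorm(abs(r[worst]), lower.tail = FALSE)
p_bonf  <- min(1, length(r) * p_unadj)

c(observation = unname(worst), rstudent = unname(r[worst]),
  p_unadjusted = unname(p_unadj), p_bonferroni = p_bonf)
```

The residual depends on the fitted model, so a different set of predictors can single out a different observation.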

To display the data values of that observation, use credit[718,]. For more information on subsetting, run ?'[' in the console to pull up the help page.
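One detail worth checking when inspecting a flagged row: to the best of my knowledge outlierTest labels observations by row name, and row names only match row positions as long as no rows have been removed. The sketch below uses a small made-up data frame (`toy`, `income` and `owns_home` are hypothetical) to show the difference between indexing by position and by name.

```r
# Made-up data; the names are illustrative only.
toy <- data.frame(
  income    = c(10, 12, 11, 95, 13),
  owns_home = factor(c("no", "yes", "no", "yes", "no"))
)

# Drop the second row: the remaining row names are "1", "3", "4", "5"
toy <- toy[-2, ]

toy[4, ]     # the 4th remaining row, whose row name is "5"
toy["4", ]   # the row whose name is "4", i.e. the original 4th observation
```

With a data frame freshly read by read.csv and left unfiltered, both forms return the same row, so credit[718,] is fine in that case.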

You're passing all variables to the model with the formula Creditability ~ ., so the flagged outlier is an entire row (an observation), not a single variable.

Now, on to deleting observations: rather than deleting, it is advisable to create an indicator column, or a separate list of outlier row numbers. That way you can subset your data set accordingly and never lose data.
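A minimal sketch of that indicator approach, again on simulated data (`toy`, `flagged` and `is_outlier` are hypothetical names, and the flagged row is simply the one with the largest absolute Studentized residual, standing in for row 718). The indicator is kept as a separate logical vector here because a formula such as Creditability ~ . would otherwise pick up a new indicator column as a predictor.

```r
# Simulated stand-in data; all names are hypothetical.
set.seed(1)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(0.5 * toy$x))

fit <- glm(y ~ x, data = toy, family = binomial)

# Row numbers to flag (in the question this would be 718)
flagged <- unname(which.max(abs(rstudent(fit))))

# Logical indicator kept outside the data frame, so the data stay untouched
is_outlier <- seq_len(nrow(toy)) %in% flagged

# Refit on the unflagged rows only; 'toy' still contains every observation
fit_clean <- glm(y ~ x, data = toy[!is_outlier, ], family = binomial)

# Compare coefficients with and without the flagged row
rbind(all_rows = coef(fit), without_flagged = coef(fit_clean))
```

Comparing the two sets of coefficients shows how much the flagged observation actually influences the fit, which is usually a better basis for a decision than the test alone.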