I found an outlier using the outlierTest function in the car package. However, the results only show the externally Studentized residual and the p-values.
This is the result.
Does this indicate that the 718th observation is an outlier?
The code to derive the result is as follows.
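library(car)  # provides outlierTest()
credit <- read.csv("german.csv", header = TRUE)
# Indices of the categorical columns (named factor_cols because F is R's shorthand for FALSE)
factor_cols <- c(1,2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21)
for (i in factor_cols) credit[, i] <- as.factor(credit[, i])
german_logit <- glm(Creditability ~ ., data = credit, family = "binomial")
# Bonferroni outlier test on the Studentized residuals; n.max caps how many are reported
german_outlier <- outlierTest(german_logit, n.max = 9999)
german_outlier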
If so, is it correct to delete the 718th observation?
I want to know which variable contains the outlier and what its value is, because I want to replace it with a proper value. Which function should I use?
Best Answer
A few points of clarification:
What do you mean by an outlier? Here, observation 718 is flagged because its dependent variable in the glm model has an unusual value given its independent variables. If you look at the dataset in a different way, for example with a bivariate analysis on another variable, the same observation may or may not get flagged as an outlier.

To display the data values, use credit[718,]. For more information on subsetting, type ?'[' in the console to pull up the help page.

You're passing all variables to the model using the formula Creditability ~. so your outlier will be a row, not a single variable.

As for deleting observations, it is advised to instead create a column or a list that marks the outliers (or stores their row numbers). That way you can subset your data set accordingly and you never lose data.
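
To make the indicator-column idea concrete, here is a minimal sketch. It uses simulated data rather than german.csv, and the names dat, x1, x2, y and is_outlier are made up for illustration. It picks the row with the largest absolute Studentized residual using base R's rstudent(), which is only a stand-in for the row that outlierTest() reports.

```r
# Simulated stand-in for the credit data (hypothetical columns, not german.csv)
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + 1.5 * dat$x1 - dat$x2))

fit <- glm(y ~ ., data = dat, family = "binomial")

# Studentized residuals from base R; the most extreme one plays the role
# of the observation flagged by car::outlierTest() in the question
res <- rstudent(fit)
worst <- which.max(abs(res))

# Keep every row, but mark the flagged one in an indicator column
dat$is_outlier <- FALSE
dat$is_outlier[worst] <- TRUE

# Inspect the whole flagged row (the outlier is a row, not a single variable)
print(dat[dat$is_outlier, ])

# Refit on the unflagged rows; the formula is spelled out so that the
# indicator column is not picked up as a predictor by "."
fit_clean <- glm(y ~ x1 + x2, data = dat[!dat$is_outlier, ], family = "binomial")
```

With the real model, the same pattern would apply with credit in place of dat and row 718 in place of worst. Comparing coef(fit) with coef(fit_clean) then shows how much the flagged row actually influences the estimates, without anything being deleted from the data.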