Solved – Polynomial fit: removing outliers

correlation, fitting, outliers, regression, scatterplot

I want to fit a scatter plot with a polynomial, and find the correlation between two variables.

1) How can I define and remove outliers from the data points?
(In the figure, the outliers on the right mislead the polynomial fit, which fails to capture the linear relationship.)

[Figure: scatter plot of the two variables with the polynomial fit]

In the figure, SE is the squared error, R is the Pearson correlation coefficient, and $\rho$ is the Spearman correlation coefficient.

Best Answer

In the picture you posted, the outliers lie along the x axis. They can be removed using the interquartile range (IQR) rule; example R code for doing so can be found here.
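
As a minimal sketch of the IQR rule mentioned above, a point can be treated as an outlier when its x value falls outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. The helper name `iqr_keep` and the multiplier 1.5 are assumptions made for illustration (1.5 is only the conventional choice), and the three extreme values mirror the ones added to mtcars in the example further down.

```r
# Hypothetical helper: TRUE for values that lie inside the IQR fences.
# k = 1.5 is the conventional multiplier; treat it as a tunable assumption.
iqr_keep <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # first and third quartiles
  spread <- q[2] - q[1]                           # interquartile range
  x >= q[1] - k * spread & x <= q[2] + k * spread
}

# Car weights from mtcars plus three extreme values added by hand
x <- c(mtcars$wt, 40, 45, 50)
keep <- iqr_keep(x)

# Values flagged as outliers on the x axis
x[!keep]
```

The same logical vector can be used to subset the rows of a data frame before fitting, so that the x and y values of a flagged point are dropped together.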

Here is an example on simulated data for your case: the left subfigure shows the data without outliers, and the right subfigure shows the data with outliers. (I manually added three data points to the mtcars data.)

As you can see, those three data points flatten the regression line.

[Figure: left, mtcars data (wt vs. mpg) with the fitted regression line; right, the same data with the three added outliers]

Code

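# Two side-by-side panels: original data (left), data with outliers (right)
par(mfrow = c(1, 2))

# Left panel: weight vs. mpg from mtcars, with the fitted regression line
d <- mtcars[, c("wt", "mpg")]
plot(d)
fit <- lm(mpg ~ wt, data = d)
summary(fit)
abline(fit)

# Right panel: same data plus three manually added outliers on the x axis
d2 <- rbind(d, c(40, 20), c(45, 20), c(50, 20))
plot(d2)
fit2 <- lm(mpg ~ wt, data = d2)
summary(fit2)
abline(fit2)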
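
To connect this back to the IQR rule, the following sketch filters the contaminated data on `wt` and refits the line. It rebuilds `d2` so that it runs on its own; the name `d3` and the 1.5 * IQR cutoff are assumptions for illustration, not part of the original answer. It also prints the Pearson and Spearman correlations on the contaminated data, since the rank-based Spearman coefficient is usually less sensitive to extreme x values.

```r
# Rebuild the contaminated data: mtcars plus three added points
d  <- mtcars[, c("wt", "mpg")]
d2 <- rbind(d, c(40, 20), c(45, 20), c(50, 20))

# Keep rows whose wt lies inside the IQR fences (1.5 * IQR is an assumed cutoff)
q     <- quantile(d2$wt, c(0.25, 0.75), names = FALSE)
fence <- 1.5 * (q[2] - q[1])
d3    <- d2[d2$wt >= q[1] - fence & d2$wt <= q[2] + fence, ]

# Compare the fitted slopes with and without the flagged points
fit2 <- lm(mpg ~ wt, data = d2)   # with outliers
fit3 <- lm(mpg ~ wt, data = d3)   # after IQR filtering
coef(fit2)["wt"]
coef(fit3)["wt"]

# Pearson vs. Spearman correlation on the contaminated data
cor(d2$wt, d2$mpg, method = "pearson")
cor(d2$wt, d2$mpg, method = "spearman")
```

Filtering on the x variable alone is a deliberate simplification here, because the added points are extreme only in `wt`.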