To make a long story short, you should use a tool such as robust PCA. I may come back to this with a more substantive answer, but the short version is explained in this post.
A dependent mixture model (hidden Markov model) may be of use, depending on the type of deviations you expect.
Assume that your observations come from two distributions (or states), both normally distributed but with different means and variances.
The model then has a number of parameters to estimate: the initial state probabilities (2 parameters), the state transition probabilities between neighbouring data points (4 parameters), and the means and variances of the two distributions (4 parameters).
In R, this model can be estimated using the depmixS4 package:
library(depmixS4)
set.seed(3)
y <- rnorm(100)
y[30:35] <- rnorm(6, mean = 4, sd = 2)   # inject a deviating segment
plot(1:100, y, type = "l")
m  <- depmix(y ~ 1, data = data.frame(y = y), nstates = 2, ntimes = 100)
fm <- fit(m)
means <- getpars(fm)[c(7, 9)]            # estimated means of the two states
lines(1:100, means[posterior(fm)$state], lwd = 2, col = 2)
See the vignette at http://cran.r-project.org/web/packages/depmixS4/vignettes/depmixS4.pdf for details and references.
Best Answer
In general, you can define outliers in different ways, depending on what exactly you are trying to achieve. For example, the presence of observations with very high leverage won't necessarily mean they are affecting the regression at all. On the other hand, observations with a high Cook's distance certainly can. It is also possible for some observations to score high on both measures. Large studentized residuals can indicate heteroscedasticity. Here's an illustration of how you can identify and inspect each kind when compared to your original data and the fitted regression line.
Create a dummy data set and fit a linear regression model.
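A minimal sketch of this step (the variable names x, y, and fit are illustrative choices, not from the original answer; the planted outliers are arbitrary):

```r
# Dummy data: a linear trend with noise, plus a few planted extreme points
set.seed(10)
x <- 1:100
y <- 2 * x + rnorm(100, sd = 25)
y[c(5, 95)] <- y[c(5, 95)] + 300   # planted influential observations

# Fit a simple linear regression and plot it
fit <- lm(y ~ x)
plot(x, y)
abline(fit, lwd = 2)
```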
We will use influencePlot from the car package to identify the outliers and plot them, with circles whose area is proportional to each observation's Cook's distance.
Now we can get the row names of, say, the 2 highest values of each measure.
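One way to do this, assuming the fitted lm object fit from above (the helper top2 is my own, not from the original answer):

```r
# Row names of the 2 most extreme observations by each influence measure
top2 <- function(v) names(sort(v, decreasing = TRUE))[1:2]

lev_ids  <- top2(hatvalues(fit))        # leverage
cook_ids <- top2(cooks.distance(fit))   # Cook's distance
stud_ids <- top2(abs(rstudent(fit)))    # studentized residuals
```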
And plot them over the fitted regression line:
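A sketch of the overlay, assuming x, y, fit, and the id vectors from the previous steps; the marker choices (blue triangles for leverage, red squares for Cook's distance) follow the description in the closing paragraph:

```r
plot(x, y)
abline(fit, lwd = 2)

# High-leverage points as blue triangles
points(x[as.numeric(lev_ids)], y[as.numeric(lev_ids)],
       pch = 17, col = "blue", cex = 2)

# High Cook's distance points as red squares
points(x[as.numeric(cook_ids)], y[as.numeric(cook_ids)],
       pch = 15, col = "red", cex = 2)
```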
You can clearly see that some of the outliers overlap. The high-leverage points (the blue triangles) sometimes affect the regression line and at other times sit almost exactly on it, whereas the red squares (high Cook's distance) always have a strong effect on the regression line.