Usually you predict a dependent variable y and calculate a confidence interval: given x0, you calculate [y-, y+], the interval in which y will probably lie.
For the reverse problem, where you have a y0 and want to find [x-, x+], ordinary regression will not help directly.
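A quick sketch of why this is so: algebraically inverting the fitted y-on-x line is not the same as regressing x on y, so "reading the line backwards" gives a different answer for x than a direct x-on-y fit would. The data below is made up purely for illustration.

```python
# Compare inverting the y-on-x fit against a direct x-on-y fit.
# The data and the target value y0 are hypothetical.

def ols_slope_intercept(u, v):
    """Ordinary least squares fit v = a + b*u; returns (a, b)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    den = sum((ui - mu) ** 2 for ui in u)
    b = num / den
    return mv - b * mu, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]  # roughly y = 2x plus noise

a_yx, b_yx = ols_slope_intercept(x, y)   # regress y on x
a_xy, b_xy = ols_slope_intercept(y, x)   # regress x on y

y0 = 5.0
x_inverted = (y0 - a_yx) / b_yx   # solve the y-on-x line for x
x_direct = a_xy + b_xy * y0       # predict x from the x-on-y line

print(x_inverted, x_direct)  # the two estimates disagree
```

The two slopes satisfy b_yx * b_xy = r^2, which equals 1 only for perfectly correlated data, so the two estimates of x coincide only in that degenerate case.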
An appropriate tool for this kind of analysis could be structural equation modeling [1].
[1] http://en.wikipedia.org/wiki/Structural_equation_modeling
Well, I think Mike McCoy's answer is "the right answer," but here's another way of thinking about it: linear regression looks for an approximation (up to the error $\epsilon$) of $y$ as a function of $x$. That is, we're given a non-noisy $x$ value, and from it we're computing a $y$ value, possibly with some noise. This situation is not symmetric in the variables -- in particular, flipping $x$ and $y$ means that the error is now in the independent variable, while our dependent variable is measured exactly.
One could, of course, find the equation of the line that minimizes the sum of the squares of the (perpendicular) distances from the data points. My guess is that the reason that this isn't done is related to my first paragraph and "physical" interpretations in which one of the variables is treated as dependent on the other.
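Minimizing the sum of squared perpendicular distances is usually called orthogonal regression (or total least squares). In two dimensions it has a closed form: the line passes through the centroid with direction given by the leading eigenvector of the scatter matrix. A minimal sketch, on made-up data, assuming the cross-moment $S_{xy}$ is nonzero:

```python
import math

# Total least squares (perpendicular-distance) line fit in 2-D.
# Example data is hypothetical; assumes sxy != 0.

def tls_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of
    squared perpendicular distances from the points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Slope from the leading eigenvector of the 2x2 scatter matrix.
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]
a, b = tls_line(x, y)
print(a, b)
```

For positively correlated data, the orthogonal slope always lies between the y-on-x slope and the reciprocal of the x-on-y slope, which fits the observation above that neither ordinary regression treats the variables symmetrically.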
Incidentally, it's not hard to think up silly examples for which $B_x$ and $B_y$ don't satisfy anything remotely like $B_x \cdot B_y = 1$. The first one that pops to mind is to consider the least-squares line for the points {(0, 1), (1, 0), (-1, 0), (0, -1)}. (Or fudge the positions of those points slightly to make it a shade less artificial.)
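Checking that toy example directly: both means are zero, so each slope is just a ratio of moments, and both come out to zero.

```python
# Verify that B_x * B_y = 0 (not 1) for the four points above.
pts = [(0, 1), (1, 0), (-1, 0), (0, -1)]
xs = [p[0] for p in pts]
ys = [p[1] for p in pts]

sxy = sum(x * y for x, y in pts)       # both means are 0 here
b_y = sxy / sum(x * x for x in xs)     # slope of y regressed on x
b_x = sxy / sum(y * y for y in ys)     # slope of x regressed on y

print(b_x * b_y)  # → 0.0
```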
Another possible reason that the perpendicular-distances method is nonstandard is that it doesn't guarantee a unique solution -- see, for instance, the silly example in the preceding paragraph.
(N.B.: I don't actually know anything about statistics.)
Best Answer
I might suggest the LTS (Least Trimmed Squares) approach. There is code in Fortran and MATLAB, the latter called FAST-LTS, both produced, I believe, by Rousseeuw's group. The method essentially minimizes the fitting error over a proportion of the data points, ignoring the rest (the outliers). The outliers are identified by something like the Minimum Volume Ellipsoid method (roughly: find the ellipsoid of minimum volume containing half the points).
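As a rough illustration of the trimming idea (this is a toy sketch in the spirit of the concentration steps used by FAST-LTS, not the actual algorithm; the data, subset size h, and restart count are all made up):

```python
import random

# Toy Least Trimmed Squares line fit: repeatedly fit OLS to a subset,
# keep the h points with the smallest residuals, and refit (a
# "concentration step"), from several random starts.

def ols(pts):
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, y in pts)
    b = sxy / sxx
    return my - b * mx, b

def lts_fit(pts, h, restarts=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        subset = rng.sample(pts, h)
        for _ in range(20):  # concentration steps
            a, b = ols(subset)
            ranked = sorted(pts, key=lambda p: (p[1] - a - b * p[0]) ** 2)
            subset = ranked[:h]
        a, b = ols(subset)
        cost = sum((y - a - b * x) ** 2 for x, y in subset)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]

# Inliers near y = 2x, plus two gross outliers.
data = [(x, 2.0 * x + 0.05 * ((-1) ** x)) for x in range(8)]
data += [(2, 40.0), (5, -30.0)]
a, b = lts_fit(data, h=8)
print(a, b)  # close to intercept 0, slope 2, despite the outliers
```

An ordinary least-squares fit on the same ten points would be pulled badly by the two outliers; trimming to the best h = 8 points recovers the underlying line.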
hth,